01. Introduction to POSIX Regular Expressions

What are regular expressions?

A regular expression is a sequence of characters, some literal and some with a special meaning, that describes a pattern of text. For example, the regular expression ^hello. matches any line that starts with the text "hello" followed by any single character. In this series we'll go over how to construct such regular expressions.
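To make the ^hello. example concrete, here is a minimal sketch that runs it through grep, the command this series uses for regular expressions. The three sample lines and the variable names are invented for illustration.

```shell
# Three invented sample lines, one per line.
sample='hello world
hello
say hello!'

# ^ anchors the match to the start of the line, "hello" is literal text,
# and . requires exactly one more character of any kind.
matches=$(printf '%s\n' "$sample" | grep '^hello.')

printf '%s\n' "$matches"
```

Only "hello world" is printed: the bare "hello" line has no character after the word, and "say hello!" does not start with it.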

What is the POSIX standard?

POSIX, an acronym for Portable Operating System Interface, is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. The standards were created to make software portable across variants of Unix and other operating systems.

How did POSIX affect Unix regular expressions?

Regular expression syntax varies between the languages and tools that implement it. On the command line, there used to be three different commands for regular expressions. grep supported basic regular expressions (BRE), while extended grep, egrep, supported additional notation that made it more powerful, at the cost of efficiency. The collection of features it supports is known as extended regular expressions (ERE). Additionally, there was fast grep, fgrep, which matched one or more fixed strings instead of patterns. These variants were merged into grep by POSIX in 1992.
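In the merged grep, the behavior of egrep and fgrep is reached through the -E and -F options. As a rough illustration of what fixed string matching means, the sketch below runs the same pattern once as a regular expression and once as a fixed string. The sample lines and variable names are invented for the example.

```shell
# Two invented sample lines: one contains a literal dot, one does not.
sample='a.c
abc'

# As a regular expression, . matches any single character, so both lines match.
as_regex=$(printf '%s\n' "$sample" | grep 'a.c')

# With -F the pattern is a fixed string, so . only matches a literal dot.
as_fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

printf 'regex:\n%s\nfixed:\n%s\n' "$as_regex" "$as_fixed"
```

The regular expression run prints both lines, while the fixed string run prints only a.c.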

Now, this merging of basic regular expressions (BRE) and extended regular expressions (ERE) did not mean their notations were combined. When using grep, the default is set to BRE notation, but you can easily switch to ERE with the -E option.
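A small sketch of the difference, assuming a POSIX-style grep: in BRE the + character is ordinary text, while in ERE it means "one or more of the preceding item". The sample lines and variable names are invented for the example.

```shell
# Three invented sample lines.
sample='ab
abb
ab+'

# BRE (the default): + is an ordinary character, so only the literal text "ab+" matches.
bre=$(printf '%s\n' "$sample" | grep 'ab+')

# ERE (-E): b+ means "one or more b", so every line contains a match.
ere=$(printf '%s\n' "$sample" | grep -E 'ab+')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```

The BRE run prints only ab+, while the ERE run prints all three lines.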

Did I confuse you enough yet? Hopefully not... We'll go over which regex notations need the -E option, so no need to be lost!

Regular Expressions with grep

On the command line, regular expressions are most often used through the grep command. Its name stands for "global regular expression print", after the ed editor command g/re/p.

Why learn regex and grep?

Knowing grep and how to construct regular expressions is extremely powerful. For example, you can perform text manipulations such as searches and substitutions in a single line. These can be incorporated into shell scripts to automate your workflow for fast and easy text processing. Regular expressions can also be used in text editors such as Vim and Emacs, file viewers such as less and man, and programming languages such as awk, Python, and Perl.

A sample test

Let's try out a simple grep command to get started. We won't be using any special characters, just literal values.

$ ls /usr/bin | grep 'zip'
bunzip2
bzip2
bzip2recover
funzip
gunzip
gzip
unzip
unzipsfx
zip
zipcloak
zipdetails
zipdetails5.16
zipdetails5.18
zipgrep
zipinfo
zipnote
zipsplit

Here, we can see all the commands in our /usr/bin folder whose names contain the text zip. Remember that the bin folder is where the executable files (binaries) for our commands are stored.

Avoid Parsing ls

This article makes a good case for why you shouldn't parse the output of ls. In this tutorial, we'll be parsing the contents of our /usr/bin folder purely as an example. However, be sure to give that article a quick read and understand that parsing the output of ls may give unexpected results, for example when file names contain spaces or newlines.
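For comparison, here is one possible way to get a similar listing without running ls, using the shell's own filename expansion. This is only a sketch: list_matching is a hypothetical helper written for this example, and it matches literal text rather than a regular expression.

```shell
# Hypothetical helper: print the names of entries in directory $1 whose
# names contain the literal text $2, without running ls.
list_matching() {
  for path in "$1"/*"$2"*; do
    # When nothing matches, the glob is left unexpanded; skip that case.
    [ -e "$path" ] || [ -L "$path" ] || continue
    # Strip the leading directory so only the file name is printed.
    printf '%s\n' "${path##*/}"
  done
}

list_matching /usr/bin zip
```

Because the shell hands each name to the loop as a single word, names containing spaces or newlines are not split apart the way they can be when the output of ls is piped into another command.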

Options with grep

Here is a list of useful options you can use with the grep command.

Option  Description
-c      Print the number of matching lines.
-E      Use extended regular expressions.
-e      Specify a pattern. Repeat the option to give a list of patterns; lines matching any of them are returned.
-F      Use fixed strings (ignore special characters).
-f      Read patterns from a newline-separated file.
-h      Suppress the output of file names.
-i      Ignore case.
-L      Print the names of files that contain no match.
-l      List the names of files that match the pattern (instead of printing matched lines).
-n      Prefix each matching line with its line number within the file.
-q      Print nothing and exit quietly; only the exit status reports whether a match was found.
-r      Search recursively through the specified folder.
-s      Suppress error messages.
-v      Print the lines that didn't match any pattern.
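A short sketch of a few of these options in action, using the standard grep flag letters -c, -i, -n and -v. The sample file and its contents are invented for the example.

```shell
# Create a throwaway sample file (its contents are invented for this demo).
demo=$(mktemp)
printf 'Zip\nzip\ngzip\ntar\n' > "$demo"

count=$(grep -c 'zip' "$demo")      # -c: count the lines that match
nocase=$(grep -i -c 'zip' "$demo")  # -i: ignore case while counting
numbered=$(grep -n 'tar' "$demo")   # -n: prefix each match with its line number
inverted=$(grep -v 'zip' "$demo")   # -v: keep only the lines that do NOT match

printf '%s\n' "$count" "$nocase" "$numbered" "$inverted"
rm -f "$demo"
```

With this sample file, the case-sensitive count is 2, the case-insensitive count is 3, the numbered match is 4:tar, and the inverted search prints Zip and tar.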

Now that you have an understanding of regular expressions, POSIX standards, and grep, let's learn about the two types of characters in regex.
