01. Introduction to POSIX Regular Expressions

What are regular expressions?

A regular expression is a string of literal and symbolic characters that describes a pattern of text. For example, the regular expression ^hello. matches any line that starts with the text "hello" followed by any single character. In this series we'll go over how to construct such regular expressions.
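
As a rough sketch of that example, the snippet below pipes three made-up lines (an assumption for illustration, not part of the original tutorial) through grep with the pattern ^hello. and keeps whatever matches in a variable named matches, which is also just an illustrative name.

```shell
# Sample lines are invented for illustration.
# 'hello!'     starts with hello and has one more character  -> matches
# 'hello'      has nothing after hello for the . to consume   -> no match
# 'say hello!' does not start with hello                      -> no match
matches=$(printf '%s\n' 'hello!' 'hello' 'say hello!' | grep '^hello.')
echo "$matches"
```

Only the first line should be printed, since it is the only one that both begins with "hello" and has a character after it.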

What is the POSIX standard?

POSIX, an acronym for Portable Operating System Interface, is a set of standards defined by the IEEE Computer Society for maintaining compatibility between operating systems. The standard was created to make software portable between variants of Unix and other operating systems.

How did POSIX affect Unix regular expressions?

Regular expressions vary between the languages and the tools that implement them. On the command line, there used to be three different commands for regular expressions. grep supported basic regular expressions (BRE), while extended grep, egrep, added more notation that made it more powerful, at the cost of efficiency. The collection of features it supports is known as extended regular expressions (ERE). Additionally, there was fast grep, fgrep, which matched one or more fixed strings rather than patterns. POSIX merged these variations into grep in 1992.

Now, this merging of basic regular expressions (BRE) and extended regular expressions (ERE) did not mean their notations were combined. When using grep, the default is set to BRE notation, but you can easily switch to ERE with the -E option.
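
To make the difference concrete, here is a minimal sketch using alternation, assuming three made-up input lines. The variable names sample, bre, and ere are illustrative only. In BRE notation the | character is taken literally, while with -E it means "either side".

```shell
# Made-up sample lines for illustration.
sample=$(printf '%s\n' 'zip' 'gzip' 'tar')

# BRE (default): | is a literal character, so no line matches.
# grep exits non-zero when nothing matches, hence the || true.
bre=$(printf '%s\n' "$sample" | grep 'zip|tar' || true)

# ERE (-E): | means alternation, so lines containing zip or tar match.
ere=$(printf '%s\n' "$sample" | grep -E 'zip|tar')

echo "BRE matched: [$bre]"
echo "ERE matched:"
echo "$ere"
```

The same pattern text gives different results depending on the notation, which is why it matters to know which constructs need -E.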

Did I confuse you enough yet? Hopefully not... We'll go over which regex notations need the -E option, so no need to be lost!

Regular Expressions with grep

On the command line, regular expressions are used through the grep command, whose name stands for "global regular expression print".

Why learn regex and grep?

Knowing grep and how to construct regular expressions is extremely powerful. For example, you can perform text manipulations such as searches and substitutions in a single line. These can be incorporated into shell scripts to automate your workflow for fast and easy text processing. Regular expressions can also be used in text editors such as Vim and Emacs, file viewers such as less and man, and programming languages such as awk, Python, and Perl.

A sample test

Let's try out a simple grep command to get started. We won't be using any special characters, just literal values.

$ ls /usr/bin | grep 'zip'
bunzip2
bzip2
bzip2recover
funzip
gunzip
gzip
unzip
unzipsfx
zip
zipcloak
zipdetails
zipdetails5.16
zipdetails5.18
zipgrep
zipinfo
zipnote
zipsplit

Here, we can see all the commands in our /usr/bin folder whose names contain zip. Remember that the bin folder is where our commands are stored as binaries.

Avoid Parsing ls

This article makes a good case for why you shouldn't parse the output of ls. In this tutorial, we'll parse the contents of our /usr/bin folder purely as an example. However, be sure to give the article a quick read and understand that parsing the output of ls may give unexpected results.

Options with grep

Here is a list of useful options you can use with the grep command.

-c  Print the number of matching lines.
-E  Use extended regular expressions.
-e  Specify a pattern. Can be repeated to supply a list of patterns; lines matching any of them are returned.
-F  Use fixed strings (special characters are treated literally).
-f  Read patterns from a newline-separated file.
-h  Suppress the output of file names.
-i  Ignore case.
-L  Print the names of files that contain no match.
-l  List the names of files that match the pattern (instead of printing matched lines).
-n  Prefix each matching line with its line number within the file.
-q  Print nothing; only the exit status reports whether a match was found.
-r  Search recursively through the specified folder.
-s  Suppress error messages.
-v  Print the lines that don't match any pattern.
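
A few of these options can be tried out with the sketch below. It assumes the standard grep flags -c, -i, -n, -v, -l, and -F, and it assumes mktemp -d is available for creating a scratch folder. The file names, file contents, and variable names are all made up for illustration.

```shell
# Scratch files with made-up contents, for illustration only.
dir=$(mktemp -d)
printf '%s\n' 'Zip archive' 'gzip stream' 'plain text' > "$dir/a.txt"
printf '%s\n' 'nothing here' > "$dir/b.txt"

# -c with -i: count the lines that match, ignoring case.
count=$(grep -c -i 'zip' "$dir/a.txt")

# -n: prefix the matching line with its line number.
numbered=$(grep -n 'gzip' "$dir/a.txt")

# -v with -i: print the lines that do not match.
inverted=$(grep -v -i 'zip' "$dir/a.txt")

# -l: list only the names of files that contain a match.
files=$(cd "$dir" && grep -l 'zip' a.txt b.txt)

# -F: treat the pattern as a fixed string, so . is not "any character".
fixed=$(printf '%s\n' 'a.b' 'axb' | grep -F 'a.b')

echo "count=$count numbered=$numbered inverted=$inverted files=$files fixed=$fixed"

# Clean up the scratch folder.
rm -r "$dir"
```

Options can be combined freely, as with -c and -i above, so it is worth experimenting with them on small files like these before using them in scripts.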

Now that you have an understanding of regular expressions, POSIX standards, and grep, let's learn about the two types of characters in regex.
