04. Bracket Expressions

Brackets allow you to specify a single character from a group. For example, if you want to match any single vowel, you can use [aeiou].

$ ls /usr/bin | grep 'b[aeiou]t'
batch bitesize.d smbutil

Negating Bracket Expressions

To negate a bracket expression, place a caret (^) immediately after the opening bracket. The expression then matches any single character that is not in the list.

$ ls /usr/bin | grep 'b[^aeiou]t'
rwbytype.d

This matches any text that has a single character other than a, e, i, o, or u between b and t.
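Because the contents of /usr/bin differ from machine to machine, the two patterns above can also be tried against a fixed word list. The following is a minimal sketch; the word list and the variable names are made up for illustration.

```shell
# Hypothetical sample words, one per line, standing in for a directory listing.
words=$(printf '%s\n' batch bitesize smbutil rwbytype boot bt)

# Lines containing b, exactly one vowel, then t.
vowel_matches=$(printf '%s\n' "$words" | grep 'b[aeiou]t')

# Lines containing b, exactly one character that is not a vowel, then t.
other_matches=$(printf '%s\n' "$words" | grep 'b[^aeiou]t')

printf 'vowel:\n%s\n' "$vowel_matches"
printf 'non-vowel:\n%s\n' "$other_matches"
```

Here "boot" and "bt" match neither pattern, because a bracket expression always stands for exactly one character.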

Simplifying with a range

If the characters we want are consecutive letters or digits, we can specify them as a range with a hyphen, such as [a-d] or [0-9].

$ ls /usr/bin | grep '[a-d][e-g][h-l]'
afhash
afida
afinfo
cancel
git-receive-pack
ldapdelete
mdfind
snmpdelta

With this command, we selected names that contain a letter from a, b, c, or d, followed by a letter from e, f, or g, followed by a letter from h, i, j, k, or l. Notice how this sequence of three letters can appear anywhere in the name.
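To see that a range is shorthand for listing each character, the sketch below runs the range pattern and its spelled-out equivalent over the same input. The word list is hypothetical, and LC_ALL=C is set on the range version as an assumption so that the ranges follow plain ASCII order.

```shell
# Hypothetical sample names, one per line.
names=$(printf '%s\n' afinfo cancel mdfind grep zip)

# Range form, forced to ASCII ordering for this one command.
range_matches=$(printf '%s\n' "$names" | LC_ALL=C grep '[a-d][e-g][h-l]')

# Equivalent form with every character listed explicitly.
list_matches=$(printf '%s\n' "$names" | grep '[abcd][efg][hijkl]')

printf 'range:\n%s\n' "$range_matches"
printf 'list:\n%s\n' "$list_matches"
```

Both commands select afinfo (afi), cancel (cel), and mdfind (dfi).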

Portability conflicts with range

A severe downside to using the - metacharacter for range is that it's not portable due to different character collation orders. To explain this, we need to learn a bit of history.

Unix was first developed with just ASCII characters. ASCII assigns each character a code from 0 to 127, covering control codes, digits, punctuation marks, and the uppercase and lowercase English letters. The letters were ordered like this:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

As other countries began adopting Unix, they had to make room for more characters, such as an e with an accent (é) or a c with a cedilla (ç). Thus, some collations arose with an ordering like this:

aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ

You can probably see the problem already. An expression such as [A-Z] would capture all uppercase letters under the first ordering, but every letter except lowercase a under the second.

Thus, try not to use the range character too much. You can instead rely on Character Classes (below), which are POSIX standard.
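When a range cannot be avoided, one common workaround is to force the C locale for that single command, which makes the range follow ASCII byte order. The sketch below is an illustration under that assumption; the input letters and variable names are made up.

```shell
# Hypothetical input: a few upper- and lowercase letters, one per line.
letters=$(printf '%s\n' a A m M z Z)

# With LC_ALL=C the range [A-Z] follows ASCII order, so only
# uppercase letters are selected regardless of the user's locale.
upper_range=$(printf '%s\n' "$letters" | LC_ALL=C grep '[A-Z]')

# The character class asks for uppercase letters directly and
# does not depend on collation order.
upper_class=$(printf '%s\n' "$letters" | grep '[[:upper:]]')

printf 'range in C locale:\n%s\n' "$upper_range"
printf 'character class:\n%s\n' "$upper_class"
```

Setting LC_ALL on the command line this way affects only that one grep invocation, not the rest of the shell session.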

Checking your locale

To check your current locale, print the $LANG variable. Keep in mind that LC_COLLATE and LC_ALL, when set, take precedence over LANG for collation order.

$ echo $LANG
en_US.UTF-8
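Since more than one variable can influence collation, a small helper can report which setting is actually in effect. The function below is a hypothetical sketch, not a standard utility. It follows the POSIX precedence (LC_ALL, then LC_COLLATE, then LANG) and assumes a fallback of C when none of them is set, although the real default is implementation-defined.

```shell
# Print the locale name that governs collation in the current environment.
# Precedence: LC_ALL, then LC_COLLATE, then LANG; empty values count as unset.
# Falling back to "C" is an assumption; the actual default is system-specific.
effective_collation() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collation
```

On systems that ship the locale utility, running it with no arguments prints the full set of LC_* values for comparison.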

Character Classes

Because of the discrepancies in collation ordering, POSIX defines several character classes that make regular expressions more portable. Here is a list of the character classes:

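[:alnum:]    Alphanumeric (letters and digits)
[:alpha:]    Alphabetic (letters)
[:blank:]    Space and tab
[:cntrl:]    Control characters
[:digit:]    Digits (0 through 9)
[:graph:]    Visible characters (printable, excluding space)
[:lower:]    Lowercase letters
[:print:]    Printable characters (including space)
[:punct:]    Punctuation
[:space:]    Whitespace (space, tab, newline, and similar)
[:upper:]    Uppercase letters
[:xdigit:]   Hexadecimal digits (0-9, A-F, a-f)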

When using character classes, you must place them inside a bracket expression, so the class's own brackets end up nested within an outer pair, as in [[:digit:]].

$ ls /usr/bin/ | grep '[[:digit:]][[:alpha:]][[:digit:]]'
a2p5.16
a2p5.18
s2p5.16
s2p5.18

This matched any file names containing a digit, followed by an alphabetic character, followed by another digit.
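The same pattern can be checked against a fixed list of names, which avoids depending on what happens to be installed. The names below are hypothetical and chosen so that some match and some do not.

```shell
# Hypothetical file names, one per line.
files=$(printf '%s\n' a2p5.16 s2p5.18 perl5.18 ab12 3d4)

# Select names containing digit, letter, digit in that order.
class_matches=$(printf '%s\n' "$files" | grep '[[:digit:]][[:alpha:]][[:digit:]]')

printf '%s\n' "$class_matches"
```

The names perl5.18 and ab12 are skipped because neither contains a letter sandwiched between two digits.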

Using metacharacters within brackets

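When metacharacters are placed within brackets, most of them lose their special meaning. The exceptions are the caret (^) when it comes first, the hyphen (-) when it sits between two characters, and the closing bracket (]).

The following command matches any listing that contains a hyphen (-), a period (.), or the letter x.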

$ ls /usr/bin/ | grep '[-.x]'
... weblatency.d wish8.4 wish8.5 xar xargs xattr xattr-2.6 xattr-2.7 xcode-select ...

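If you want to match a literal closing bracket (]), place it first in the list (right after the ^ if the expression is negated). To match a literal hyphen (-), place it first or last. To match both, put the bracket first and the hyphen last, as in []-].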
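The placement rules for literal characters are easy to get wrong, so here is a short sketch with made-up input lines. It assumes a POSIX-conforming grep, where a closing bracket placed first and a hyphen placed last are both taken literally.

```shell
# Hypothetical input lines, one per line.
lines=$(printf '%s\n' 'a]b' 'a-b' 'a.b' 'axb')

# ] placed first and - placed last are both literal inside the brackets.
bracket_or_hyphen=$(printf '%s\n' "$lines" | grep '[]-]')

# Inside brackets the period is literal, so axb is not selected.
dot_literal=$(printf '%s\n' "$lines" | grep 'a[.]b')

printf 'bracket or hyphen:\n%s\n' "$bracket_or_hyphen"
printf 'literal dot:\n%s\n' "$dot_literal"
```

Outside brackets, the pattern a.b would also select axb, since an unbracketed period matches any single character.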

Non-English Environment

In some languages, two letters in sequence may be treated as a single unit, known as a collating element.

For example, if the current locale treats the characters 'ts' as one unit, we can match that unit by writing a collating symbol, [.ts.], inside a bracket expression: [[.ts.]].

Furthermore, we can match characters that have variations such as an accent mark or tilde. The equivalence class [=a=], again placed inside a bracket expression as [[=a=]], matches all variations of the letter a that the locale defines. This can include à, á, â, and ã.
