04. Bracket Expressions

Brackets let you match a single character from a set. For example, to match any single lowercase vowel, use [aeiou].

$ ls /usr/bin | grep 'b[aeiou]t'
batch bitesize.d smbutil

Negating Bracket Expressions

To negate a bracket expression, place a caret (^) immediately after the opening bracket.

$ ls /usr/bin | grep 'b[^aeiou]t'
rwbytype.d

This matches any text containing b and t with exactly one character between them that is not a, e, i, o, or u.
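A negated bracket expression still has to match exactly one character, so a name where b is immediately followed by t does not match. A minimal sketch, using made-up sample words rather than a directory listing:

```shell
# Sample words are made up for illustration.
words='bat
bxt
bt
b9t'

# [^aeiou] matches one character that is not a lowercase vowel.
negated=$(printf '%s\n' "$words" | grep 'b[^aeiou]t')
printf '%s\n' "$negated"
```

Only bxt and b9t are printed: bat has a vowel in the middle, and bt has no character at all between b and t.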

Simplifying with a range

To match any character within a contiguous run of letters or digits, specify a range with a hyphen, such as [a-d] or [0-9].

$ ls /usr/bin | grep '[a-d][e-g][h-l]'
afhash
afida
afinfo
cancel
git-receive-pack
ldapdelete
mdfind
snmpdelta

This command selects names containing a letter from a, b, c, or d, immediately followed by a letter from e, f, or g, and then a letter from h, i, j, k, or l. Notice how this sequence of three letters can appear anywhere in the name.
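Ranges work for digits as well as letters. A small sketch, assuming some hypothetical file names instead of a real directory:

```shell
# Hypothetical file names, used only for illustration.
names='report1.txt
report7.txt
reportX.txt'

# [0-9] matches any single digit from 0 through 9.
numbered=$(printf '%s\n' "$names" | grep 'report[0-9]')
printf '%s\n' "$numbered"
```

This prints report1.txt and report7.txt, and skips reportX.txt because X is not a digit.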

Portability conflicts with range

A significant downside to using the - metacharacter for ranges is that the result is not portable, because it depends on the locale's character collation order. To explain this, we need to learn a bit of history.

Unix was first developed with just ASCII characters. ASCII assigns the codes 0 to 127 to control codes, digits, punctuation marks, and uppercase and lowercase English letters. For letters, the ordering looks like this:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

As other countries began adopting Unix, they had to make room for more characters, such as an e with an accent (é) or a c with a cedilla (ç). Thus, some collations arose with an ordering like this:

aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ

You can probably see the problem already. An expression such as [A-Z] matches only the uppercase letters under the first ordering, but every letter except lowercase a under the second.

Thus, avoid ranges where portability matters. You can instead rely on Character Classes (below), which are defined by POSIX.
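When a range is unavoidable, one common workaround is to force the C locale for a single command, so that ranges follow ASCII order. A sketch with made-up input:

```shell
# Made-up sample input: one all-lowercase line and one capitalized line.
sample='apple
Banana'

# LC_ALL=C applies only to this grep invocation and selects ASCII ordering,
# so [A-Z] covers the uppercase letters only.
upper_only=$(printf '%s\n' "$sample" | LC_ALL=C grep '[A-Z]')
printf '%s\n' "$upper_only"
```

Under the C locale this prints only Banana. Without LC_ALL=C, the result depends on the collation order of whatever locale is active.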

Checking your locale

To check your current locale, print the $LANG variable.

$ echo $LANG
en_US.UTF-8
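$LANG is only a default: LC_COLLATE overrides it for collation, and LC_ALL overrides both. The locale command prints the effective settings, and its LC_COLLATE line is the one that governs range ordering. A brief sketch (the output varies by system, so none is shown here):

```shell
# Show the effective collation setting, whichever variable it came from.
collate_line=$(locale | grep '^LC_COLLATE=')
printf '%s\n' "$collate_line"
```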

Character Classes

Because of the discrepancies in collation ordering, POSIX defines several character classes that make patterns more portable. Here is a list of the character classes:

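[:alnum:]   Alphanumeric characters (letters and digits)
[:alpha:]   Alphabetic characters
[:blank:]   Space and tab
[:cntrl:]   Control characters
[:digit:]   Digits
[:graph:]   Visible characters (printable, excluding space)
[:lower:]   Lowercase letters
[:print:]   Printable characters (including space)
[:punct:]   Punctuation
[:space:]   Whitespace (space, tab, newline, and similar)
[:upper:]   Uppercase letters
[:xdigit:]  Hexadecimal digits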

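A character class is only valid inside a bracket expression, so the class name, with its own brackets, goes inside an outer pair of brackets: [[:digit:]], not [:digit:].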

$ ls /usr/bin/ | grep '[[:digit:]][[:alpha:]][[:digit:]]'
a2p5.16
a2p5.18
s2p5.16
s2p5.18

This matched any file names containing a digit, then an alphabetic character, then another digit.
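Several classes can also share one bracket expression. A short sketch on made-up sample lines:

```shell
# Made-up sample lines for illustration.
lines='room 101
lobby
Suite b'

# One class inside the outer brackets: lines containing a digit.
with_digit=$(printf '%s\n' "$lines" | grep '[[:digit:]]')

# Two classes in the same bracket expression: a digit OR an uppercase letter.
digit_or_upper=$(printf '%s\n' "$lines" | grep '[[:digit:][:upper:]]')

printf '%s\n' "$with_digit"
printf '%s\n' "$digit_or_upper"
```

The first pattern matches only room 101. The second matches room 101 and Suite b, and lobby matches neither.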

Using metacharacters within brackets

Most metacharacters lose their special meaning inside brackets. The exceptions are ^ as the first character, - between two characters, and ].

The following command matches any listing containing a minus symbol (-), a period (.), or the letter x.

$ ls /usr/bin/ | grep '[-.x]'
... weblatency.d wish8.4 wish8.5 xar xargs xattr xattr-2.6 xattr-2.7 xcode-select ...

To match a literal closing bracket (]), place it first in the list (right after ^ if the list is negated). To match a literal minus symbol (-), place it first or last.
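When both characters are needed in the same list, the closing bracket goes first and the minus symbol goes last. A minimal sketch with made-up input:

```shell
# Made-up sample lines for illustration.
lines='a]b
c-d
xyz'

# ] comes first so it is taken literally; - comes last so it cannot form a range.
literal=$(printf '%s\n' "$lines" | grep '[]-]')
printf '%s\n' "$literal"
```

This prints a]b and c-d, and skips xyz.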

Non-English Environment

In some languages, a sequence of two letters is treated as a single unit for collation.

For example, if the locale treats 'ts' as one unit, you can match it with a collating symbol, written between [. and .] inside a bracket expression: [[.ts.]]. Support for multi-character collating symbols varies by implementation and locale.

Furthermore, we can match characters that have variations such as an accent mark or tilde. The equivalence class [=a=], used inside a bracket expression as [[=a=]], matches all variations of the letter a. Depending on the locale, this can include à, á, â, and ã.
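What an equivalence class matches depends entirely on the active locale. In the C locale every character is equivalent only to itself, so the sketch below should match just the plain letter. Under a locale such as en_US.UTF-8 the same pattern may also match accented forms, depending on the system and the grep implementation.

```shell
# Made-up sample input for illustration.
letters='a
b'

# In the C locale, [[=a=]] is expected to behave like [a].
same_as_a=$(printf '%s\n' "$letters" | LC_ALL=C grep '[[=a=]]')
printf '%s\n' "$same_as_a"
```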
