The backbone of AWK's programming model consists of two pieces: 1) records & fields, and 2) patterns & actions. Let's look at the first core component here, then move on to patterns & actions in the next lesson.
AWK views each input stream as a collection of records. Records can be thought of as individual lines, which are then divided into fields (individual data cells). Take a look at the figure below, which displays the grades.txt file.
In the above example, the records are separated by a newline character, and the fields are delimited by whitespace. But perhaps we are working with CSV (comma-separated values) files, which use commas to separate fields. Or maybe we have just one long line of data, with each record separated by a semicolon (;). How do we let our gawk implementation know this?
To specify the character that separates records, we use the built-in RS variable. In the original AWK implementation, the RS variable had to be a single literal character, such as the newline, or an empty string. In other implementations such as gawk, RS may be a regular expression.
When RS is a regular expression, RS holds the regex itself, while the gawk-specific RT variable holds the text that actually matched it for the current record.
$ echo firstRecord 111111 secondRecord 222222 thirdRecord 333333 lastRecord |
> gawk 'BEGIN { RS = "([[:digit:]]+)" }
> { print "RS = " RS " and RT = " RT }'
RS = ([[:digit:]]+) and RT = 111111
RS = ([[:digit:]]+) and RT = 222222
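RS = ([[:digit:]]+) and RT = 333333
RS = ([[:digit:]]+) and RT = 
This code snippet sets the RS variable to a regex that matches one or more digits. Notice how RS displays the regex itself, while RT displays the text that matched it. The last record, lastRecord, has no digits after it, so its RT is empty.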
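When the separator is a single literal character, no regex is needed. As a minimal sketch of the "one long line" case mentioned earlier, assuming made-up sample data (alpha, beta, gamma), the following splits the input on semicolons and numbers each record with the built-in NR variable. Nothing here is gawk-specific, so plain awk is used.

```shell
# Hypothetical sample data. printf emits no trailing newline here,
# which would otherwise become part of the last record.
printf 'alpha;beta;gamma' |
  awk 'BEGIN { RS = ";" }      # each semicolon ends a record
       { print NR ": " $0 }'   # NR is the current record number
# 1: alpha
# 2: beta
# 3: gamma
```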
The Output Record Separator (ORS) specifies what is printed after each record. The default is a newline character.
In this example, we read and print out the current record in our buffer (denoted by $0), followed by a space and a plus (+) symbol.
$ echo 'hello; nihao; hola; anyonghasaeyo' |
> gawk 'BEGIN { RS = ";"; ORS = " +"}
> { print $0 }'
hello + nihao + hola + anyonghasaeyo
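ORS replaces the usual newline and is written after every record, including the last one. A small sketch, assuming three made-up input lines, joins them with a comma and shows the trailing separator that results. The END block is an addition for tidiness: printf does not append ORS, so it is used to finish the line. Plain awk is used since nothing here is gawk-specific.

```shell
# Join lines with ", "; the separator also follows the final record.
printf 'a\nb\nc\n' |
  awk 'BEGIN { ORS = ", " }
       { print $0 }
       END { printf "\n" }'    # printf ignores ORS, so this only ends the line
# a, b, c, 
```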
Fields are separated according to the FS variable. The default value is a single space, which AWK treats specially: fields are separated by runs of one or more whitespace characters, and leading and trailing whitespace on the line is ignored. Thus, the following two lines look the same to AWK.
Joe John Johanna
   Joe   John   Johanna   
To specify a literal single space, enclose the space in brackets: FS = "[ ]".
The field separator may be set with the -F option on the command line, or by assigning FS in the BEGIN block.
$ echo 'Joe John Johanna' |
> gawk -F' ' '{ print NF ":" $0 }'
3:Joe John Johanna
# Same command as above but using the BEGIN block
$ echo 'Joe John Johanna' |
> gawk 'BEGIN { FS=" " }
> { print NF ":" $0 }'
3:Joe John Johanna
# Changing the FS character
$ echo '   Joe   John   Johanna   ' |
> gawk -F'[ ]' '{ print NF ":" $0 }'
13:   Joe   John   Johanna   
Here we can see that the -F option sets the FS variable straight from the command line. We'll formally learn how to use AWK via the command line in a future lesson.
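The same mechanism handles the CSV case raised earlier. As a rough sketch, assuming a made-up row of simple CSV with no quoted fields or embedded commas (which a plain FS cannot handle), either form below splits on commas. Plain awk is used since nothing here is gawk-specific.

```shell
# Hypothetical CSV row: a name followed by two scores.
echo 'Joe,90,85' | awk -F',' '{ print NF ":" $0 }'
# 3:Joe,90,85

# Equivalent, assigning FS in the BEGIN block.
echo 'Joe,90,85' | awk 'BEGIN { FS = "," } { print NF ":" $0 }'
# 3:Joe,90,85
```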
The Output Field Separator, or OFS, stores the string that separates fields on output. By default, it is a single space.
$ echo 'John Mary; Jacob Teresa; Bob Claire' |
> gawk 'BEGIN { OFS=" loves "; RS=";" }
> { print $1, $2 }'
John loves Mary
Jacob loves Teresa
Bob loves Claire
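OFS only comes into play when the items passed to print are separated by commas; items placed side by side are concatenated directly. A short sketch with made-up input illustrates the difference (plain awk is used since nothing here is gawk-specific):

```shell
# With a comma, OFS is inserted between the two fields.
echo 'John Mary' | awk 'BEGIN { OFS = " loves " } { print $1, $2 }'
# John loves Mary

# Without a comma, the fields are simply concatenated.
echo 'John Mary' | awk 'BEGIN { OFS = " loves " } { print $1 $2 }'
# JohnMary
```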
Field accession ($n)
You may have noticed the use of the $0 variable in the previous examples. This variable stores the current record. To access an individual field, we simply use a $ followed by the field number (e.g. $1 for the first field, $2 for the second, and so on).
$ echo 'uno dos tres' | gawk -F' ' '{ print "The second field is: " $2; print "The entire record is: " $0 }'
The second field is: dos
The entire record is: uno dos tres
Note that field numbers start at 1 and not 0, unlike indexes in most programming languages, which are zero-based.
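Because field numbers run from 1 up to the number of fields, a loop can visit each field in turn. The sketch below relies on the built-in NF variable (the field count, seen in the earlier FS examples) and a loop variable i, which is just an illustrative name. Plain awk is used since nothing here is gawk-specific.

```shell
# Print every field of the record together with its (1-based) number.
echo 'uno dos tres' |
  awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# 1: uno
# 2: dos
# 3: tres
```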
Field numbers are converted to integers where necessary. Thus, $(2*2), $(8/2), $"4.41" and $4 all refer to the fourth field. Note that negative field numbers have no meaning.
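As a quick check of the expressions above, the following sketch (with made-up single-letter fields) prints the field selected by each one. All four should pick out the fourth field, d, since a fractional field number is truncated to an integer.

```shell
# Each expression evaluates to field number 4.
echo 'a b c d e' | awk '{ print $(2*2), $(8/2), $"4.41", $4 }'
# d d d d
```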