04. Cutting, Pasting and Joining: cut, paste, join

Let's look at how we can use the cut, paste and join operations to edit and format text files.

Cut

Much like the "cutting" most of you are familiar with, the cut command in UNIX takes a section from each line of a file and writes it to standard output. However, it does not delete any part of the file it extracts text from. cut accepts multiple files as arguments, and it reads from standard input when no file is given.

With the options listed, there are several ways you can specify a cut.

-b
Select only the bytes specified. May be a single byte, a set of bytes or a range of bytes, separated by commas.
-c
Select only the characters at the specified positions in each line.
-f
Extract a set of specified fields.
-d
Used with the -f option. Use a specified delimiter rather than default tab.

You may use only one of the -b, -c or -f options at a time. Each of these options takes a list made up of an integer, a range of integers, or several of these separated by commas. A list is defined as follows:

n
The nth byte, character or field. Count starts at 1.
n-
From the nth byte, character or field forward.
n-m
From the nth to the mth byte, character or field (inclusive).
-m
From the first to the mth byte, character or field (inclusive).

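In the example below, the fields in test.txt are separated by tabs, which is the default delimiter for cut.

$ cat test.txt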
doh  re  me  fa  so
1 2 3 4 5
$ cut -f 3-5 test.txt
me  fa  so
3 4 5
$ cut -f 1,3-4 test.txt
doh  me  fa
1 3 4
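
The -d and -c options work with the same list syntax. The following is a minimal sketch, assuming a hypothetical comma-separated file named notes.csv (not part of the original example) created in a temporary directory:

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data: comma-separated fields, one record per line.
printf 'doh,re,me,fa,so\n1,2,3,4,5\n' > notes.csv

# -d ',' sets the delimiter; -f 2,4- selects field 2 and fields 4 onward.
fields=$(cut -d ',' -f 2,4- notes.csv)
echo "$fields"

# -c 1-3 selects the first three characters of each line (commas count too).
chars=$(cut -c 1-3 notes.csv)
echo "$chars"
```

Note that cut always prints the selected fields in their original order, no matter how the list is written.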

Paste

The paste command is used to merge lines of files together. With this command, you can add one or more columns (or fields) of text to a file. There are two options you should be aware of:

-d
Specify the delimiter to be used instead of tabs.
-s
Paste serially instead of in parallel: all lines of each file are merged into a single output line, one output line per file.

$ cat names.txt
Billy
Bob
Chase
Jon
Jonathan
$ cat birthdates.txt
09/21/1992
08/12/1982
05/24/1999
04/23/1974
08/09/2001
$ paste -d ',' names.txt birthdates.txt
Billy,09/21/1992
Bob,08/12/1982
Chase,05/24/1999
Jon,04/23/1974
Jonathan,08/09/2001

If the files have an unequal number of lines (for example, 5 lines in 5names.txt and only 3 in 3birthdates.txt), the bottom two names will not be matched with anything.

$ paste -d ',' 5names.txt 3birthdates.txt
Billy,05/24/1999
Bob,04/23/1974
Chase,08/09/2001
Jon,
Jonathan,
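
To see how the -s option differs from the default parallel behaviour, here is a small sketch. It assumes shortened, hypothetical versions of names.txt and birthdates.txt, created in a temporary directory:

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)"

# Shortened sample files (assumed data, three lines each).
printf 'Billy\nBob\nChase\n' > names.txt
printf '09/21/1992\n08/12/1982\n05/24/1999\n' > birthdates.txt

# Parallel (default): line N of every file ends up on output line N.
parallel=$(paste -d ',' names.txt birthdates.txt)
echo "$parallel"

# Serial (-s): each file is collapsed into one output line of its own.
serial=$(paste -s -d ',' names.txt birthdates.txt)
echo "$serial"
```

With -s, the output has one line per input file rather than one line per input row.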

Join

If you're familiar with relational databases (and don't worry if you're not), the join command should sound very familiar. In short, join takes a column that two tables have in common and joins their rows together based on that attribute.

-t
Specify a delimiter.
-1 n
Use the nth column of the first file as the join key.
-2 n
Use the nth column of the second file as the join key.
-a n
Also print the unpairable lines from file n, where n is 1 or 2 (first or second file).

Basic joining

Let's try a join operation as an example.

$ cat birthdates.txt
05/24/1999,4
04/23/1974,2
08/09/2001,5
11/24/1991,3
01/23/1975,1
$ cat names.txt
Billy,1
Bob,2
Chase,3
Jon,4
Jonathan,5

First, we must have both lists sorted according to the column we want to join on. names.txt is already sorted, but birthdates.txt is not. Refer to the sorting page to learn how the sort operation works.

$ sort -t ',' -k 2 birthdates.txt > sortedBirthdates.txt
$ join -t ',' -1 2 -2 2 names.txt sortedBirthdates.txt
1,Billy,01/23/1975
2,Bob,04/23/1974
3,Chase,11/24/1991
4,Jon,05/24/1999
5,Jonathan,08/09/2001
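
Because join depends on sorted input, it can help to check the order before joining. The sketch below uses sort -c, which only checks the order and reports the result through its exit status. The data is a shortened, assumed version of birthdates.txt, and the key is restricted to the second field with -k 2,2:

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)"

# Shortened sample file (assumed data), not sorted on the second field.
printf '05/24/1999,4\n04/23/1974,2\n01/23/1975,1\n' > birthdates.txt

# sort -c exits with status 0 if the input is already sorted, non-zero otherwise.
if sort -c -t ',' -k 2,2 birthdates.txt 2>/dev/null; then
    before=sorted
else
    before=unsorted
fi

# Sort on the second field, then check the result the same way.
sort -t ',' -k 2,2 birthdates.txt > sortedBirthdates.txt
if sort -c -t ',' -k 2,2 sortedBirthdates.txt 2>/dev/null; then
    after=sorted
else
    after=unsorted
fi

echo "$before $after"
```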

Right/Left outer join

In some cases, you'll want to join two tables even though some rows have no corresponding row in the other table. A left outer join keeps every row of the left (first) table, including rows with no match in the right table. A right outer join keeps every row of the right (second) table, including rows with no match in the left table.

A left outer join uses the option -a 1, and a right outer join uses the option -a 2.
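
As a rough illustration of a left outer join, the sketch below assumes shortened, hypothetical versions of names.txt and birthdates.txt in which Bob (key 2) has no birthdate on record. Both files are already sorted on the join column:

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)"

# Assumed sample data: key 2 exists only in names.txt.
printf 'Billy,1\nBob,2\nChase,3\n' > names.txt
printf '01/23/1975,1\n11/24/1991,3\n' > birthdates.txt

# -a 1 also prints the lines of the first file that have no partner.
left=$(join -t ',' -1 2 -2 2 -a 1 names.txt birthdates.txt)
echo "$left"
```

The unpaired row is printed with only the fields from names.txt, so it is shorter than the paired rows.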

Full outer join

In a full outer join, the rows of both tables are included, even if they don't have a corresponding row in the other table. Rows without a match will be missing the fields that would have come from the other table.

Use both -a 1 and -a 2 together for a full outer join.
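
A full outer join needs the unpaired rows from both files, so the sketch below passes both -a 1 and -a 2. It also uses two options not covered above: -o, which fixes the output columns (0 is the join field, 1.1 is the first field of the first file, 2.1 is the first field of the second file), and -e, which supplies a placeholder for missing fields. The data and the NULL placeholder are assumptions made for illustration:

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)"

# Assumed sample data: key 2 exists only in names.txt, key 3 only in birthdates.txt.
printf 'Billy,1\nBob,2\n' > names.txt
printf '01/23/1975,1\n11/24/1991,3\n' > birthdates.txt

# -a 1 -a 2 keeps unpaired lines from both files.
# -o 0,1.1,2.1 prints: join key, name, birthdate.
# -e 'NULL' fills in any field that is missing from an unpaired line.
full=$(join -t ',' -1 2 -2 2 -a 1 -a 2 -e 'NULL' -o 0,1.1,2.1 names.txt birthdates.txt)
echo "$full"
```

Without -o and -e, the unpaired rows are simply printed with fewer fields, which makes it harder to tell which table a value came from.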
