04. Cutting, Pasting and Joining: cut, paste, join

Let's look at how we can use the cut, paste and join operations to edit and format text files.

Cut

Much like the "cutting" most of you are familiar with, the cut command in UNIX takes a section of each line of a file and writes it to standard output. However, it does not delete any part of the file it extracts text from. cut also accepts multiple files as input and processes them one after another.

The options listed below give you several ways to specify a cut.

-b
Select only the bytes specified. May be a single byte, a set of bytes or a range of bytes, separated by commas.
-c
Select only the characters at the specified positions on each line.
-f
Extract a set of specified fields.
-d
Used with the -f option. Use a specified delimiter rather than default tab.

You must use exactly one of the -b, -c or -f options. Each of these options comes with a list that is made up of an integer, a range of integers, or several integers and ranges separated by commas. A list is defined as follows:

n
The nth byte, character or field. Count starts at 1.
n-
From the nth byte, character or field forward.
n-m
From the nth to the mth byte, character or field (inclusive).
-m
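From the first to the mth byte, character or field (inclusive).

In the example below, the fields of test.txt are separated by tabs, the default delimiter for -f.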

$ cat test.txt
doh  re  me  fa  so
1 2 3 4 5
$ cut -f 3-5 test.txt
me  fa  so
3 4 5
$ cut -f 1,3-4 test.txt
doh  me  fa
1 3 4
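
The example above relies on the default tab delimiter. The sketch below shows the -d and -c options on a comma-separated file. The file name notes.csv and its contents are made up for illustration, and the script assumes mktemp -d is available on your system.

```shell
# Build a small comma-separated sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'doh,re,me,fa,so\n1,2,3,4,5\n' > "$tmpdir/notes.csv"

# -d ',' sets the field delimiter; -f 2,4- selects field 2 and fields 4 onward.
fields=$(cut -d ',' -f 2,4- "$tmpdir/notes.csv")
echo "$fields"

# -c 1-3 selects the first three characters of every line.
chars=$(cut -c 1-3 "$tmpdir/notes.csv")
echo "$chars"

rm -r "$tmpdir"
```

The first command should print re,fa,so and 2,4,5. The second prints doh and 1,2, because -c counts the commas as characters too.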

Paste

The paste command is used to merge lines of files together. With this command, you can add one or more columns (or fields) of text to a file. There are two options you should be aware of:

-d
Specify the delimiter to be used instead of tabs.
-s
Paste serially instead of in parallel: all lines of one file are joined into a single output line, one output line per file.

$ cat names.txt
Billy
Bob
Chase
Jon
Jonathan
$ cat birthdates.txt
09/21/1992
08/12/1982
05/24/1999
04/23/1974
08/09/2001
$ paste -d ',' names.txt birthdates.txt
Billy,09/21/1992
Bob,08/12/1982
Chase,05/24/1999
Jon,04/23/1974
Jonathan,08/09/2001

If the files have an unequal number of lines (5 lines in 5names.txt and only 3 in 3birthdates.txt), the last two names are not matched with anything and are followed by an empty field.

$ paste -d ',' 5names.txt 3birthdates.txt
Billy,05/24/1999
Bob,04/23/1974
Chase,08/09/2001
Jon,
Jonathan,
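
The -s option is easiest to understand next to the parallel case. Here is a minimal sketch using shortened, made-up copies of the two files, written to a temporary directory so nothing in your working directory is touched.

```shell
# Create shortened sample files (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Billy\nBob\nChase\n' > "$tmpdir/names.txt"
printf '09/21/1992\n08/12/1982\n05/24/1999\n' > "$tmpdir/birthdates.txt"

# -s pastes each file serially: all lines of one file become one output line.
serial=$(paste -s -d ',' "$tmpdir/names.txt" "$tmpdir/birthdates.txt")
echo "$serial"

rm -r "$tmpdir"
```

This should print one line with the three names followed by one line with the three dates, each separated by commas.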

Join

If you're familiar with relational databases (and don't worry if you're not), the join command should sound very familiar. In short, join takes a column that two tables have in common and combines the rows whose values in that column match.

-t
Specify a delimiter.
-1 n
Use the nth column as the join key for the first file.
-2 n
Use the nth column as the join key for the second file.
-a n
Also print the unpairable lines from file n, where n is 1 or 2 (first or second file).

Basic joining

Let's try a join operation as an example.

$ cat birthdates.txt
05/24/1999,4
04/23/1974,2
08/09/2001,5
11/24/1991,3
01/23/1975,1
$ cat names.txt
Billy,1
Bob,2
Chase,3
Jon,4
Jonathan,5

First, we must have both lists sorted according to the column we want to join on. names.txt is already sorted, but birthdates.txt is not. Refer to the sorting page to learn how the sort operation works.

$ sort -t ',' -k 2 birthdates.txt > sortedBirthdates.txt
$ join -t ',' -1 2 -2 2 names.txt sortedBirthdates.txt
1,Billy,01/23/1975
2,Bob,04/23/1974
3,Chase,11/24/1991
4,Jon,05/24/1999
5,Jonathan,08/09/2001
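
The whole workflow can be sketched as one script. The data below is a shortened, made-up version of the two files, and both are sorted explicitly so the script does not depend on either one already being in order. Note the -k 2,2 form, which restricts the sort key to the second field only.

```shell
# Create shortened sample files (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Billy,1\nBob,2\nChase,3\n' > "$tmpdir/names.txt"
printf '11/24/1991,3\n04/23/1974,2\n01/23/1975,1\n' > "$tmpdir/birthdates.txt"

# Sort both inputs on field 2 only, then join on that field.
sort -t ',' -k 2,2 "$tmpdir/names.txt" > "$tmpdir/names.sorted"
sort -t ',' -k 2,2 "$tmpdir/birthdates.txt" > "$tmpdir/birthdates.sorted"
joined=$(join -t ',' -1 2 -2 2 "$tmpdir/names.sorted" "$tmpdir/birthdates.sorted")
echo "$joined"

rm -r "$tmpdir"
```

join prints the join key first, followed by the remaining fields of the first file and then those of the second.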

Right/Left outer join

In some cases, you'll want to join two tables even though some rows have no corresponding row in the other table. A left outer join keeps every row of the left (first) table, including rows with no match in the right table. A right outer join keeps every row of the right (second) table, including rows with no match in the left table.

A left outer join uses the option -a 1 and a right outer join uses the option -a 2.
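
As a rough illustration, the sketch below runs both variants on made-up data in which Bob (ID 2) has no birthdate and one birthdate (ID 4) has no name. Both files are already sorted on the join column.

```shell
# Create sample files sorted on field 2 (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Billy,1\nBob,2\nChase,3\n' > "$tmpdir/names.txt"
printf '01/23/1975,1\n11/24/1991,3\n05/24/1999,4\n' > "$tmpdir/birthdates.txt"

# Left outer join: -a 1 also prints unpaired lines from the first file.
left=$(join -t ',' -a 1 -1 2 -2 2 "$tmpdir/names.txt" "$tmpdir/birthdates.txt")
echo "$left"

# Right outer join: -a 2 also prints unpaired lines from the second file.
right=$(join -t ',' -a 2 -1 2 -2 2 "$tmpdir/names.txt" "$tmpdir/birthdates.txt")
echo "$right"

rm -r "$tmpdir"
```

In the left join, Bob's row appears as 2,Bob with no date. In the right join, the unmatched birthdate appears as 4,05/24/1999 with no name.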

Full outer join

In a full outer join, all rows from both tables are included, even if they have no corresponding row in the other table. Expect many rows with missing fields when using this option.

Pass both -a 1 and -a 2 for a full outer join.
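
A full outer join can be sketched by passing -a once per file. By default, unpaired rows simply come out shorter, so this example also uses two further join options that are not covered above: -o to fix the output columns (0 stands for the join field, 1.1 for field 1 of the first file, 2.1 for field 1 of the second) and -e to fill missing fields with a placeholder. The data and the placeholder NULL are made up for illustration.

```shell
# Create sample files sorted on field 2 (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Billy,1\nBob,2\nChase,3\n' > "$tmpdir/names.txt"
printf '01/23/1975,1\n11/24/1991,3\n05/24/1999,4\n' > "$tmpdir/birthdates.txt"

# -a 1 -a 2 keeps unpaired lines from both files.
# -o 0,1.1,2.1 fixes the columns; -e NULL fills the fields that are missing.
full=$(join -t ',' -a 1 -a 2 -e NULL -o 0,1.1,2.1 -1 2 -2 2 \
    "$tmpdir/names.txt" "$tmpdir/birthdates.txt")
echo "$full"

rm -r "$tmpdir"
```

Every ID from either file shows up once, and the gaps are marked with NULL so each row has the same number of fields.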
