hpr2114 :: Gnu Awk - Part 1
An introduction to the awk text parsing tool
Hosted by Mr. Young on Thursday, 2016-09-08. This show is flagged as Explicit and is released under a CC-BY-SA license.
awk, bash, linux.
The show is available on the Internet Archive at: https://archive.org/details/hpr2114
Duration: 00:22:30
Learning Awk.
Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.
Introduction to Awk
Awk is a powerful text parsing tool for Unix and Unix-like systems.
The basic syntax is:
awk [options] 'pattern {action}' file
Here is a simple example file that we will be using, called file1.txt:
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
First command:
awk '{print $2}' file1.txt
As you can see, the “print” command displays whatever follows it. In this case we are showing the second column using “$2”. Columns are numbered from 1, so “$1” is the first column, “$2” the second, and so on. To display the whole line with all of its columns, use “$0”.
This example will output:
color
red
yellow
red
purple
green
purple
brown
brown
yellow
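To try “$0” and a multi-column print side by side, here is a minimal sketch. It assumes you are happy to work in a scratch directory made with mktemp, and the variable names (whole_lines, name_and_amount) are just illustrative.

```shell
# Work in a throwaway directory and recreate the sample file.
cd "$(mktemp -d)"
cat > file1.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# $0 is the whole input line, so this reproduces the file unchanged.
whole_lines=$(awk '{print $0}' file1.txt)

# Fields separated by commas in print come out with one space between them.
name_and_amount=$(awk '{print $1, $3}' file1.txt)

printf '%s\n' "$name_and_amount"
```

The second command prints the first and third columns only, starting with “name amount” and “apple 4”.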
Second command:
awk '$2=="yellow"{print $1}' file1.txt
This will output:
banana
pineapple
As you can see, the pattern selects only the lines whose second column is exactly “yellow”, and the action prints the first column of each of those lines.
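The action part is optional. If you give only a pattern, awk prints every matching line in full, as if the action were “{print $0}”. A small sketch, again assuming a scratch directory and an illustrative variable name (yellow_rows):

```shell
# Work in a throwaway directory and recreate the sample file.
cd "$(mktemp -d)"
cat > file1.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Pattern with no action: each matching line is printed whole.
yellow_rows=$(awk '$2=="yellow"' file1.txt)

printf '%s\n' "$yellow_rows"
```

This prints the two complete “yellow” lines, “banana yellow 6” and “pineapple yellow 5”.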
Field separator
By default, awk uses white space as the field separator. You can change this by using the -F option. For instance, file1.csv looks like this:
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
The same query as before, with the field separator set to a comma:
awk -F"," '$2=="yellow" {print $1}' file1.csv
will still output:
banana
pineapple
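To see what -F actually changes, it helps to run the same print with and without it. This is a sketch under the same assumptions as before (scratch directory, illustrative variable names no_separator and with_separator):

```shell
# Work in a throwaway directory and recreate the CSV sample file.
cd "$(mktemp -d)"
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# Without -F the lines contain no white space, so $1 is the entire line.
no_separator=$(awk '{print $1}' file1.csv)

# With -F, the line is split on commas and $1 is just the name.
with_separator=$(awk -F, '{print $1}' file1.csv)

printf '%s\n' "$with_separator"
```

Without -F every line is one big field. With it, the output is the list of names, starting with the header “name”.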
Regular expressions work as well. The “~” operator tests whether a field matches the pattern written between the slashes:
awk '$2 ~ /p.+p/ {print $0}' file1.txt
This returns:
grape purple 10
plum purple 2
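Two more regular expression patterns, as a rough sketch: one anchored to the start of a field with “^”, and one with no field named at all, which is tested against the whole line. The scratch directory and the variable names (starts_with_p, contains_berry) are assumptions for illustration.

```shell
# Work in a throwaway directory and recreate the sample file.
cd "$(mktemp -d)"
cat > file1.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# ^ anchors the match to the start of the field: names beginning with "p".
starts_with_p=$(awk '$1 ~ /^p/' file1.txt)

# A bare /regex/ pattern is matched against the whole line ($0).
contains_berry=$(awk '/berry/' file1.txt)

printf '%s\n' "$starts_with_p" "$contains_berry"
```

The first pattern selects plum, potato and pineapple. The second selects only the strawberry line.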
Fields that look like numbers are compared as numbers automatically:
awk '$3>5 {print $1, $2}' file1.txt
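Note that the header row is printed too: its third field, “amount”, is not a number, so awk compares it with 5 as text, and letters sort after digits. The output is: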
name color
banana yellow
grape purple
apple green
potato brown
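If you do not want the header row in the results, one common approach is to skip the first line using the built-in line counter NR. This is a sketch only; NR is standard awk, but the variable name over_five and the scratch directory are assumptions made for the example.

```shell
# Work in a throwaway directory and recreate the sample file.
cd "$(mktemp -d)"
cat > file1.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# NR is the current line number; NR>1 skips the header row.
# && combines the two conditions, so both must be true.
over_five=$(awk 'NR>1 && $3>5 {print $1, $2}' file1.txt)

printf '%s\n' "$over_five"
```

This gives the same four rows as above, without “name color” at the top.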
Using the shell's output redirection, you can write your results to a file. For example:
awk -F, '$3>5 {print $1, $2}' file1.csv > output.txt
This writes the matching lines to output.txt instead of displaying them on the screen. If output.txt already exists, it is overwritten.
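To check what ends up in the file, here is a small sketch that runs the command in a scratch directory (an assumption for the example) and then displays output.txt.

```shell
# Work in a throwaway directory and recreate the CSV sample file.
cd "$(mktemp -d)"
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# The shell sends awk's output to output.txt; nothing appears on screen.
awk -F, '$3>5 {print $1, $2}' file1.csv > output.txt

# Show what was written.
cat output.txt
```

Although the input is comma separated, the lines in output.txt are separated by spaces, because a comma inside “print” inserts a space between the fields by default.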
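Here’s a cool trick! You can automatically split a file into multiple files grouped by column. For example, if I want to split file1.txt into multiple files by color, here is the command:
awk '{print > ($2 ".txt")}' file1.txt
This will produce files named yellow.txt, red.txt, etc. The parentheses around the file name keep the command portable, since some versions of awk misread an unparenthesized concatenation after “>”. The header line ends up in its own file, color.txt. In upcoming episodes, we will show how to improve the outputs.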
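Here is a sketch of the split run in a scratch directory (an assumption for the example), so you can see which files appear. The file name expression is written with parentheses, which is the form that behaves the same across awk versions.

```shell
# Work in a throwaway directory and recreate the sample file.
cd "$(mktemp -d)"
cat > file1.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Each line is written to a file named after its second column.
awk '{print > ($2 ".txt")}' file1.txt

# List the results and look inside one of them.
ls
cat purple.txt
```

Besides file1.txt, the directory now holds brown.txt, color.txt, green.txt, purple.txt, red.txt and yellow.txt. The purple.txt file contains the grape and plum lines, and color.txt contains only the header line.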
Resources
- https://www.theunixschool.com/p/awk-sed.html
- https://www.tecmint.com/category/awk-command/
- https://linux.die.net/man/1/awk
Coming up
- More options
- Built-in Variables
- Arithmetic operations
- Awk language and syntax