hpr2330 :: Awk Part 7
Looping in Awk explained by a sleep-deprived host
Hosted by Mr. Young on Friday, 2017-07-07. Flagged as Clean and released under a CC-BY-SA license.
bash, linux, awk.
The show is available on the Internet Archive at: https://archive.org/details/hpr2330
Duration: 00:21:11
Learning Awk.
Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the version most often found on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.
In this episode, I will (very) briefly go over loops in the Awk programming language. Loops are useful when you want to run the same command(s) on a collection of data or when you just want to repeat the same commands many times.
When using loops, a command or group of commands is repeated for as long as a condition (or a combination of conditions) holds.
While Loop
Here is a silly example of a while loop:
#!/bin/awk -f
BEGIN {
    # Print the squares from 1 to 10 the first way
    i = 1;
    while (i <= 10) {
        print "The square of ", i, " is ", i*i;
        i = i + 1;
    }
    exit;
}
Our condition is set in the parentheses after the while statement. We set a variable, i, before entering the loop, then increment i inside of the loop. If you forget to give the loop a way to make the condition false, the while will go on forever.
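As a slightly less silly illustration of the same pattern, a while loop can walk over the fields of a line. The sketch below is my own assumption rather than something from the episode: it feeds a made-up line of numbers to awk from the shell and adds them up, using i as the counter and the built-in NF (the number of fields on the line) as the stopping point.

```shell
# Hypothetical input line; the numbers are made up for illustration.
total=$(echo "4 6 3 10" | awk '{
    i = 1
    sum = 0
    # Keep going while there is still a field left to read
    while (i <= NF) {
        sum = sum + $i
        i = i + 1      # without this line the loop would never end
    }
    print sum
}')
echo "$total"
```

The increment on the last line of the loop body is what eventually makes the condition false.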
Do While Loop
Here is an equally silly example of a do while loop:
#!/bin/awk -f
BEGIN {
    i = 2;
    do {
        # The body runs once before the condition is ever tested
        print "The square of ", i, " is ", i*i;
        i = i + 1;
    } while (i <= 10)
    exit;
}
Here, the commands in the do code block are executed once before the condition is first tested, and only then does the looping begin. This means the body always runs at least one time.
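To make the difference from a plain while loop visible, here is a small sketch (my own, not from the episode) that runs both kinds of loop with a condition that is already false at the start. The variable names while_out and do_out are just illustrative shell variables.

```shell
# Both loops start with a condition that is already false (i is 5, not < 5).
while_out=$(awk 'BEGIN {
    i = 5
    while (i < 5) { print "while body ran"; i++ }
}')
do_out=$(awk 'BEGIN {
    i = 5
    do { print "do body ran"; i++ } while (i < 5)
}')
echo "while: [$while_out]"
echo "do:    [$do_out]"
```

The while version prints nothing because the test comes first, whereas the do version prints its message once because the test comes last.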
For Loop
Another silly example of a for loop:
#!/bin/awk -f
BEGIN {
    # Initialise, test and increment i all in one place
    for (i = 1; i <= 10; i++) {
        print "The square of ", i, " is ", i*i;
    }
    exit;
}
As you can see, we set the variable, set the condition and set the increment method all in the parentheses after the for statement.
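The increment does not have to be a step of one. As a quick sketch (an assumption of mine, not something covered in the episode), the three parts of the for statement can be changed to visit only the odd numbers from 1 to 9 and print their squares. The shell variable odd_squares is only there to capture the output.

```shell
odd_squares=$(awk 'BEGIN {
    # Start at 1, stop after 9, and step by 2 instead of 1
    for (i = 1; i <= 9; i += 2) {
        print i * i
    }
}')
echo "$odd_squares"
```

This prints one square per line, for i equal to 1, 3, 5, 7 and 9.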
For Loop Over Arrays
Here is a more useful example of a for loop. Here, we use each distinct value of column 2 as a key in an array (hash table) called a. After processing the file, we loop over the keys and print the different values.
For file.txt:
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
Using the awk file of:
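# Skip the header line, then count each value of column 2
NR != 1 {
    a[$2]++
}
# After the last line, print every distinct key
END {
    for (b in a) {
        print b
    }
}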
We get the results of:
brown
purple
red
yellow
green
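One thing worth knowing is that awk does not promise any particular order when looping with for (b in a), so the order above may differ on your machine or with a different awk. A minimal sketch of one way around that, assuming a standard sort command is available, is to pipe the output through sort. I have also printed a[b] here, which holds the number of rows seen for each color. The temporary file stands in for file.txt.

```shell
# Recreate file.txt from above in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Count rows per color, then let sort put the lines in a stable order.
colors=$(awk 'NR != 1 { a[$2]++ } END { for (b in a) print b, a[b] }' "$tmp" | sort)
rm -f "$tmp"
echo "$colors"
```

With the data above this lists brown, green, purple, red and yellow in alphabetical order, each followed by its row count.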
In another example, we do a similar process. This time, not only do we store all the distinct values of the second column, we perform a sum operation on column 3 for each distinct value of column 2.
For file.csv:
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
Using the awk file of:
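BEGIN {
    # Read and write comma-separated fields
    FS=",";
    OFS=",";
    print "color,sum";
}
# Skip the header line, then add column 3 to the total for its color
NR != 1 {
    a[$2]+=$3;
}
END {
    for (b in a) {
        print b, a[b]
    }
}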
We get the results of:
color,sum
brown,13
purple,12
red,7
yellow,11
green,8
As you can see, we are also printing a header row prior to processing the file using the BEGIN code block.
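The same idea extends to more than one array. The following is a rough sketch of my own, not something from the episode: it keeps a running total and a row count per color (the array names sum and count are just my choice), then divides one by the other in the END loop to get an average. The header is printed from the shell and the rows go through sort so the order is predictable. The temporary file stands in for file.csv.

```shell
# Recreate file.csv from above in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

averages=$(
    echo "color,average"
    awk 'BEGIN { FS = ","; OFS = "," }
    NR != 1 {
        sum[$2] += $3     # running total per color
        count[$2]++       # number of rows per color
    }
    END {
        for (c in sum) {
            print c, sum[c] / count[c]
        }
    }' "$tmp" | sort
)
rm -f "$tmp"
echo "$averages"
```

For example, brown has the amounts 4 and 9, so its row comes out as brown,6.5.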