hpr2330 :: Awk Part 7

Looping in Awk explained by a sleep-deprived host

Hosted by Mr. Young on Friday, 2017-07-07. Flagged as Clean and released under a CC-BY-SA license.
Tags: bash, linux, awk.

Duration: 00:21:11

Part of the series: Learning Awk.

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

In this episode, I will (very) briefly go over loops in the Awk programming language. Loops are useful when you want to run the same command(s) on a collection of data or when you just want to repeat the same commands many times.

When using loops, a command or group of commands is repeated for as long as a condition (or a combination of conditions) holds, and the loop stops once the condition fails.

While Loop

Here is a silly example of a while loop:

#!/bin/awk -f
BEGIN {

# Print the squares from 1 to 10 the first way

    i=1;
    while (i <= 10) {
        print "The square of", i, "is", i*i;
        i = i+1;
    }

exit;
}

Our condition is set in the parentheses after the while keyword. We set a variable, i, before entering the loop, then increment i inside the loop. The loop runs for as long as the condition is true, so if nothing in the body ever makes it false, the while loop will go on forever.
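
The same pattern works on input data, not just on a counter. Here is a minimal sketch, assuming a made-up line of input piped in with printf. The loop walks the fields of the line from 1 to NF (the built-in count of fields), and the i++ in the body is what eventually makes the condition false.

```shell
# Hypothetical input: one line with three space-separated words
result=$(printf 'red green blue\n' | awk '{
    i = 1
    while (i <= NF) {          # NF is the number of fields on this line
        print i ": " $i        # field number, then the field itself
        i++                    # without this, the loop would never end
    }
}')
echo "$result"
```

This should print one numbered line per field: 1: red, 2: green and 3: blue.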

Do While Loop

Here is an equally silly example of a do while loop:

#!/bin/awk -f
BEGIN {

    # Print the squares from 2 to 10; the body runs before the test
    i=2;
    do {
        print "The square of", i, "is", i*i;
        i = i + 1
    } while (i <= 10)

exit;
}

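Here, the commands in the do code block are executed once before the condition is tested, then the looping begins and continues for as long as the condition is true. This means the body of a do while loop always runs at least once.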
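
To see the difference from a plain while loop, here is a small sketch where the condition is false from the very start. The starting value of 20 is just an assumption picked for illustration. The while body never runs, but the do while body runs exactly once.

```shell
result=$(awk 'BEGIN {
    i = 20
    while (i <= 10) {              # false right away: body never runs
        print "while ran with i =", i
        i++
    }

    i = 20
    do {
        print "do-while ran with i =", i   # runs once before the test
        i++
    } while (i <= 10)              # false, so the loop stops here
}')
echo "$result"
```

Only the do while line is printed: do-while ran with i = 20.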

For Loop

Another silly example of a for loop:

#!/bin/awk -f
BEGIN {

    for (i=1; i <= 10; i++) {
        print "The square of", i, "is", i*i;
    }

exit;
}

As you can see, we set the variable, set the condition and set the increment method all in the parentheses after the for keyword, separated by semicolons.
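
A for loop is also a handy way to visit every field on a line. This is a rough sketch, assuming two made-up lines of numbers as input. It adds up the fields of each line, using NF as the upper bound and NR as the line number.

```shell
# Hypothetical input: two lines of numbers
result=$(printf '1 2 3\n10 20\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {    # visit every field on the line
        total += $i
    }
    print "line", NR, "sums to", total
}')
echo "$result"
```

This should report that line 1 sums to 6 and line 2 sums to 30.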

For Loop Over Arrays

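Here is a more useful example of a for loop. Here, we use each value of column 2 as a key in an array/hash-table called a, skipping the header line with NR != 1. Each key is stored only once, however many times it appears. After processing the file, we loop over the keys and print the different values.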

For file.txt:

name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5

Using the awk file of:

NR != 1 {
    a[$2]++
}
END {
    for (b in a) {
        print b
    }
}

We get the results of (awk does not guarantee the order in which for (b in a) visits the keys, so your order may differ):

brown
purple
red
yellow
green
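
If you want a predictable order, one portable option is to pipe the output through sort. The sketch below is an assumption-laden variation on the script above: it inlines the same rows as file.txt in a shell variable so that it is self-contained, and it also prints the count that a[$2]++ stored for each color.

```shell
# Same rows as file.txt above, inlined so the sketch is self-contained
data='name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5'

result=$(printf '%s\n' "$data" | awk 'NR != 1 {
    a[$2]++                        # count how often each color appears
}
END {
    for (b in a) {
        print b, a[b]              # color, then its count
    }
}' | sort)
echo "$result"
```

With sort in place the colors come out alphabetically: brown 2, green 1, purple 2, red 2, yellow 2.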

In another example, we do a similar process. This time, not only do we store all the distinct values of the second column, we perform a sum operation on column 3 for each distinct value of column 2.

For file.csv:

name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5

Using the awk file of:

BEGIN {
    FS=",";
    OFS=",";
    print "color,sum";
}
NR != 1 {
    a[$2]+=$3;
}
END {
    for (b in a) {
        print b, a[b]
    }
}

We get the results of:

color,sum
brown,13
purple,12
red,7
yellow,11
green,8

As you can see, we are also printing a header row prior to processing the file using the BEGIN code block.
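
Because the arrays are keyed by the column value, more than one array can share the same keys. As a sketch of that idea (the array names sum and n are my own choice, and the header row is left out to keep the sorting simple), this keeps a total and a row count per color, then prints the color, sum, count and average. The rows are the same as file.csv, inlined so the sketch is self-contained.

```shell
# Same rows as file.csv above, inlined so the sketch is self-contained
data='name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5'

result=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ","; OFS = "," }
NR != 1 {
    sum[$2] += $3                  # running total per color
    n[$2]++                        # number of rows per color
}
END {
    for (c in sum) {
        print c, sum[c], n[c], sum[c] / n[c]
    }
}' | sort)
echo "$result"
```

For example, brown has the rows 4 and 9, so its line should read brown,13,2,6.5.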