hpr2143 :: Gnu Awk - Part 3

In this episode, I go into more advanced topics for the awk tool.

Hosted by Mr. Young on Wednesday, 2016-10-19. It is flagged as Clean and is released under a CC-BY-SA license.
Tags: awk, bash, linux. Comments: 2.

Duration: 00:31:04

Part of the series: Learning Awk.

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

Awk Part 3

Remember our file:

name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5

Replace Grep

As we saw in earlier episodes, we can use awk to filter for rows that match a pattern or text. If you know the grep command, you know that it performs the same function, with extended capabilities. For simple filters, you don't need to pipe grep output into awk. You can just filter in awk.
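As a minimal sketch of this point, the script below recreates the sample file in a temporary location (the variable names `fruit`, `with_grep` and `awk_only` are only for this illustration) and shows that a grep-plus-awk pipeline and a single awk command select the same names.

```shell
#!/bin/sh
# Recreate the sample file from these notes in a temporary file.
fruit=$(mktemp)
cat > "$fruit" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Two processes: grep selects the rows, awk prints the first column.
with_grep=$(grep 'red' "$fruit" | awk '{print $1}')

# One process: the /red/ pattern does the filtering inside awk.
awk_only=$(awk '/red/ {print $1}' "$fruit")

printf '%s\n' "$awk_only"
rm -f "$fruit"
```

Both variables end up holding the same two names, so the extra grep process adds nothing for a simple text match like this.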

Logical Operators

You can use logical operators "and" and "or" represented as "&&" and "||", respectively. See example:

$2 == "purple" && $3 < 5 {print $1}

Here, we are selecting records where the color equals "purple" AND the amount is less than 5.
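The sketch below runs that "&&" filter against the sample file and adds an "||" filter for comparison. The "||" condition (red or green) is an invented example, not one from the episode, and the file is recreated in a temporary location so the script stands alone.

```shell
#!/bin/sh
# Recreate the sample file from these notes in a temporary file.
fruit=$(mktemp)
cat > "$fruit" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# AND: both conditions must hold (the filter shown in the notes).
both=$(awk '$2 == "purple" && $3 < 5 {print $1}' "$fruit")

# OR: either condition is enough (an invented example).
either=$(awk '$2 == "red" || $2 == "green" {print $1, $2}' "$fruit")

printf 'AND:\n%s\nOR:\n%s\n' "$both" "$either"
rm -f "$fruit"
```

Only plum is both purple and below 5, while the "||" filter keeps the two red rows and the green one.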

Next command

Say we want to flag every record in our file where the amount is greater than or equal to 8 with a '**', and every record where the amount is at least 5 but less than 8 with a '*'. We can use consecutive filter commands, but their effects will be additive: a record that matches more than one pattern triggers every matching action. To remedy this, we can use the "next" command. This tells awk that after the action is taken, it should skip the remaining rules and proceed to the next record. See the following example:

NR == 1 {
  print $0;
  next;
}

$3 >= 8 {
  printf "%s\t%s\n", $0, "**";
  next;
}

$3 >= 5 {
  printf "%s\t%s\n", $0, "*";
  next;
}

$3 < 5 {
  print $0;
}
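To see why "next" matters, here is a small sketch that feeds a single record (the grape row) through the two flagging rules, once without "next" and once with it. The variable names are only for this illustration.

```shell
#!/bin/sh
# One record from the sample file; its amount (10) satisfies both rules.
record='grape      purple 10'

# Without next: every matching rule fires, so the record is printed twice.
without_next=$(printf '%s\n' "$record" | awk '
$3 >= 8 { printf "%s\t%s\n", $0, "**" }
$3 >= 5 { printf "%s\t%s\n", $0, "*" }
')

# With next: the first matching rule prints, then awk moves to the next record.
with_next=$(printf '%s\n' "$record" | awk '
$3 >= 8 { printf "%s\t%s\n", $0, "**"; next }
$3 >= 5 { printf "%s\t%s\n", $0, "*";  next }
')

printf 'without next:\n%s\nwith next:\n%s\n' "$without_next" "$with_next"
```

Without "next" the grape row comes out twice, once with each flag. With "next" it comes out once, flagged '**'.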

End Command

The "BEGIN" and "END" commands allow you to run actions before awk reads the first record and after it has processed the last one. For instance, sometimes we want to evaluate all records, then print the cumulative results. In this example, we pipe the output of the df command into awk. Our command is:

df -l | awk -f end.awk

Our awk file looks like this:

$1 != "tmpfs" {
    used += $3;
    available += $4;
}

END {
    printf "%d GiB used\n%d GiB available\n", used/2^20, available/2^20;
}

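Here, we are setting two variables, "used" and "available". For every record that is not a tmpfs filesystem, we add the values in the respective columns to the running totals, then we print the totals in the END block. Assuming df reports sizes in 1K blocks (the GNU default), dividing by 2^20 converts the totals to GiB.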
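Since real df output differs on every machine, the sketch below runs the same awk program against a few made-up df-style lines. The device names and sizes are invented for illustration and are assumed to be in 1 KiB blocks.

```shell
#!/bin/sh
# Made-up df-style output (sizes in 1 KiB blocks), not from a real system.
sample_df='Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       10485760 3145728   7340032  30% /
tmpfs            1048576       0   1048576   0% /dev/shm
/dev/sdb1        3145728 1048576   2097152  34% /data'

totals=$(printf '%s\n' "$sample_df" | awk '
$1 != "tmpfs" {
    used += $3          # the header row adds 0: "Used" is not a number
    available += $4
}
END {
    # KiB divided by 2^20 gives GiB
    printf "%d GiB used\n%d GiB available\n", used/2^20, available/2^20
}')

printf '%s\n' "$totals"
```

With these invented numbers the non-tmpfs rows add up to 4 GiB used and 9 GiB available. The tmpfs row is skipped, and the header row contributes nothing to either sum.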

In the next example, we create a distinct list of colors from our file:

NR != 1 {
    a[$2]++
}
END {
    for (b in a) {
        print b
    }
}

This is a more advanced script. We will get into the details in future episodes.
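As a rough sketch of what that script produces, the version below runs it on the sample file (recreated in a temporary location). Awk does not promise any particular order for "for (b in a)", so the output is piped through sort here. That sort step is an addition for this illustration.

```shell
#!/bin/sh
# Recreate the sample file from these notes in a temporary file.
fruit=$(mktemp)
cat > "$fruit" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Each color becomes a key in the array "a"; a key can exist only once,
# so looping over the keys in END yields the distinct colors.
colors=$(awk '
NR != 1 { a[$2]++ }
END { for (b in a) print b }
' "$fruit" | LC_ALL=C sort)

printf '%s\n' "$colors"
rm -f "$fruit"
```

The nine data rows collapse to five colors: brown, green, purple, red and yellow.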

BEGIN command

As stated above, the BEGIN command lets us print output and set variables before awk starts processing records. For instance, we can set the input and output field separators inside our awk file as follows:

BEGIN {
    FS=",";
    OFS=",";
    print "color,count";
}
NR != 1 {
    a[$2]+=1;
}
END {
    for (b in a) {
        print b, a[b]
    }
}

In this example, we count how many records there are for each distinct color in our csv file, and write the output in csv format as well. We will get into the details of how this script works in future episodes.
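The file shown at the top of these notes is whitespace-delimited, so the sketch below first converts it to csv with awk and then runs the counting script on the result. The conversion step and the temporary file names are assumptions of this illustration, and the data rows are sorted afterwards because the order of "for (b in a)" is not guaranteed.

```shell
#!/bin/sh
# Recreate the sample file from these notes in a temporary file.
fruit=$(mktemp)
csv=$(mktemp)
cat > "$fruit" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Assigning $1 to itself makes awk rebuild the record using OFS.
awk 'BEGIN { OFS = "," } { $1 = $1; print }' "$fruit" > "$csv"
first_line=$(head -n 1 "$csv")

# The counting script from the notes, run on the csv file.
counts=$(awk '
BEGIN { FS = ","; OFS = ","; print "color,count" }
NR != 1 { a[$2] += 1 }
END { for (b in a) print b, a[b] }
' "$csv")

# The header comes from BEGIN, so it is always first; sort only the rows.
header=$(printf '%s\n' "$counts" | head -n 1)
rows=$(printf '%s\n' "$counts" | sed 1d | LC_ALL=C sort)

printf '%s\n%s\n' "$header" "$rows"
rm -f "$fruit" "$csv"
```

Every color appears twice in the sample data except green, which appears once.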

For another example, instead of distinct count, we can get the sum of the amount column grouped by color:

BEGIN {
    FS=",";
    OFS=",";
    print "color,sum";
}
NR != 1 {
    a[$2]+=$3;
}
END {
    for (b in a) {
        print b, a[b]
    }
}
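Here is a sketch of that sum script running on a csv version of the sample data, written inline so the example stands alone. Sorting the rows by descending sum is an extra step added for this illustration, using standard sort options.

```shell
#!/bin/sh
# A csv version of the sample data from these notes.
csv_data='name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5'

sums=$(printf '%s\n' "$csv_data" | awk '
BEGIN { FS = ","; OFS = ","; print "color,sum" }
NR != 1 { a[$2] += $3 }
END { for (b in a) print b, a[b] }
')

# Keep the header first, then order the rows by the second field, largest first.
header=$(printf '%s\n' "$sums" | head -n 1)
rows=$(printf '%s\n' "$sums" | sed 1d | LC_ALL=C sort -t, -k2,2nr)

printf '%s\n%s\n' "$header" "$rows"
```

Brown has the largest total (4 + 9 = 13) and red the smallest (4 + 3 = 7).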

Comments

Comment #1 posted on 2016-10-26 03:01:22 by Bambiker

I cobbled the following together from what I learned in part 2. Maybe there's an easier way.
vim `grep -ri "tpl_header" * | awk -F ":" '{print $1}'`

It opens every file found in vim when grep finds the text "tpl_header" without quotes in the text. In vim, use :bn to hop to the next file and edit as you like.

grep -ri looks through every file and directory under the current directory disregarding the case of the search text. The * matches any file. I'm in bash, so it may work differently in other shells.

Comment #2 posted on 2016-10-26 14:13:47 by Dave Morriss

grep and awk

I'd skip the awk part here. My solution would be:

vim $(grep -ril "tpl_header" *)

The -l option to grep just returns the filename where a match occurred, so there's no need to use awk to separate it out from what grep returns.

In my case I usually keep vim backup files in the same directory so I'd change '*' to '*[^~]' to omit those.

As an aside I prefer $() to back-ticks since they are more visible and (I think) nest better.

There are times when grep is unnecessary because awk can do the same job, but this isn't one. Quite the reverse!

