hpr1501 :: AWK
A cursory introduction to the AWK programming language
Hosted by laindir on Monday, 2014-05-05. The show is flagged as Clean and is released under a CC-BY-SA license.
Tags: AWK, text processing, rule, pattern, action, regular expression.
The show is available on the Internet Archive at: https://archive.org/details/hpr1501
Duration: 00:19:25
Programming 101.
A series focusing on concepts and the basics of programming
First of all, a correction. In the podcast, I mistakenly refer to one of the coauthors of the language as Kevin Weinberger. My humblest apologies to Mr. Weinberger, whose actual first name is Peter. I also neglected to mention one of AWK's most interesting features: its automatic field splitting. I hope to submit a follow-up podcast soon to rectify these two glaring mistakes.
AWK is a loosely typed, interpreted programming language. Many tasks that are common in a UNIX programming environment, such as reading files, looping over input, matching regular expressions, and splitting strings into fields, have been abstracted and are presented to the programmer as native parts of the language. This makes AWK ideal for text processing.
The basic structure of an AWK program is a list of rules. Each rule is made up of an optional pattern and an optional action. If the pattern matches, the corresponding action is run. When AWK starts up, it loads the supplied program text, runs any rules with the special BEGIN pattern, and then opens, in turn, each file supplied on the command line (or reads standard input if no files are given, or wherever - appears as a file name). Each file is split into records based on the value of the RS (record separator) variable. AWK then loops through the records; for each one, it splits the record into fields based on the value of the FS (field separator) variable and tests each rule in the program, in order. An omitted pattern matches all records, so actions with no pattern run for every record. An omitted action causes the current record to be printed. (An explicitly empty action, written {}, is different: it does nothing.)
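The rule structure described above can be sketched as a small program run from the shell. This is only an illustration under stated assumptions: the function name rules_demo and the three sample records are made up for the example, and any POSIX-conforming awk is assumed to be installed as awk.

```shell
# Hypothetical helper: feeds three sample records to a three-rule AWK program.
rules_demo() {
    printf 'alpha 1\nbeta 2\ngamma 3\n' | awk '
        BEGIN   { print "start" }        # BEGIN rule: runs once, before any input is read
        NR == 2                          # pattern only: the matching record is printed
                { print NR ": " $0 }     # action only: runs for every record
    '
}

rules_demo
```

The output shows the BEGIN rule firing first, and then both remaining rules being tested against each record in the order they appear in the program, so the second record is printed twice.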
The operator most distinctive to AWK is $, the field access operator. When followed by an integer literal or a variable holding an integer value (or, more generally, any expression that evaluates to one), it returns the corresponding field of the current record. Fields are counted from 1 up to NF, a special variable holding the number of fields. $0 returns the entire record. If the supplied integer is greater than NF, the field is treated as an uninitialized variable, which in AWK behaves as either the empty string or the number 0, depending on the context in which it is referenced.
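A minimal sketch of field access, using a single made-up record. The function name fields_demo, the variable i, and the sample input are assumptions for illustration only.

```shell
# Hypothetical helper: exercises the $ operator on one three-field record.
fields_demo() {
    printf 'one two three\n' | awk '{
        i = 2
        print $1            # field selected by an integer literal
        print $i            # field selected by a variable holding an integer
        print $NF           # NF holds the field count, so this is the last field
        print NF            # the number of fields in this record
        print "[" $5 "]"    # past NF, used as a string: the empty string
        print $5 + 0        # past NF, used as a number: zero
        print $0            # the entire record
    }'
}

fields_demo
```

Reading a field beyond NF, as done here with $5, does not change NF; only assigning to such a field would.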
The most common type of pattern used in AWK (excepting, perhaps, the empty pattern) is a regular expression literal: a regular expression enclosed in forward slashes. This syntax is inherited from ed, the standard text editor, and has been passed down all the way to JavaScript. In AWK, a regular expression literal used alone as a pattern is shorthand for $0 ~ /regex/, where ~ is the regular expression match operator. In other words, the rule applies when the string $0, the current record, matches the supplied regular expression.
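The equivalence between the shorthand and the full form can be checked with a short sketch. The function name regex_demo and the sample records are assumptions for illustration; the third rule is an extra showing that the same operator can be applied to a single field rather than the whole record.

```shell
# Hypothetical helper: compares a bare regex pattern with its spelled-out form.
regex_demo() {
    printf 'apple pie\nbanana split\ncherry tart\n' | awk '
        /an/       { print "short: " $0 }   # bare regular expression literal
        $0 ~ /an/  { print "long:  " $0 }   # the same test written out in full
        $2 ~ /^t/  { print "field: " $0 }   # the match operator applied to one field
    '
}

regex_demo
```

The first two rules fire on exactly the same record, which is what the shorthand implies.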
- POSIX AWK: https://pubs.opengroup.org/onlinepubs/009696699/utilities/awk.html
- The AWK Programming Language: https://books.cat-v.org/computer-science/awk-programming-language/The_AWK_Programming_Language.pdf