Predefined awk Variables

awk provides a number of predefined variables that simplify access to, and processing of, the input data.

The following examples make use of awk syntax and commands not yet covered. I have provided some description in these cases, but will cover these topics in detail in later modules.

FILENAME

Contains the name of the current file being processed. When invoking awk, you may list several filenames as data sources. As each file is opened, FILENAME will be updated.
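As a minimal sketch of this, assuming two small scratch files (the names first.txt and second.txt and their contents are made up for illustration), we can prefix each record with the file it came from:

```shell
# Create two small hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/first.txt"
printf 'gamma\n' > "$dir/second.txt"

# Print each record prefixed with the name of the file it was read from.
filename_out=$(cd "$dir" && awk '{ print FILENAME ": " $0 }' first.txt second.txt)
printf '%s\n' "$filename_out"

# Clean up the scratch directory.
rm -r "$dir"
```

The prefix changes from first.txt to second.txt as soon as awk opens the second file.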

FS

Field separator. FS indicates the character[s] to use as field separators.

In the following exercise, we use : as the field separator and show the 5th field of the passlist file, a shortened version of the password file. The 5th field is the Gecos field, which usually holds the user's real name.

The BEGIN pattern allows us to set FS before the 1st record is read. By setting FS to a colon, we indicate that every colon is to be treated as a separate field separator, so two colons in a row enclose an empty field. As a result, when we print the 5th field, an empty Gecos field is displayed as an empty line.

Note that we could also use the option -F":" to initialize FS from the command line.

awk 'BEGIN { FS=":";}; { print $5; }' ~berezin/Data/passlist
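The -F form can be sketched as follows. Since the passlist file is not reproduced here, the three-line sample below is a made-up stand-in in password-file format, with an empty Gecos field on the second entry.

```shell
# Hypothetical sample in password-file format; the guest entry has no Gecos text.
sample='root:x:0:0:Super User:/root:/bin/sh
guest:x:500:500::/home/guest:/bin/sh
anna:x:501:501:Anna Lee:/home/anna:/bin/sh'

# -F: sets FS before the 1st record is read, just as the BEGIN block does.
gecos=$(printf '%s\n' "$sample" | awk -F: '{ print $5 }')
printf '%s\n' "$gecos"
```

The middle line of the output is empty, because the guest entry has nothing between its 4th and 5th colons.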

FS may also be a regular expression. This is useful when several alternative delimiters must be recognized, or when a run of repeated delimiters should be treated as a single delimiter.

In the following exercise, we treat one or more consecutive colons as a single delimiter.

awk 'BEGIN { FS=":+";}; { print $5 }' ~berezin/Data/passlist

In this second exercise, wherever the Gecos field was empty (two colons in a row), we retrieved the home directory rather than the contents of the Gecos field. That is probably not what we want, so the 1st example is the preferred command.
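To make the difference visible, here is a small sketch that runs both settings of FS against one made-up entry with an empty Gecos field, printing NF and the 5th field in brackets:

```shell
# Hypothetical entry in password-file format with an empty Gecos field.
sample='guest:x:500:500::/home/guest:/bin/sh'

# FS=":" keeps the empty field, so there are 7 fields and $5 is empty.
single=$(printf '%s\n' "$sample" | awk 'BEGIN { FS=":" } { print NF, "[" $5 "]" }')

# FS=":+" collapses "::" into one separator, so the home directory becomes $5.
collapsed=$(printf '%s\n' "$sample" | awk 'BEGIN { FS=":+" } { print NF, "[" $5 "]" }')

printf '%s\n%s\n' "$single" "$collapsed"
```

The first line shows 7 fields with an empty 5th field; the second shows 6 fields with /home/guest in 5th position.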

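The space is the exception to the rule that a single-character FS is taken literally. If FS is set to a single space, which is the default, then one or more contiguous spaces and/or tabs are treated as a single delimiter, and leading and trailing whitespace on the record is ignored.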

We will look at a way around this later.
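A quick sketch of the single-space behavior, using a made-up input line that starts with spaces and mixes a tab with runs of spaces:

```shell
# Input: two leading spaces, then "one", a space-tab-space run, "two",
# five spaces, and "three".
ws_out=$(printf '  one \t two     three\n' |
    awk 'BEGIN { FS=" " } { print NF; print $2 }')
printf '%s\n' "$ws_out"
```

awk reports 3 fields and prints two as the 2nd field, so neither the leading spaces nor the mixed runs of whitespace produce empty fields.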

NF

Number of fields. NF contains the number of fields in the current record, based on the specified or default field separator. It is reset each time a new record is read. Depending on the version of awk, the number of fields for the last record may or may not still be available once all input has been read (for example, in an END action).

If you need to remember the largest number of fields read in, you will have to write some code to save the contents of NF at its largest to another variable.
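One way to do this is sketched below. The variable name maxnf is my own choice, and the three input lines are made up; the END pattern, covered later, runs once after the last record.

```shell
# Whenever the current record has more fields than any seen so far,
# remember its NF in maxnf. An unset awk variable compares as 0.
maxnf_out=$(printf 'a b\na b c d\na\n' |
    awk 'NF > maxnf { maxnf = NF } END { print maxnf }')
printf '%s\n' "$maxnf_out"
```

Because the maximum is kept in its own variable, the result does not depend on whether NF survives into the END action.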

NR

Number of records. NR contains the total number of records read so far. If more than one data file is specified, this value accumulates across all files read so far. When records are lines, NR also acts as the current line number.

FNR

File number of records. FNR contains the number of records read so far for the current data file being processed. If you have listed more than one file as data input, this will be reset each time another file is opened for processing.
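The difference between NR and FNR is easiest to see side by side. This sketch assumes two scratch files of two lines each (the names one.txt and two.txt are invented for the example):

```shell
# Create two hypothetical two-line input files in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"

# For every record, show the file name, the running total (NR),
# and the per-file count (FNR).
nr_out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
printf '%s\n' "$nr_out"

# Clean up the scratch directory.
rm -r "$dir"
```

NR keeps counting from 1 to 4, while FNR drops back to 1 when two.txt is opened.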

RS

Record separator. RS specifies the separator between records in the data file. Normally, this is a newline. However, you may change this.

For example:

mytext=data&pass=password&mysel=down&oked=on&choice=one&bdoit=button

represents the information returned from a form-style web page. It contains 6 pieces of data returned as field/value pairs. Each pair is delimited by an ampersand, and the two elements of each pair are separated by an equals sign.

By setting RS to &, awk will parse the input into six records, each consisting of one field/value pair.
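A sketch of how this might look, using the string above as input. Setting FS to = as well is my addition, so that each pair splits into its name and value; the output layout is arbitrary.

```shell
query='mytext=data&pass=password&mysel=down&oked=on&choice=one&bdoit=button'

# printf '%s' writes the string without a trailing newline, so the last
# record does not end up with a newline glued to its value.
pairs=$(printf '%s' "$query" |
    awk 'BEGIN { RS="&"; FS="=" } { print NR ": " $1 " -> " $2 }')
printf '%s\n' "$pairs"
```

Each pair arrives as its own record, so NR numbers the pairs from 1 to 6.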

OFS

Output field separator. OFS specifies the separator used between fields when print is used to generate output.

ORS

Output record separator. ORS specifies the separator written after each record when print is used to generate output. OFS and ORS are literal strings, not regular expressions, and may consist of several characters. Neither OFS nor ORS affects the output of the printf statement.
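The following sketch sets both variables and then produces output once with print and once with printf. The separator values and the input lines are arbitrary choices for illustration.

```shell
# print: the commas between $1, $2, $3 are replaced by OFS, and ORS is
# written after each record.
print_out=$(printf 'a b c\nd e f\n' |
    awk 'BEGIN { OFS="-"; ORS=";\n" } { print $1, $2, $3 }')

# printf: only the format string decides the output; OFS and ORS are ignored.
printf_out=$(printf 'a b c\nd e f\n' |
    awk 'BEGIN { OFS="-"; ORS=";\n" } { printf "%s %s\n", $1, $2 }')

printf '%s\n%s\n' "$print_out" "$printf_out"
```

The print version yields a-b-c; and d-e-f; while the printf version yields a b and d e, untouched by either variable.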

$0

$0 represents all of the current record being processed.

$1-$n

$n represents the field at position n in the current record. If the referenced position does not exist, the result is an empty (null) string.
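As a short sketch with a made-up three-field record, we can print the whole record, one existing field, and one field that does not exist (shown in brackets so the empty value is visible):

```shell
# The record has only 3 fields, so $5 evaluates to an empty string.
fields_out=$(printf 'red green blue\n' |
    awk '{ print $0; print $2; print "[" $5 "]"; print NF }')
printf '%s\n' "$fields_out"
```

Merely referencing $5 does not create it: NF is still 3 on the last line of output.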

We will look at additional predefined variables as various topics come up.