Structure of an awk command.

Example :

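The following reads each line (record) of the passwd file. If the record starts with a student id, it displays the 1st (user login id) and 3rd (user_id) fields of the record. The pattern ends with the colon field separator rather than $, because against a whole record $ anchors at the end of the line and could never match a full passwd entry.

mawk -F: '/^z[0-9]+:/ { print $1, $3; }' /etc/passwd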

The general structure of an awk command is :

pattern { action cmds }

An awk command may have just a pattern, in which case it acts like the grep command. With no action specified, awk applies the default action of printing each record that matches the pattern.

An awk command may have just action commands, in which case they are applied to every line.

Or it may have both, allowing you to target specific lines or even fields within a line for particular edit actions.
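
The three forms can be compared side by side. This is only a sketch: the three colon-separated records are made-up sample data, not real passwd entries, and the variable names are illustrative.

```shell
# Hypothetical sample input: three colon-separated records.
# awk is used for portability; mawk can be substituted.
data='z1234567:x:5001
root:x:0
z7654321:x:5002'

# Pattern only: matching records are printed unchanged, like grep.
pattern_only=$(printf '%s\n' "$data" | awk -F: '/^z/')

# Action only: the action runs on every record.
action_only=$(printf '%s\n' "$data" | awk -F: '{ print $1 }')

# Pattern and action: the action runs only on matching records.
both=$(printf '%s\n' "$data" | awk -F: '/^z/ { print $1, $3 }')

printf '%s\n' "$pattern_only" "---" "$action_only" "---" "$both"
```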


Pattern :

The pattern may be a regular expression, a conditional expression testing the record number or some other property of the record or of a field in it, or one of the keywords BEGIN or END. The pattern may also be a single expression, a compound expression, or a range specification.

mawk uses extended regular expressions.

If you specifically target a field, the regular expression is applied to that field alone, so the field is treated as the 'line' and ^ and $ anchor to its start and end.
$1 ~ /regex/

If no field is named, the regular expression is applied against the whole line or record.
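
The difference shows up as soon as anchors are involved. A small sketch, assuming made-up passwd-style sample data:

```shell
# Hypothetical sample input.
# awk is used for portability; mawk can be substituted.
data='root:x:0
alice:x:1000'

# Whole record: ^ and $ anchor to the full line "root:x:0", so nothing matches.
whole=$(printf '%s\n' "$data" | awk -F: '/^root$/ { print $1 }')

# Targeted field: ^ and $ anchor to field 1 only, so "root" matches.
field=$(printf '%s\n' "$data" | awk -F: '$1 ~ /^root$/ { print $1 }')

printf 'whole record: [%s]\nfield 1: [%s]\n' "$whole" "$field"
```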

Regular expression :

mawk supports the egrep style meta-characters.

.
identifies a single character of any value, including newline.

^
identifies the beginning of a field or record.

$
identifies the end of a field or record.

[cde]
identifies a single character matching one of the listed values.

[a-f]
identifies a single character within the specified range. Note : POSIX character classes such as [:alpha:] may not be supported, depending on the mawk version.

[^...]
identifies a single character that is not in the range or list specified.

r1|r2
bar - allows two alternative regular expressions to be specified. Either expression may be matched.

*
0 or more repetitions of the previous character or group.

+
1 or more repetitions of the previous character or group.

?
0 or 1 occurrence of the previous character or group.

()
Grouping. Used with the multipliers *, +, ? and {} to apply repetition to a group of characters.
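
A quick way to get a feel for these meta-characters is to run them against a short word list. The sketch below assumes a made-up list of words and a small helper function, match_words, which is not part of awk.

```shell
# Hypothetical word list, one word per line.
# awk is used for portability; mawk can be substituted.
words='ac
abc
abbc
adc
abab
xyz'

# Helper (illustrative only): run one awk pattern over the list
# and join the matching words onto a single line.
match_words() {
  printf '%s\n' "$words" | awk "$1" | paste -sd' ' -
}

star=$(match_words '/^ab*c$/')      # a, zero or more b, then c
plus=$(match_words '/^ab+c$/')      # a, one or more b, then c
opt=$(match_words '/^ab?c$/')       # a, optional b, then c
class=$(match_words '/^a[b-d]c$/')  # a, one character from b to d, then c
group=$(match_words '/^(ab)+$/')    # one or more repetitions of the group ab
alt=$(match_words '/^xyz$|^ac$/')   # either alternative

printf '%s\n' "$star" "$plus" "$opt" "$class" "$group" "$alt"
```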

Example :

The following specifies three separate regular expression patterns that may be matched. Because no action is specified, matching lines are printed. Note that a line matching more than one pattern is printed once for each pattern it matches, so you will get duplicate lines.

mawk -F: '/^z91/; /^z10/; /^z15/' /etc/passwd | sort | less

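Using record numbers. A bare numeric constant does not select a line number in awk (unlike sed): any non-zero constant is simply true and matches every record. To match a specific record or line number, compare against the built-in variable NR.

mawk -F: 'NR == 1; NR == 6; NR == 11' /etc/passwd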
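
Record numbers can be tested through the built-in variable NR. A minimal sketch, assuming six made-up one-letter input lines:

```shell
# Hypothetical input: six lines, a through f.
# awk is used for portability; mawk can be substituted.
picked=$(printf 'a\nb\nc\nd\ne\nf\n' | awk 'NR == 1; NR == 6')

# NR can also be used in comparisons to select a span of lines.
span=$(printf 'a\nb\nc\nd\ne\nf\n' | awk 'NR >= 2 && NR <= 3')

printf '%s\n' "$picked" "---" "$span"
```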


Targeted fields.

Use tilde to specify a particular field to apply the pattern against.

$3 ~ /4375/ - matches any string in field 3 containing the sequence 4375.

$3 ~ /^4375$/ - exactly matches 4375 in field 3.
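
The difference between the two forms can be seen on a few records. The user names and ids below are invented for illustration.

```shell
# Hypothetical sample input: field 3 holds a numeric id.
# awk is used for portability; mawk can be substituted.
data='alice:x:4375
bob:x:14375
carol:x:43750'

# Unanchored: any field 3 that contains 4375 somewhere.
contains=$(printf '%s\n' "$data" | awk -F: '$3 ~ /4375/ { print $1 }')

# Anchored at both ends: field 3 must be exactly 4375.
exact=$(printf '%s\n' "$data" | awk -F: '$3 ~ /^4375$/ { print $1 }')

printf '%s\n' "$contains" "---" "$exact"
```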


Compound patterns.

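Two patterns separated by a comma represent a range: the action applies from a record matching the first pattern through the next record matching the second pattern, inclusive.

/<pre>/,/<\/pre>/ - starts applying the action at the line containing <pre> and stops after the line containing </pre>; both of those lines are included.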
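
A range pattern can be tried on a small made-up HTML fragment. With no action given, the lines in the range, including the two boundary lines, are printed.

```shell
# Hypothetical HTML fragment.
# awk is used for portability; mawk can be substituted.
html='<p>intro</p>
<pre>
code line
</pre>
<p>outro</p>'

# Print from the line matching <pre> through the line matching </pre>.
block=$(printf '%s\n' "$html" | awk '/<pre>/,/<\/pre>/')

printf '%s\n' "$block"
```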

Compound matches can also be specified.

&& - specifies an AND condition.

$1 ~ /^z[0-9]+$/ && $8 ~ /sshd/ - if the 1st field consists of a z-Id and field 8 contains the string sshd, then print line.

|| - specifies an OR condition.

$8 ~ /^nano/ || $8 ~ /^vi/ - if field 8 starts with either nano or vi, then print line.

Multiple compound conditions can be specified and && has higher precedence than ||.

Parentheses, (), can be used to modify this.

NR > 2 && NF > 4 || $1 ~ /^Comment$/
( NR > 2 && NF > 4 ) || $1 ~ /^Comment$/
Both forms say the same thing: the record number has to be greater than 2 and the number of fields in the line greater than 4, OR the 1st field has to be exactly Comment.

This means that any line whose 1st field is Comment will print, including the 1st or 2nd line.

NR > 2 && ( NF > 4 || $1 ~ /^Comment$/ )
says print the line only if the record number is greater than 2
AND either the number of fields is greater than 4
OR the 1st field is exactly Comment.
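
The effect of the parentheses is easiest to see by printing the record numbers each form selects. A sketch on five made-up input lines:

```shell
# Hypothetical input: NR 1 and 3 are "Comment", NR 2 and 4 have 5 fields.
# awk is used for portability; mawk can be substituted.
data='Comment
a b c d e
Comment
f g h i j
x y'

# Default precedence: (NR > 2 && NF > 4) || first field is Comment.
default_prec=$(printf '%s\n' "$data" |
  awk 'NR > 2 && NF > 4 || $1 ~ /^Comment$/ { print NR }' | paste -sd' ' -)

# Regrouped: NR > 2 must hold in every case.
regrouped=$(printf '%s\n' "$data" |
  awk 'NR > 2 && ( NF > 4 || $1 ~ /^Comment$/ ) { print NR }' | paste -sd' ' -)

printf 'default: %s\nregrouped: %s\n' "$default_prec" "$regrouped"
```

With the default precedence the Comment on line 1 is selected; with the regrouped form it is not, because NR > 2 fails.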

! ( $1 ~ /^root$/ || $1 ~ /^spampd$/ )

( $1 !~ /^root$/ && $1 !~ /^spampd$/ )

These are the same statement in different forms (De Morgan's laws).

The 1st one states: if it is not the case that the 1st field is either root or spampd, then print.

The 2nd one states: if the 1st field is not root and the 1st field is not spampd, then print.

Of interest, this also works :

$1 !~ /^root$|^spampd$/
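
A quick check that the three forms select the same records, using invented sample data (rooter is included to show that the anchors matter):

```shell
# Hypothetical sample input.
# awk is used for portability; mawk can be substituted.
data='root:x:0
spampd:x:101
alice:x:1000
rooter:x:1001'

# Negated OR.
form1=$(printf '%s\n' "$data" |
  awk -F: '! ( $1 ~ /^root$/ || $1 ~ /^spampd$/ ) { print $1 }')

# AND of two negated matches.
form2=$(printf '%s\n' "$data" |
  awk -F: '$1 !~ /^root$/ && $1 !~ /^spampd$/ { print $1 }')

# Single negated match using alternation inside the regular expression.
form3=$(printf '%s\n' "$data" |
  awk -F: '$1 !~ /^root$|^spampd$/ { print $1 }')

printf '%s\n' "$form1" "---" "$form2" "---" "$form3"
```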


mawk will allow the substitution of "" for the // when the regular expression is the right-hand side of ~ or !~.

$1 !~ "^root$|^spampd$"

It is possible to store a regular expression pattern in a variable and use it.

Create an awk command file, firsts.awk, with :

BEGIN { patmatch = "^[a-f]"; }

$1 ~ patmatch { print }

ps -ef | mawk -f firsts.awk

This will give you a list of processes whose owner's id starts with a, b, c, d, e, or f.

Note that the // is not used when the pattern is held in a variable.
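
As an alternative to setting the variable in a BEGIN block, the pattern can be passed in from the command line with the standard -v option. A sketch, assuming a made-up two-column list standing in for the ps -ef output:

```shell
# Hypothetical stand-in for "ps -ef": owner id and process id.
# awk is used for portability; mawk can be substituted.
procs='alice 101
bob 102
zed 103
frank 104'

# -v assigns the awk variable patmatch before any input is read.
owners=$(printf '%s\n' "$procs" |
  awk -v patmatch='^[a-f]' '$1 ~ patmatch { print $1 }')

printf '%s\n' "$owners"
```

This keeps the awk program itself unchanged while the pattern varies from one run to the next.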