awk – Unix antiperfect explained

There is a difference between the classic awk and the newer POSIX awk. Linux systems usually ship a POSIX awk. Older Unix implementations have the classic awk by default; POSIX awk is available there as nawk. I will explicitly mention nawk where there is a difference.

Most common use from the command-line:

awk -F <char> '{print $1,$3}' <file>: Print fields 1 and 3 of each line in <file>. Use -F <char> to specify the field separator. White space (tabs or spaces) is the default.
awk -v VAR=<value> -f <scriptfile> <file>: Pass <value> to the script and assign it to VAR; the awk code is in <scriptfile>.
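
A minimal sketch of both invocations. The colon-separated sample input and the variable name prefix are made up for illustration, and the program is given inline instead of with -f so the example is self-contained:

```shell
# Sample input: three colon-separated fields per line (made up for illustration).
data='alice:x:1001
bob:x:1002'

# -F sets the field separator; print fields 1 and 3 of each line.
fields=$(printf '%s\n' "$data" | awk -F: '{print $1,$3}')
echo "$fields"

# -v assigns a value to an awk variable before the program starts.
# "prefix" is a hypothetical variable name.
tagged=$(printf '%s\n' "$data" | awk -F: -v prefix="user" '{print prefix "=" $1}')
echo "$tagged"
```

The comma in print separates the fields with the output field separator (a space by default); placing values next to each other without a comma concatenates them.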

Inside AWK programs

Operators:

	++, --	Auto increment/decrement before or after <VAR>
	+=, -=, *=, /=, ^=	Add to, subtract from, multiply, divide, or raise <VAR> to a power, and assign the result to <VAR>
	!	Not
	^, *, /, %, +, -	Exponentiation, multiply, divide, remainder, add, subtract
	<, <=, >, >=	Smaller, smaller or equal, larger, larger or equal
	==, !=	Equal, not equal
	&&, ||	And, Or
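
A short sketch that exercises some of these operators in a BEGIN block, so no input file is needed. The variable names are arbitrary:

```shell
# Evaluate a few expressions; results are captured in $ops and echoed.
ops=$(awk 'BEGIN {
    x = 5
    x += 3                        # x is now 8
    y = x++                       # post-increment: y gets 8, x becomes 9
    print x, y
    print 2 ^ 10, 17 % 5          # exponentiation and remainder
    a = (x > y && !(y == 0))      # true expressions evaluate to 1
    b = (x < y || x != 9)         # false expressions evaluate to 0
    print a, b
}')
echo "$ops"
```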

Field manipulation:

Input lines are split on the field separator (FS, default white space). Each field is stored in a variable: the first field in $1, the second in $2, etc. The entire input line is in $0.

In nawk a field can be replaced by assigning the new value to the field variable. If the field separator is not the space, you have to set the output field separator (OFS) too.

This example replaces the sixth field of the first record of the file in $FILE with "new-value"; the field separator is the pipe symbol "|".

nawk -F"|" 'BEGIN { OFS = "|" }   # keep the pipe as separator on output
NR == 1 { $6 = "new-value" }      # replace field 6 of the first record only
{ print }                         # print every record
' "${FILE}"
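
To see why OFS matters, here is a small sketch on a made-up one-line input. It uses awk rather than nawk, assuming the installed awk is a POSIX one:

```shell
line='a|b|c'

# Without OFS: assigning to a field rebuilds $0 with the default OFS (a space).
no_ofs=$(echo "$line" | awk -F'|' '{ $2 = "X"; print }')

# With OFS set: the rebuilt record keeps the pipe separator.
with_ofs=$(echo "$line" | awk -F'|' 'BEGIN { OFS = "|" } { $2 = "X"; print }')

echo "$no_ofs"
echo "$with_ofs"
```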

String manipulation:

substr(<string>,<start>,<num>): From <string> return <num> characters, starting from <start>
n=split(var,ARR,<fs>): Split var into array ARR; n holds the number of elements in ARR. <fs> is the field separator; if not given, the variable FS is used (default white space). For each line it reads, awk in effect does the following:
NF=split(var,ARR," ")
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
$i=ARR[i]
}
gsub(<regexp>,<string>,<variable>): Replace every match of <regexp> with <string> in <variable>. Returns the number of replacements; <variable> is modified.
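
The three functions can be tried out on one sample string. This is only a sketch; the date string and variable names are made up:

```shell
str_out=$(awk 'BEGIN {
    s = "2024-01-15"
    print substr(s, 6, 2)          # 2 characters starting at position 6
    n = split(s, parts, "-")       # parts[1]="2024", parts[2]="01", parts[3]="15"
    print n, parts[1], parts[3]
    c = gsub(/-/, "/", s)          # replaces every "-" in s; s itself is changed
    print c, s
}')
echo "$str_out"
```

Note that positions in substr start at 1, not 0.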

Calculations:

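awk '/GW/ {GW+=$4; n++} END {if (n) print GW/n}' <file>: Print the average of field 4 for all records in <file> containing 'GW'. The sum is divided by the number of matching records (n), not by NR, which counts every record read.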
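
A sketch on made-up data showing the difference between dividing by the number of matching records and dividing by NR (all records read). The names sum and n are arbitrary:

```shell
# Three sample records; only two contain "GW".
data='GW 1 a 10
XX 2 b 99
GW 3 c 20'

# Average over the matching records only: (10 + 20) / 2
avg=$(printf '%s\n' "$data" | awk '/GW/ { sum += $4; n++ } END { if (n > 0) print sum / n }')

# Dividing by NR uses the total line count instead: (10 + 20) / 3
avg_nr=$(printf '%s\n' "$data" | awk '/GW/ { sum += $4 } END { print sum / NR }')

echo "$avg $avg_nr"
```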

Control statements:

if ( VAR == <value> ) { <stat> } else { <stat> }: The if/else construction
if (n), if (n != 0): Both test whether n is not equal to zero
(n<5)*2: If n < 5 this yields 2, else 0 (a true comparison evaluates to 1, a false one to 0)
for (var in ARR) { print ARR[var] }: Loop over all indexes of ARR in arbitrary order.
if (var in ARR) { print ARR[var] }: Check if index var is in ARR
if (ARR[var] == "" ): Check if index var has no value (empty or unset). Note that referencing ARR[var] creates the index if it did not exist; use (var in ARR) to avoid that.
for (init;test;incr) e.g. for (num=10;num<=100;num++) { stat }: Loop: start with init, repeat as long as test is true, and run incr after each pass
function NAME (par,par,..): Create a function. Parameters of the function are local. All local variables should be declared as extra parameters to avoid overwriting a global variable. Overwriting a global variable, on the other hand, is a way to return results from the function.
return <value>: End the function and give it a return value
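
The following sketch combines a function with local variables, a for loop, and the tests described above. The function name sum_to and all variable names are hypothetical:

```shell
ctl=$(awk '
# "i" and "total" are declared as extra parameters, so they are local.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++) total += i
    return total
}
BEGIN {
    i = 42                          # global i, untouched by the function
    print sum_to(4), i
    n = 3
    if (n) print "n is non-zero"
    r = (n < 5) * 2                 # the comparison yields 1, so r is 2
    print r
    arr["a"] = 1
    if ("a" in arr) print "a is in arr"
    if (!("b" in arr)) print "b is not in arr"
}')
echo "$ctl"
```

The extra spaces in the parameter list are only a convention to separate real parameters from local variables; awk itself does not require them.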

Misc:

VAR+0: Force numeric interpretation of VAR
VAR"": Force string interpretation of VAR
VAR ~ /<regexp>/: Do a regexp match
ARR["IND"]="string": Create array ARR and put "string" in index IND
delete ARR[var]: Delete index var from ARR
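
A sketch of how forcing a numeric or string interpretation changes a comparison, followed by array creation and deletion. All names and values are made up:

```shell
misc=$(awk 'BEGIN {
    v = "10"; w = "9"
    r1 = (v < w)              # string constants: compared as strings, "10" < "9" is true
    r2 = (v+0 < w+0)          # forced numeric: 10 < 9 is false
    a = 10; b = 9
    r3 = (a "" < b "")        # forced string: compared as "10" and "9" again
    print r1, r2, r3

    arr["x"] = "one"          # assigning to an index creates the array
    arr["y"] = "two"
    delete arr["x"]
    n = 0
    for (k in arr) n++        # count the remaining indexes
    has_x = ("x" in arr)
    print n, has_x, arr["y"]
}')
echo "$misc"
```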

Program structure:

BEGIN {<pre-block>} {<body-block>} END {<post-block>}: Commands in:
<pre-block> are executed before the input-file is read.
<body-block> are applied to each line of the input-file.
<post-block> are executed after the input-file is read.
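
A minimal sketch of the three blocks at work on three made-up input lines:

```shell
struct=$(printf 'one\ntwo\nthree\n' | awk '
BEGIN { print "start" }              # runs once, before any input is read
      { print NR ": " $0 }           # runs for every input line
END   { print "lines read: " NR }    # runs once, after the last line
')
echo "$struct"
```

In the END block NR still holds the number of the last record read, which makes it a convenient line count.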

Redirection:

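Awk sends its output to standard output (the terminal) by default. You can write to other files by using > or >> in print commands, as in the shell. Unlike in the shell, > truncates a file only when it is first opened; later prints to the same file append to it. Older awk versions can have a maximum of 10 files open at a time; the limit varies between implementations.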
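
A sketch that splits made-up input over one output file per value of the first field. The directory comes from mktemp, and the names dir and out are hypothetical:

```shell
tmpdir=$(mktemp -d)

printf 'a 1\nb 2\na 3\n' | awk -v dir="$tmpdir" '{
    out = dir "/" $1 ".txt"     # one output file per value of field 1
    print $2 > out              # ">" truncates when the file is first opened; later prints append
}'

a_content=$(cat "$tmpdir/a.txt")
b_content=$(cat "$tmpdir/b.txt")
echo "$a_content"
echo "$b_content"
rm -r "$tmpdir"
```

Building the file name in a variable first avoids ambiguity about where the file name expression after > ends.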
