awk is a program with its own programming language for processing data and generating reports.
The GNU version of awk is gawk.
awk processes data received from standard input, from an input file, or from the output of another command or process.
Like sed, awk processes data line by line. It checks every line against the specified pattern and performs the specified actions. If only a pattern is specified, all the lines matching the pattern are printed. If no pattern is specified, the specified actions are performed on all the lines.
The meaning of awk
The name of the program awk is made from the initials of the three authors of the language, namely Alfred Aho, Peter Weinberger, and Brian Kernighan. It is not very clear why they selected the name awk instead of kaw or wak!
Using awk
The following are different ways to use awk:
- Syntax using only a pattern:
$ awk 'pattern' filename
In this case, all the lines containing pattern will be printed.
- Syntax using only an action:
$ awk '{action}' filename
In this case, action will be applied to all lines.
- Syntax using a pattern and an action:
$ awk 'pattern {action}' filename
In this case, action will be applied to all the lines containing pattern.
As seen previously, an awk instruction consists of patterns, actions, or a combination of both.
Actions are enclosed in curly brackets. An action can contain many statements separated by semicolons or newlines.
awk commands can be given on the command line or in an awk script file. The input lines can come from the keyboard, a pipe, or a file.
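As a minimal sketch of an action that holds more than one statement, the following block creates a small sample file and runs one rule whose action contains two print statements separated by a semicolon. The file name people.txt and its contents are assumptions made for illustration.

```shell
# Hypothetical sample data: first name, last name, salary, birth date.
cat > people.txt <<'EOF'
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
EOF

# One pattern, one action, two statements separated by a semicolon.
# Each print writes its own output line.
awk '/Jones/ { print $1; print $3 }' people.txt
```

For the matching record, this should print Marie on one line and 6000 on the next.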
Input from files
Let’s see a few examples of using the preceding syntax using input from files:
$ cat people.txt
The output is as follows:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
Enter the following command:
$ awk '/Martin/' people.txt
The output is as follows:
Fred Martin 6500 22/7/1982
This prints the line containing the Martin pattern.
Here is an example that uses only an action:
$ awk '{print $1}' people.txt
The output is as follows:
Bill
Fred
Julie
Marie
Tom
This awk command prints the first field of every line of the people.txt file.
Here is an example that uses both a pattern and an action:
$ awk '/Martin/{print $1, $2}' people.txt
The output is as follows:
Fred Martin
This prints the first and second field of the line that contains the Martin pattern.
Input from commands
We can use the output of any other Linux command as the input to the awk program. To do so, we use a pipe to send the output of the other command to awk.
The syntax is as follows:
$ command | awk 'pattern'
$ command | awk '{action}'
$ command | awk 'pattern {action}'
Here is an example:
$ cat people.txt | awk '$3 > 6500'
The output is as follows:
Bill Thomas 8000 08/9/1968
Tom Walker 7000 14/1/1977
This prints all lines where field 3 is greater than 6500.
Here is an example:
$ cat people.txt | awk '/1972$/{print $1, $2}'
The output is as follows:
Marie Jones
This prints fields 1 and 2 of the lines that end with 1972. Here is another example:
$ cat people.txt | awk '$3 > 6500 {print $1, $2}'
This prints fields 1 and 2 of the lines where the third field is greater than 6500.
How awk works
Let’s understand how the awk program processes every line. We will consider a simple file, sample.txt:
$ cat sample.txt
Happy Birth Day
We should live every day.
Let’s consider the following awk command:
$ awk '{print $1, $3}' sample.txt
The following diagram shows how awk will process every line in memory:
An explanation of the preceding diagram is as follows:
- awk reads a line from the file and puts it into an internal variable called $0. Each line is called a record. By default, every record is terminated by a newline.
- Then, every record is divided into separate words or fields. Every field is stored in a numbered variable: $1, $2, and so on. Some older implementations limit the number of fields per record (for example, to 100); modern implementations such as gawk have no such fixed limit.
- awk has a built-in variable called the field separator (FS). By default, FS is whitespace, that is, spaces and tabs. Fields are separated by FS. If we want to use another field separator, such as the colon (:) in the /etc/passwd file, we need to specify it on the awk command line.
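The colon-separated case mentioned above can be sketched as follows. The file passwd_sample.txt is a hypothetical excerpt in /etc/passwd format created only for this illustration; the real system file is not read.

```shell
# Hypothetical excerpt in /etc/passwd format (seven colon-separated fields).
cat > passwd_sample.txt <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/sh
EOF

# -F: sets the field separator to a colon.
# $1 is the login name and $7 is the login shell.
awk -F: '{ print $1, $7 }' passwd_sample.txt
```

This should print root /bin/bash and alice /bin/sh, one pair per line.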
The action '{print $1, $3}' tells awk to print the first and third fields of each record, separated by a space. The command is as follows:
$ awk '{print $1, $3}' sample.txt
The output will be as follows:
Happy Day
We live
An explanation of the output is as follows:
- There is one more built-in variable called the output field separator (OFS). By default, it is a single space. It is placed between the values that print writes when they are separated by commas.
- Once the first line is processed, awk loads the next line into $0 and continues as discussed earlier.
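To make the role of OFS visible, here is a small sketch that sets it to a hyphen in a BEGIN block. The sample.txt contents repeat the two lines used in this section; choosing a hyphen is only for illustration.

```shell
# Recreate the two-line sample file used in this section.
printf 'Happy Birth Day\nWe should live every day.\n' > sample.txt

# OFS is inserted wherever a comma separates the arguments of print.
awk 'BEGIN { OFS = "-" } { print $1, $3 }' sample.txt
```

With OFS set this way, the output should be Happy-Day and We-live.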
awk commands from within a file
We can put awk commands in a file. We then use the -f option followed by the awk script filename, so that awk reads its processing instructions from that file. awk copies the first line of the data file into $0 and applies all the processing instructions to that record. Then it discards that record and loads the next line from the data file. It proceeds this way until the last line of the data file. If no action is specified, the lines matching the pattern are printed on the screen. If no pattern is specified, the specified action is performed on all lines of the data file.
This is an example:
$ cat people.txt
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ cat awk_script
/Martin/{print $1, $2}
Enter the following command:
$ awk -f awk_script people.txt
The output is as follows:
Fred Martin
The awk script file contains the Martin pattern and the action of printing fields 1 and 2. Therefore, awk has printed the first and second fields of the line containing the Martin pattern.
Records and fields
Every line terminated by a newline is called a record, and every word separated by whitespace is called a field. We will learn more about them in this section.
Records
awk does not see the file as one continuous stream of data; it processes the file line by line. Each line is terminated by a newline character. awk copies each line into an internal buffer, called a record.
The record separator
By default, a newline is both the input record separator and the output record separator. The input record separator is stored in the built-in variable RS, and the output record separator is stored in ORS. We can modify RS and ORS, if required.
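As a rough sketch of modifying these two variables, the block below first reads semicolon-separated records by changing RS, and then joins lines with commas by changing ORS. The file name records.txt and the sample values are assumptions for illustration.

```shell
# Hypothetical data: three records separated by semicolons, no newlines.
printf 'alpha;beta;gamma' > records.txt

# RS=";" makes awk treat each semicolon-separated chunk as one record.
awk 'BEGIN { RS = ";" } { print NR, $0 }' records.txt

# ORS="," makes print end every record with a comma instead of a newline.
printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "," } { print }'
```

The first command should number the three records on separate lines; the second should write a,b,c, on a single line without a trailing newline.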
The $0 variable
The entire line that is copied into the buffer, that is, the record, is called $0.
Take the following command, for example:
$ cat people.txt
The output will be as follows:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ awk '{print $0}' people.txt
The output is as follows:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
This has printed all the lines of the text file. Similar results can be seen with the following command:
$ awk '{print}' people.txt
The NR variable
awk has a built-in variable called NR. It stores the record number. NR is 1 for the first record and is incremented by one for each new record.
Take, for example, the following command:
$ cat people.txt
The output will be:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ awk '{print NR, $0}' people.txt
The output will be:
1 Bill Thomas 8000 08/9/1968
2 Fred Martin 6500 22/7/1982
3 Julie Moore 4500 25/2/1978
4 Marie Jones 6000 05/8/1972
5 Tom Walker 7000 14/1/1977
This has printed every record, that is, $0, preceded by its record number, which is stored in NR. That is why we see 1, 2, 3, and so on before every line of output.
Fields
Every line is called a record, and every word in a record is called a field. By default, fields are separated by whitespace, that is, spaces or tabs. awk has a built-in variable called NF, which holds the number of fields in the current record. The maximum number of fields depends on the implementation; some older implementations allow about 100. The following example has five records, each with four fields:
$1 $2 $3 $4
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ awk '{print NR, $1, $2, $4}' people.txt
The output will be:
1 Bill Thomas 08/9/1968
2 Fred Martin 22/7/1982
3 Julie Moore 25/2/1978
4 Marie Jones 05/8/1972
5 Tom Walker 14/1/1977
This has printed the record number followed by fields 1, 2, and 4 of every record.
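Since NF holds the field count, $NF refers to the last field of a record. The following sketch, which reuses the two-line sample.txt from earlier in this section as assumed input, prints both values for records of different lengths.

```shell
# Two records with different numbers of fields.
printf 'Happy Birth Day\nWe should live every day.\n' > sample.txt

# NF is the number of fields; $NF is the value of the last field.
awk '{ print NF, $NF }' sample.txt
```

The first record has three fields and the second has five, so the output should be 3 Day followed by 5 day. (with the period, since punctuation is part of the field).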
Field separators
By default, fields are separated by whitespace. We will learn how to change the separator in this section.
The input field separator
We have already discussed that the input field separator is whitespace by default. awk stores it in the built-in variable FS. We can change FS on the command line with the -F option, or by assigning to it in a BEGIN block.
This is an example:
$ cat people.txt
The output will be as follows:
Bill Thomas:8000:08/9/1968
Fred Martin:6500:22/7/1982
Julie Moore:4500:25/2/1978
Marie Jones:6000:05/8/1972
Tom Walker:7000:14/1/1977
$ awk -F: '/Marie/{print $1, $2}' people.txt
The output will be as follows:
Marie Jones 6000
We have used the -F option to specify the colon (:) as the field separator instead of the default. Therefore, awk has printed fields 1 and 2 of the records in which the Marie pattern was matched. Field 1 is now Marie Jones, because the space no longer separates fields. We can even specify more than one field separator on the command line as follows:
$ awk -F'[ :\t]' '{print $1, $2, $3}' people.txt
This will use the space, colon, and tab characters as field separators.
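The BEGIN form of changing the separator can be sketched like this. The file name people_colon.txt is a hypothetical name chosen so that the colon-separated data does not overwrite the space-separated people.txt.

```shell
# Hypothetical colon-separated variant of the sample data.
cat > people_colon.txt <<'EOF'
Bill Thomas:8000:08/9/1968
Marie Jones:6000:05/8/1972
EOF

# Assigning FS in BEGIN takes effect before the first record is read.
awk 'BEGIN { FS = ":" } /Marie/ { print $1, $2 }' people_colon.txt

# A bracket expression lets either a space or a colon separate fields.
awk -F'[ :]' '{ print $1, $2, $3 }' people_colon.txt
```

The first command should print Marie Jones 6000, where Marie Jones is a single field. The second should print the first name, last name, and salary of both records as three separate fields.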
Patterns and actions
While executing commands using awk, we need to define patterns and actions. Let’s learn more about them in this section.
Patterns
awk uses patterns to control the processing of actions. When a pattern or regular expression is found in the record, the action is performed; if no action is defined, awk simply prints the line on the screen.
This is an example:
$ cat people.txt
The output will be:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ awk '/Bill/' people.txt
The output will be:
Bill Thomas 8000 08/9/1968
In this example, when the Bill pattern is found in a record, that record is printed on the screen. Here is another example:
$ awk '$3 > 5000' people.txt
The output will be:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
In this example, when field 3 is greater than 5000, that record is printed on the screen.
Actions
Actions are performed when the required pattern is found in a record. Actions are enclosed in curly brackets ({ and }). We can specify several statements within the same curly brackets, but they must be separated by semicolons or newlines.
The syntax is as follows:
pattern { action statement; action statement; .. }
or
pattern {
    action statement
    action statement
}
The opening curly bracket must be on the same line as the pattern; otherwise, awk treats the pattern and the action as two separate rules.
The following example gives a better idea:
$ awk '/Bill/{print $1, $2 ", Happy Birth Day !"}' people.txt
This is the output:
Bill Thomas, Happy Birth Day !
Whenever a record contains the Bill pattern, awk prints field 1 and field 2, followed by the message Happy Birth Day.
Regular expressions
A regular expression is a pattern enclosed in forward slashes. A regular expression can contain meta-characters. If the pattern matches any string in the record, the condition is true and any associated action is executed. If no action is specified, the record is simply printed on the screen.
Meta-characters used in awk regular expressions are as follows:
Meta-character   What it does
.                Any single character is matched
*                Zero or more of the preceding character are matched
^                The beginning of the string is matched
$                The end of the string is matched
+                One or more of the preceding character are matched
?                Zero or one of the preceding character is matched
[ABC]            Any one character in the set of characters A, B, or C is matched
[^ABC]           Any one character not in the set of characters A, B, or C is matched
[A-Z]            Any one character in the range from A to Z is matched
a|b              Either a or b is matched
(AB)+            One or more repetitions of AB, such as AB, ABAB, and so on, are matched
\*               A literal asterisk is matched
&                In the replacement string of sub() and gsub(), this stands for the text matched by the search pattern
In the following example, awk searches for all lines matching the regular expression Moore and displays fields 1 and 2 of each matching record on the screen:
$ awk '/Moore/{print $1, $2}' people.txt
The output is as follows:
Julie Moore
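A few of the meta-characters from the table can be tried out with the sketch below. It recreates the sample people.txt as assumed input and uses anchors, a character set, a range, and alternation.

```shell
# Recreate the sample data used in this section.
cat > people.txt <<'EOF'
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
EOF

# ^ anchors the match at the start; [BT] matches either B or T.
awk '/^[BT]/ { print $1 }' people.txt

# $ anchors the match at the end; 197[0-9] matches any year in the 1970s.
awk '/197[0-9]$/ { print $1, $4 }' people.txt

# | matches either alternative.
awk '/Martin|Moore/ { print $2 }' people.txt
```

The first command should print Bill and Tom, the second the three people born in the 1970s (Julie, Marie, and Tom) with their birth dates, and the third Martin and Moore.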
Writing the awk script file
Whenever we need to write multiple patterns and actions, it is more convenient to write a script file. The script file contains patterns and actions. If multiple statements are on the same line, they must be separated by semicolons; otherwise, we write them on separate lines. A comment starts with the pound (#) sign.
Here is an example:
$ cat people.txt
The output is as follows:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
The awk script is as follows:
$ cat report
The output is as follows:
/Bill/{print "Birth date of " $1, $2 " is " $4}
/^Julie/{print $1, $2 " has a salary of $" $3 "."}
/Marie/{print NR, $0}
Enter the following command:
$ awk -f report people.txt
The output will be as follows:
Birth date of Bill Thomas is 08/9/1968
Julie Moore has a salary of $4500.
4 Marie Jones 6000 05/8/1972
In this example, the awk command is followed by the -f option, which names the script file. awk then applies every rule in the script to each record of the text file, people.txt.
In this script, if the regular expression Bill is matched, awk prints some text, field 1, field 2, and then the birth date. If the regular expression Julie is matched at the start of the line, awk prints her salary information. If the regular expression Marie is matched, awk prints the record number NR followed by the complete record.
Using variables in awk
We can use a variable in an awk script without declaring or initializing it. Variables can hold strings or numbers, including floating-point numbers. No type declaration is required, unlike in C programming. awk determines the type of a variable from the value assigned to it or from how it is used in the script.
An uninitialized variable has the value 0 when it is used as a number and the null string "" when it is used as a string. Here are some examples:
name = "Ganesh"
Here, the variable name is of the string type.
j++
Here, the variable j is a number. It is implicitly initialized to zero and then incremented by one.
value = 50
The value variable is a number with an initial value of 50.
To convert a string variable to a number, add zero to it:
name + 0
To convert a number variable to a string, concatenate it with the null string:
value ""
User-defined variables can be made up of letters, digits, and underscores. The variable cannot start with a digit.
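The behaviour of uninitialized variables and the two conversion idioms can be checked with a short sketch. All variable names here (count, empty_var, text, num, str) are hypothetical and chosen only for illustration; the program runs entirely in a BEGIN block, so it needs no input file.

```shell
awk 'BEGIN {
    count++                     # uninitialized, so it starts at 0 and becomes 1
    print "[" empty_var "]"     # uninitialized in string context: the null string

    text = "42abc"
    num = text + 0              # adding zero converts the leading digits to a number
    str = 50 ""                 # concatenating the null string yields a string

    print count, num, length(str)
}'
```

This should print [] on the first line and 1 42 2 on the second: the string "42abc" converts to 42, and the number 50 becomes a two-character string.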
Decision-making using an if statement
In awk programming, the if statement is used for decision-making. The syntax is as follows:
if (conditional-expression)
action1
else
action2
If the condition is true, then action1 will be performed; otherwise, action2 will be performed. This is very similar to the if construct in C programming.
An example of using the if statement in an awk command is as follows:
$ cat people.txt
The output is as follows:
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
$ awk '{if ($3 > 7000) { print "person with salary more than 7000 is \n", $1, " " , $2;}}' people.txt
The output is as follows:
person with salary more than 7000 is 
 Bill   Thomas
In this example, field 3 of every record is checked to see whether it is greater than 7000. If it is, the first and second fields, that is, the name of the person, are printed. The commas in the print statement insert the output field separator, which accounts for the extra spaces in the output.
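The else branch of the syntax can be sketched in the same way. The threshold of 6500 and the labels high and low are assumptions picked for illustration, and the block recreates the sample people.txt so that it can run on its own.

```shell
# Recreate the sample data used in this section.
cat > people.txt <<'EOF'
Bill Thomas 8000 08/9/1968
Fred Martin 6500 22/7/1982
Julie Moore 4500 25/2/1978
Marie Jones 6000 05/8/1972
Tom Walker 7000 14/1/1977
EOF

# Label each person by comparing the salary in field 3 with a threshold.
awk '{
    if ($3 >= 6500) {
        print $1, "high"
    } else {
        print $1, "low"
    }
}' people.txt
```

Bill, Fred, and Tom should be labelled high; Julie and Marie should be labelled low.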
Using the for loop
The for loop is used for doing certain actions repetitively. The syntax is as follows:
for(initialization; condition; increment/decrement)
actions
Initially, a variable is initialized and then the condition is checked. If it is true, the action is performed; multiple actions must be enclosed in curly brackets. Then, the variable is incremented or decremented. Again, the condition is checked. If the condition is true, the actions are performed again; otherwise, the loop is terminated.
An example of the awk command with the for loop is as follows:
$ awk '{ for( i = 1; i <= NF; i++) print NF,$i }' people.txt
Initially, the i variable is initialized to 1. Then, the condition is checked to see whether i is less than or equal to NF. If true, the action of printing NF and field i is performed. Then i is incremented by one. Again, the condition is checked. If it is true, the action is performed again; otherwise, the loop terminates.
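To see the loop run on a single short record, here is a minimal sketch that prints each field next to its index. The input line is piped in with printf and is only an assumed example.

```shell
# Walk over the fields of one record, printing the index and the field.
printf 'Happy Birth Day\n' |
    awk '{ for (i = 1; i <= NF; i++) print i, $i }'
```

For this three-field record, the loop body runs three times and should print 1 Happy, 2 Birth, and 3 Day on separate lines.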
Using the while loop
Similar to C programming, awk has a while loop for doing tasks repeatedly. while checks the condition. If the condition is true, the actions are performed. If the condition is false, the loop terminates.
The syntax is as follows:
while(condition)
actions
An example of using the while construct in awk is as follows:
$ awk '{ i = 1; while ( i <= NF ) { print NF, $i ; i++ } }' people.txt
NF is the number of fields in the record. The variable i is initialized to 1. Then, while i is less than or equal to NF, the print action is performed. The print command prints NF and field i of each record in the people.txt file. In the action block, i is incremented by one. The while construct performs the action repeatedly as long as i is less than or equal to NF.
Using the do while loop
The do while loop is similar to the while loop; the difference is that the action is performed at least once, even if the condition is false, because the condition is checked only after the action.
The syntax is as follows:
do
action
while (condition)
After the action or actions are performed, the condition is checked. If the condition is true, then the action will be performed again; otherwise, the loop will be terminated.
The following is an example of using the do while loop:
$ cat awk_script
BEGIN {
    do {
        ++x
        print x
    } while ( x <= 4 )
}
$ awk -f awk_script
1
2
3
4
5
In this example, x is incremented to 1 and the value of x is printed. Then, the condition is checked to see whether x is less than or equal to 4. If the condition is true, then the action is performed again. The loop ends after printing 5, because x is then no longer less than or equal to 4.
