Using Grep Command in CentOS for Text Matching

Grep (short for Global Regular Expression Print) is a command widely used to search for text in files. It scans files for a specified pattern, which can be a regular expression or a plain text string, and prints each line that contains a match. Its syntax is as follows:

$ grep [options] pattern [files]

The following table lists common uses of the grep command:

Command                           Usage
grep 'student' /etc/passwd        Search for the string student in the file /etc/passwd and print all matching lines
grep -v 'student' /etc/passwd     Print all lines that do not contain the string student
grep -i 'STUDENT' /etc/passwd     Search for the string STUDENT in a case-insensitive manner and print all matching lines (-i: ignore case)
grep -c 'student' /etc/passwd     Print the total number of lines that contain the text student in the /etc/passwd file
grep -rl 'student' /etc/          Search the directory recursively and print the names of the files that contain the string student
grep -rL 'student' /etc/          Search the directory recursively and print the names of the files that do not contain the string student
grep -n 'student' /etc/passwd     Print the line number along with each line containing the pattern student
grep -A1 'student' /etc/passwd    Print one additional line after the match
grep -B1 'student' /etc/passwd    Print one additional line before the match
grep -C1 'student' /etc/passwd    Print one additional line before and one after the match
grep -a 'dir' /bin/mkdir          Search inside the /bin/mkdir binary file, treating it as text, and print the lines containing the string dir
grep 'root' /etc/passwd           Print the lines containing the string root anywhere on the line
grep '^root' /etc/passwd          Print the lines that begin with the string root
grep 'bash$' /etc/passwd          Print the lines that end with the string bash
grep '^$' <filename>              Print the empty lines from the file
grep -v '^$' <filename>           Print only the non-empty lines from the file
grep '[br]oot' /etc/passwd        Print the lines in /etc/passwd that contain boot or root (the character b or r followed by the string oot) anywhere on the line
who | grep 'student'              Print the lines containing the string student, reading input from stdin
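
To make the context and recursive options more concrete, the following sketch runs them against a throwaway directory. The directory, file names, and file contents are assumptions invented for illustration; they do not come from the original examples.

```shell
# Hypothetical scratch directory with two small files (illustrative only)
dir=$(mktemp -d)
printf 'alpha\nstudent\nomega\n' > "$dir/a.txt"
printf 'no match here\n' > "$dir/b.txt"

# -A1: the matching line plus one line after it
after=$(grep -A1 'student' "$dir/a.txt")

# -B1: the matching line plus one line before it
before=$(grep -B1 'student' "$dir/a.txt")

# -rl lists files that contain the string; -rL lists files that do not
with_match=$(grep -rl 'student' "$dir")
without_match=$(grep -rL 'student' "$dir")

printf '%s\n' "$after" "$before" "$with_match" "$without_match"
rm -rf "$dir"
```

Working on a scratch directory keeps the result predictable; on a real system the same options would be pointed at /etc/ or another directory of interest.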

An example of matching a string in a file using grep is shown in the following screenshot:

An example of printing those lines that do not contain the specified string using grep is shown in the following screenshot (some output stripped):
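
Matching and inverse matching can also be tried on a small sample file, which keeps the output short. This is only a sketch: the sample file and its three passwd-style entries are assumptions standing in for /etc/passwd.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
student:x:1000:1000:Student User:/home/student:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
EOF

# Lines that contain the string
matched=$(grep 'student' "$sample")
echo "$matched"

# Lines that do not contain the string (-v inverts the match)
unmatched=$(grep -v 'student' "$sample")
echo "$unmatched"

rm -f "$sample"
```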

The grep command can be used with the -c option to count the lines that match a specified pattern. The following example shows how to count the number of CPU cores (logical processors) in a system using the grep command:

$ grep -c name /proc/cpuinfo    # count the number of CPU cores in the system

The following screenshot shows how to use the grep command to count the lines that contain the string root in the /etc/passwd file:
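
One detail worth keeping in mind is that -c counts matching lines, not individual matches. The sketch below contrasts the two on a hypothetical sample file (the file and its contents are assumptions for illustration); combining -o with wc -l is one common way to count every occurrence.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
student:x:1000:1000:Student User:/home/student:/bin/bash
EOF

# -c counts the lines that contain at least one match
line_count=$(grep -c 'root' "$sample")

# -o prints each match on its own line, so counting those lines
# gives the total number of occurrences
match_count=$(grep -o 'root' "$sample" | wc -l)

echo "matching lines: $line_count, total matches: $match_count"
rm -f "$sample"
```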

An example of printing the line number along with each matching line using grep is shown in the following screenshot:

An example of printing the lines that begin with a specified string is shown in the following screenshot:
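
The line-number option and the ^ anchor can be sketched on a small sample file as well. The file contents here are assumed for illustration; the operator entry is included because it contains root without beginning with it.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
student:x:1000:1000:Student User:/home/student:/bin/bash
EOF

# -n prefixes each matching line with its line number and a colon
numbered=$(grep -n 'student' "$sample")

# Without an anchor, root matches anywhere on the line
anywhere=$(grep -c 'root' "$sample")

# With ^, only lines that begin with root match
anchored=$(grep '^root' "$sample")

printf '%s\n' "$numbered" "$anywhere" "$anchored"
rm -f "$sample"
```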

Text extraction using sed and awk

It is often necessary to extract the same text repeatedly from a file. For such operations, where we need to edit a file at the same place or extract the same text from multiple files, we use sed and awk. Other text extraction utilities exist, but sed and awk use fewer system resources, execute faster, and are simpler to use.

sed

This is one of the oldest and most popular Unix text processing tools. It is a non-interactive stream editor. It is typically used for filtering text, as well as performing text substitution and the non-interactive editing of text files. There are two main ways of invoking the sed command, as follows:

sed -e command <filename>: Specify editing commands at the command line, operate on the filename specified, and display the output on the Terminal. Here, the -e option can be repeated to specify multiple editing commands at the command line.
sed -f scriptfile <filename>: Specify a script file containing sed commands to operate on a specified filename and display the output on the Terminal.
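
The two invocation styles can be compared side by side. In this sketch the sample file, the script file, and the substitutions themselves are all hypothetical; the point is only that repeated -e options and a script passed with -f apply the same commands.

```shell
# Hypothetical input file and sed script file (illustrative only)
sample=$(mktemp)
script=$(mktemp)
printf 'one cat\ntwo dogs\n' > "$sample"

# Two editing commands given with repeated -e options
inline=$(sed -e 's/cat/bird/' -e 's/two/three/' "$sample")

# The same commands stored one per line in a script file and applied with -f
printf 's/cat/bird/\ns/two/three/\n' > "$script"
from_file=$(sed -f "$script" "$sample")

printf '%s\n' "$inline" "$from_file"
rm -f "$sample" "$script"
```

A script file tends to be the more convenient choice once the list of commands grows or needs to be reused across files.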

Now, we discuss the most popular operations performed using sed, for example, substitution. The following table lists the basic syntax for substitution operations:

Command                                         Usage
sed 's/original_string/new_string/' file        Substitute the first occurrence of the original string in each line with a new string
sed 's/original_string/new_string/g' file       Substitute all occurrences of the original string in each line with a new string
sed '1,3s/original_string/new_string/g' file    Substitute all occurrences of the original string with a new string in each line from line one to line three
sed -i 's/original_string/new_string/g' file    Substitute all occurrences of the original string with a new string in each line, saving the changes in the same file
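
The difference between these substitution forms is easiest to see on a file where the original string occurs several times per line. The following is a minimal sketch with made-up sample data (four identical lines of a-a-a) and a made-up replacement character.

```shell
# Hypothetical four-line sample file (illustrative only)
sample=$(mktemp)
printf 'a-a-a\na-a-a\na-a-a\na-a-a\n' > "$sample"

# No flag: only the first occurrence on each line is replaced
first=$(sed 's/a/X/' "$sample")

# g flag: every occurrence on each line is replaced
all=$(sed 's/a/X/g' "$sample")

# 1,3 address: the substitution is applied to lines 1 to 3 only
ranged=$(sed '1,3s/a/X/g' "$sample")

printf '%s\n' "$first" "$all" "$ranged"
rm -f "$sample"
```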

Using the sed utility with the print command:

The p command prints the matching lines, and the -n option suppresses the default output so that only the matching lines are displayed, as shown in the following example:

$ sed -n '1,3p' /etc/passwd
$ sed -n '/^root/p' /etc/passwd
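
The effect of -n is easier to see when the p command is run both with and without it. This sketch uses a hypothetical three-line sample file in place of /etc/passwd.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
student:x:1000:1000:Student User:/home/student:/bin/bash
EOF

# With -n, only the lines selected by p are printed
with_n=$(sed -n '/^root/p' "$sample")

# Without -n, every line is printed once by default and the
# matching line is printed a second time by p
without_n=$(sed '/^root/p' "$sample")

# A line range works the same way as a pattern
first_two=$(sed -n '1,2p' "$sample")

printf '%s\n' "$with_n" "$without_n" "$first_two"
rm -f "$sample"
```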

Using the sed utility with the substitute command:

The s command replaces the matching string with a new string. The s command can be prefixed with an address, such as a line range or a pattern, to restrict the replacement to specific lines, as shown in the following example:

$ sed '/^student/s/bash/sh/' /etc/passwd
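
A small sketch of the same address-restricted substitution on a hypothetical sample file (the two entries are assumptions for illustration). Both lines end in bash, but only the line that begins with student is changed.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
student:x:1000:1000:Student User:/home/student:/bin/bash
EOF

# The /^student/ address limits the s command to lines starting with student
result=$(sed '/^student/s/bash/sh/' "$sample")

echo "$result"
rm -f "$sample"
```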

Using the sed utility with the delete command:

In the following example, the sed d command deletes the commented and empty lines from ntp.conf. The -i.backup option edits the file in place and saves a copy of the original as ntp.conf.backup:

$ sed -i.backup '/^#/d;/^$/d' /etc/ntp.conf

Note:

Use the -i option with caution, because changes made inside the file cannot be undone unless a backup was kept. It is safer to first run sed without the -i option, check the output, and then redirect it to a new file.
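
One possible way to follow this advice is sketched below. The configuration file is a hypothetical temporary file with made-up contents, not the real /etc/ntp.conf, and the .new suffix is an arbitrary choice.

```shell
# Hypothetical stand-in for a configuration file such as ntp.conf
conf=$(mktemp)
printf '# a comment\n\nserver 0.example.org\n' > "$conf"

# Step 1: preview the result; the file itself is not modified
preview=$(sed '/^#/d;/^$/d' "$conf")

# Step 2: write the result to a new file, leaving the original untouched
sed '/^#/d;/^$/d' "$conf" > "$conf.new"

# Step 3: only when the output looks right, edit in place and keep a backup
sed -i.backup '/^#/d;/^$/d' "$conf"

edited=$(cat "$conf")
backup=$(cat "$conf.backup")
printf '%s\n' "$preview"
rm -f "$conf" "$conf.new" "$conf.backup"
```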

awk

The awk command is used to extract data from a file and print specific contents. It is quite often used to restructure the data and construct reports. Its name is derived from the last names of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. Its main features include the following:

It is an interpreted programming language similar to C
It is used for data manipulation in files, and for retrieving and processing text from files
It views files as records and fields
It has arithmetic and string operators
It has variables, conditional statements, and loops
It reads from a file or from a standard input device and outputs to a standard output device such as a Terminal
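
Several of these features can be seen together in a short sketch. The input file, its two-column layout, and the threshold of 28 are all assumptions chosen for illustration.

```shell
# Hypothetical two-column input: a name and a number per record
sample=$(mktemp)
printf 'alice 30\nbob 45\ncarol 25\n' > "$sample"

# For each record (line), add field 2 to a running total (variable and
# arithmetic operator) and count the records above a threshold
# (conditional statement); the END block runs after the last record
summary=$(awk '{ total += $2; if ($2 > 28) high++ } END { print total, high }' "$sample")

echo "$summary"
rm -f "$sample"
```

Here awk splits each line into fields on whitespace by default, so $2 refers to the number in the second column.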

Its general invoking syntax is as follows:

$ awk '/pattern/{command}' <filename>

The printing of a selected column or row from a file is the basic task generally performed using awk.

In the following example, the awk command is used to print the contents of a file line by line until the end of the file is reached:

$ awk '{ print $0}' /etc/passwd

In the following example, the awk command is used to print the first field (column) of each line containing the string student. Here, the -F option is used to set the field separator to a colon (:):

$ awk -F: '/student/{ print $1}' /etc/passwd

In the following example, the awk command is used to print selective fields from the line containing the matching pattern in file /etc/passwd:

$ awk -F: '/student/{print "Username :", $1, "Shell :", $7}' /etc/passwd
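
Note that the /student/ pattern matches the string anywhere on the line, not only in the username field. The sketch below shows this on a hypothetical sample file (the tutor entry is invented so that student appears in its comment field) and, as one possible stricter alternative, compares the first field directly.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
student:x:1000:1000:Student User:/home/student:/bin/bash
tutor:x:1001:1001:student tutor:/home/tutor:/bin/sh
EOF

# /student/ selects every line that contains the string anywhere
loose=$(awk -F: '/student/{ print $1 }' "$sample")

# Comparing field 1 selects only the account named student
exact=$(awk -F: '$1 == "student" { print "Username :", $1, "Shell :", $7 }' "$sample")

printf '%s\n' "$loose" "$exact"
rm -f "$sample"
```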