Linux: text parsing tools and techniques

Knowing how to parse text files in Linux is essential for troubleshooting and analyzing log files. In this article I will walk through simple examples that show how to use the most common tools. Note that I don't cover every tool, nor the full functionality of each one, just the basic usage to help you get started!

Using cat and zcat

cat prints the content of one or more files to the screen, and zcat does the same for compressed text files. Both are essential for log parsing: their output can be passed to the stdin of another command using the | symbol, as we will see with the other commands.

Examples

cat filename.txt - prints content of filename.txt 
cat file1.txt file2.txt - prints content of both files 
cat -n filename.txt - prints filename.txt with line numbers 
cat -b filename.txt - prints filename.txt with only the non-blank lines numbered 
cat -s filename.txt - prints filename.txt with multiple blank lines condensed into one 
cat -E filename.txt - prints filename.txt with a $ at the end of each line 
cat -T filename.txt - prints filename.txt with tabs displayed as ^I
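
To make the pipe idea concrete, here is a minimal sketch that compresses a small log and sends the output of zcat to grep. The file name app.log and its contents are made up for illustration, and the sketch assumes the gzip version of zcat that Linux distributions usually ship.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Create a small sample log (hypothetical contents).
printf 'INFO start\nERROR disk full\nINFO done\n' > "$workdir/app.log"

# Compress a copy; -c writes to stdout, so the original file is kept.
gzip -c "$workdir/app.log" > "$workdir/app.log.gz"

# zcat decompresses to stdout and the | symbol feeds that into grep.
errors=$(zcat "$workdir/app.log.gz" | grep "ERROR")
echo "$errors"    # ERROR disk full
```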

Using grep and zgrep

grep finds patterns inside text files and prints the lines that match, and zgrep does the same for compressed text files. Just as with cat, the output of grep can be piped to the stdin of another Linux command for further processing.

Examples

grep -i "text" file.txt: searches for the string "text", ignoring upper/lower case. 
grep -v "text" file.txt: prints the lines that do not contain the string "text". 
grep -r "text" directory: searches for the string "text" in all files in the directory and its subdirectories. 
grep -c "text" file.txt: counts the number of lines that contain the string "text".
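
The options can be combined. The sketch below runs them against a small made-up log (app.log is a hypothetical name) and also shows zgrep on a compressed copy. The -n option, which prefixes each match with its line number, is not covered above and is added here only for illustration.

```shell
workdir=$(mktemp -d)
printf 'INFO start\nerror: disk full\nERROR: timeout\nINFO done\n' > "$workdir/app.log"

# -c and -i together: count the lines containing "error" in any case.
error_count=$(grep -ci "error" "$workdir/app.log")
echo "$error_count"    # 2

# -v, -i and -n together: lines without "error", with their line numbers.
other_lines=$(grep -vin "error" "$workdir/app.log")
echo "$other_lines"

# zgrep takes the same options but reads a gzip-compressed file.
gzip -c "$workdir/app.log" > "$workdir/app.log.gz"
zipped_count=$(zgrep -ci "error" "$workdir/app.log.gz")
echo "$zipped_count"   # 2
```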

Using cut

The cut command prints selected columns (fields) from each line of its input, which can be a file or stdin.

Examples

Assume we have file.txt with the following content and we want to print only columns 2 and 3

A B C 
1 2 3 
4 5 6 
7 8 9

We can combine cat and cut this way: -d defines the space as the delimiter and -f selects columns 2 and 3.

$ cat file.txt | cut -d " " -f2,3 
B C 
2 3 
5 6 
8 9
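
cut works with any single-character delimiter and can also read a file directly, without cat. A minimal sketch, assuming a made-up colon-separated file called users.txt:

```shell
workdir=$(mktemp -d)
printf 'alice:x:1000:/home/alice\nbob:x:1001:/home/bob\n' > "$workdir/users.txt"

# -d ":" sets the delimiter, -f1,4 keeps fields 1 and 4.
# The file is passed as an argument, so no cat is needed.
name_and_home=$(cut -d ":" -f1,4 "$workdir/users.txt")
echo "$name_and_home"

# An open range such as -f3- keeps field 3 and everything after it.
from_third=$(cut -d ":" -f3- "$workdir/users.txt")
echo "$from_third"
```

The selected fields are printed joined by the same delimiter that was used to split them.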

Using tail

The tail command can print a file from a given line onward, print the last N lines, or follow a log file as new lines are written to it.

Examples

The following example prints the last two columns of the file and discards the first line (tail -n +2 starts printing at line 2)

$ cat file.txt  | cut -d " " -f2,3 | tail -n +2 
2 3 
5 6 
8 9

The following example prints the last two columns of the file, but only for the last line

$ cat file.txt  | cut -d " " -f2,3 | tail -n 1 
8 9
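
tail can also read the file directly. The sketch below recreates the sample file.txt in a temporary directory and shows the "last N lines" and "from line N" forms side by side. Following a log is interactive, so it appears only as a comment, and the log path in it is hypothetical.

```shell
workdir=$(mktemp -d)
printf 'A B C\n1 2 3\n4 5 6\n7 8 9\n' > "$workdir/file.txt"

# Last two lines of the file.
last_two=$(tail -n 2 "$workdir/file.txt")
echo "$last_two"

# Everything from line 3 onward; the + means "start at this line".
from_line_three=$(tail -n +3 "$workdir/file.txt")
echo "$from_line_three"

# Following a growing log keeps running until interrupted with Ctrl+C:
#   tail -f /var/log/myapp.log    # hypothetical path
```

For this four-line file both commands print the same two lines, 4 5 6 and 7 8 9.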

Using paste

The paste command can merge lines into a single line using a delimiter. In this example we use paste to merge the lines of the file into a single line, with the space as the delimiter.

Example

$ cat file.txt  | cut -d " " -f1,2,3 | paste -sd " " 
A B C 1 2 3 4 5 6 7 8 9
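
The delimiter can be any character, and without -s paste merges several files side by side instead. A small sketch, using two made-up files:

```shell
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/letters.txt"
printf '1\n2\n3\n' > "$workdir/numbers.txt"

# -s joins all lines of one file; -d "," puts a comma between them.
joined=$(paste -sd "," "$workdir/letters.txt")
echo "$joined"    # a,b,c

# Without -s, paste combines the files line by line.
side_by_side=$(paste -d ":" "$workdir/letters.txt" "$workdir/numbers.txt")
echo "$side_by_side"
```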

Using sed

sed is used for text manipulation. In the following example we use sed to delete all characters in the A-Z range.

Example

$ cat file.txt  | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g'  
   1 2 3 4 5 6 7 8 9

Using cut (again)

sed left 3 leading spaces. There are many ways to deal with this, but the easiest is to use the cut command and select all the characters after a specific character position. In this example we use the -c parameter to instruct cut to select all text from position 4 onward.

Example

$ cat file.txt  | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- 
1 2 3 4 5 6 7 8 9
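
As one of the other ways, sed itself can remove the leading spaces, either in a second step or by deleting each letter together with the space that follows it. This is only a sketch of two alternatives, not a replacement for the cut approach:

```shell
# The line as it looks after the letters were deleted.
line="   1 2 3 4 5 6 7 8 9"

# ^ anchors the match at the start of the line; " *" matches any run of spaces.
trimmed=$(printf '%s\n' "$line" | sed 's/^ *//')
echo "$trimmed"    # 1 2 3 4 5 6 7 8 9

# Alternatively, delete each capital letter and its trailing space in one go.
no_letters=$(printf 'A B C 1 2 3 4 5 6 7 8 9\n' | sed 's/[A-Z] //g')
echo "$no_letters" # 1 2 3 4 5 6 7 8 9
```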

Using tr

tr can be used for simpler text substitutions than sed. In the following example we convert the row into a single column by substituting each space with a newline.

Example

$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" 
1 
2 
3 
4 
5 
6 
7 
8 
9
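
A few more tr forms that come up often when cleaning log lines are sketched below. The input strings are made up for illustration.

```shell
# Translate one set of characters into another: lowercase to uppercase.
upper=$(echo "error: disk full" | tr 'a-z' 'A-Z')
echo "$upper"            # ERROR: DISK FULL

# -d deletes every character that belongs to the set.
digits_removed=$(echo "a1b2c3" | tr -d '0-9')
echo "$digits_removed"   # abc

# -s squeezes runs of a repeated character into a single one.
squeezed=$(echo "1   2    3" | tr -s " ")
echo "$squeezed"         # 1 2 3
```

Squeezing spaces first is handy before cut -d " ", because cut treats every single space as a separate delimiter.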

Using shuf

shuf can be used to randomize the order of the lines coming from stdin.

Example

$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf 
1 
9 
4 
7 
5 
2 
8 
6 
3
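
shuf can also pick a random sample instead of shuffling everything. In this sketch seq is used only to generate the numbers 1 to 9 as sample input, and -n 3 asks shuf for three random lines. The output changes on every run, so none is shown.

```shell
# Pick 3 random lines out of the 9 input lines.
sample=$(seq 1 9 | shuf -n 3)
echo "$sample"

# The sample always has exactly 3 lines, whatever their values are.
sample_count=$(echo "$sample" | wc -l)
echo "$sample_count"
```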

Using sort

sort does the opposite of shuf: it sorts its input. The -n option sorts numerically.

Example

$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n 
1 
2 
3 
4 
5 
6 
7 
8 
9
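
The -n option matters as soon as the numbers have more than one digit, because by default sort compares lines as text. A small sketch with made-up numbers shows the difference, plus -r for reverse order:

```shell
numbers=$(printf '10\n9\n100\n1\n')

# Default sort compares character by character, so "100" comes before "9".
as_text=$(echo "$numbers" | sort)
echo "$as_text"

# -n compares the lines as numbers.
as_numbers=$(echo "$numbers" | sort -n)
echo "$as_numbers"

# -r reverses the order, here from largest to smallest.
descending=$(echo "$numbers" | sort -nr)
echo "$descending"
```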

Using wc

wc can be used to count the number of characters (-c, which strictly speaking counts bytes) and the number of lines (-l).

Example: count number of characters

Don't be fooled: the number of characters is not 9 but 18, because of the newline character that follows each number

$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n | wc -c 
18

Example: count number of lines

$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n | wc -l 
9
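
The same counts can be taken from a file. The sketch below uses a made-up two-line file and also shows -w, which counts words and is not covered above. Reading the file through < keeps the file name out of the output.

```shell
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Number of lines (newline characters).
lines=$(wc -l < "$workdir/sample.txt")
echo "$lines"    # 2

# Number of whitespace-separated words.
words=$(wc -w < "$workdir/sample.txt")
echo "$words"    # 3

# Number of bytes: 8 for "one two" plus newline, 6 for "three" plus newline.
bytes=$(wc -c < "$workdir/sample.txt")
echo "$bytes"    # 14
```

For plain ASCII text like this, the byte count and the character count are the same.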

I hope you enjoyed this short article about text analysis tools in Linux. Knowing how to analyze log files is essential, and the techniques in this article will be super useful for getting the job done!
