Linux: text parsing tools and techniques
Knowing how to parse text files in Linux is essential for troubleshooting and analyzing log files. In this article I will show you simple examples that will help you understand how to use these tools. Note that I don't cover every tool, nor the full functionality of each one, just the basic usage to help you get started!
Using cat and zcat
cat prints the content of one or more files to the screen, and zcat does the same for compressed text files. It's an essential command for log parsing: its output can be passed to the stdin of another command using the | symbol, as we will see with the other commands.
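As a rough sketch of how both commands feed a pipeline, the snippet below creates a small made-up log, compresses a copy, and reads each version back. The file names app.log and app.log.1.gz are purely illustrative, and the sketch assumes gzip is installed (it is the package that provides zcat).

```shell
# Work in a throwaway directory; all file names here are illustrative.
workdir=$(mktemp -d)
printf 'first entry\nsecond entry\n' > "$workdir/app.log"

# Keep the original and write a compressed copy next to it.
gzip -c "$workdir/app.log" > "$workdir/app.log.1.gz"

# cat reads the plain file, zcat decompresses on the fly.
plain=$(cat "$workdir/app.log")
unpacked=$(zcat "$workdir/app.log.1.gz")

# Either output can be piped into another command, here wc -l.
line_count=$(zcat "$workdir/app.log.1.gz" | wc -l)

echo "$unpacked"
echo "lines: $line_count"

rm -r "$workdir"
```

Both variables end up with the same two lines, which is the point: zcat lets the rest of the pipeline ignore whether the log was rotated and compressed.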
Examples
cat filename.txt - prints content of filename.txt
cat file1.txt file2.txt - prints content of both files
cat -n filename.txt - prints filename.txt with line numbers
cat -b filename.txt - prints filename.txt with only the non-blank lines numbered
cat -s filename.txt - prints filename.txt with multiple blank lines condensed into one
cat -E filename.txt - prints filename.txt with a $ at the end of each line
cat -T filename.txt - prints filename.txt with tabs displayed as ^I
Using grep and zgrep
grep finds patterns inside text files and prints the lines that match, and zgrep does the same for compressed text files. Just as with cat, the output of grep can be piped to the stdin of another Linux command for further processing.
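The following sketch shows grep feeding another grep, and zgrep reading a compressed copy of the same data. The log lines and file names are invented for illustration.

```shell
# Build a small made-up log and a compressed copy of it.
workdir=$(mktemp -d)
printf 'INFO service started\nERROR disk full\ninfo heartbeat\nerror timeout\n' > "$workdir/app.log"
gzip -c "$workdir/app.log" > "$workdir/app.log.gz"

# -i matches "ERROR" and "error" alike.
errors=$(grep -i "error" "$workdir/app.log")

# Pipe one grep into another: all errors, minus the timeouts.
filtered=$(grep -i "error" "$workdir/app.log" | grep -v "timeout")

# zgrep takes the same pattern and options, but reads the .gz file.
zerrors=$(zgrep -i "error" "$workdir/app.log.gz")

echo "$errors"
echo "$filtered"

rm -r "$workdir"
```

Here `errors` and `zerrors` hold the same two lines, and `filtered` holds only "ERROR disk full".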
Examples
grep -i "text" file.txt: searches for the string "text", ignoring case.
grep -v "text" file.txt: searches for lines that do not contain the string "text".
grep -r "text" directory: searches for the string "text" in all files in the directory and its subdirectories.
grep -c "text" file.txt: counts the number of lines that contain the string "text".
Using cut
The cut command prints selected columns (fields) from each line of a file or of stdin.
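cut is not limited to space-separated data. As a minimal sketch, here it is applied to two made-up records in the colon-separated style of /etc/passwd (the users alice and bob are hypothetical).

```shell
# Two invented records: name:password:uid:gid:comment:home:shell
records='alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/sh'

# -d sets the delimiter, -f picks fields 1 and 7 (user name and shell).
name_shell=$(printf '%s\n' "$records" | cut -d ":" -f1,7)

# A range works too: fields 3 to 4 (uid and gid).
ids=$(printf '%s\n' "$records" | cut -d ":" -f3-4)

echo "$name_shell"
echo "$ids"
```

Note that cut keeps the delimiter between the selected fields, so the first result is alice:/bin/bash and bob:/bin/sh.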
Examples
Assume we have file.txt with the following content and we want to print only columns 2 and 3:
A B C
1 2 3
4 5 6
7 8 9
We can combine cat and cut this way: -d defines a space as the delimiter and -f selects columns 2 and 3.
$ cat file.txt | cut -d " " -f2,3
B C
2 3
5 6
8 9
Using tail
The tail command can print everything from a given line onward, print the last N lines, or follow a log file as it grows.
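The three modes can be sketched on a made-up five-line file as follows. The follow mode is left commented out because it keeps running until interrupted.

```shell
# A small invented log with five numbered lines.
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/app.log"

# -n +3 starts at line 3 and prints everything after it.
from_third=$(tail -n +3 "$workdir/app.log")

# -n 2 prints only the last two lines.
last_two=$(tail -n 2 "$workdir/app.log")

echo "$from_third"
echo "$last_two"

# -f keeps the file open and prints new lines as they are appended.
# It runs until stopped with Ctrl+C, so it is not executed here:
# tail -f "$workdir/app.log"

rm -r "$workdir"
```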
Examples
The following example prints the last two columns of the file and discards the first line
$ cat file.txt | cut -d " " -f2,3 | tail -n +2
2 3
5 6
8 9
The following example prints the last two columns of the file and keeps only the last line
$ cat file.txt | cut -d " " -f2,3 | tail -n 1
8 9
Using paste
The paste command can merge lines into a single line using a delimiter. In this example we use paste to merge the lines of the file into a single line, with a space as the delimiter.
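As a small sketch with a different delimiter, the snippet below joins made-up host names into a comma-separated list and then pairs up lines two at a time. The host names are illustrative only.

```shell
# Three invented host names, one per line.
hosts='web1
web2
db1'

# -s joins all lines into one, -d sets the character placed between them.
# The trailing "-" tells paste to read from stdin.
joined=$(printf '%s\n' "$hosts" | paste -sd "," -)

# Without -s, each "-" consumes one line in turn, so "- -" pairs up lines.
pairs=$(printf '%s\n' 1 2 3 4 | paste -d " " - -)

echo "$joined"   # web1,web2,db1
echo "$pairs"    # "1 2" on the first line, "3 4" on the second
```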
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " "
A B C 1 2 3 4 5 6 7 8 9
Using sed
sed can be used for text manipulation. In the following example we use sed to delete all characters in the A-Z range.
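Deleting is just a substitution with an empty replacement. A minimal sketch of the s/pattern/replacement/ form, using throwaway strings:

```shell
# Without the g flag only the first match on each line is replaced.
first_only=$(printf 'a-b-c\n' | sed 's/-/+/')

# With g every match on the line is replaced.
every_match=$(printf 'a-b-c\n' | sed 's/-/+/g')

# An empty replacement deletes the matches; the spaces around them stay.
deleted=$(printf 'A B C 1 2 3\n' | sed 's/[A-Z]//g')

echo "$first_only"    # a+b-c
echo "$every_match"   # a+b+c
echo "[$deleted]"     # [   1 2 3], three leading spaces remain
```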
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g'
   1 2 3 4 5 6 7 8 9
Using cut (again)
sed left 3 leading spaces behind. There are many ways to deal with this, but the easiest is to use the cut command and select all the characters from a specific position onward. In this example we use the -c parameter to tell cut to select all text from position 4 onward.
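To illustrate one of the other ways, here is a sketch that compares the fixed-position cut with a sed expression that strips leading spaces. The sed variant is an alternative offered for comparison, not the approach used in this article.

```shell
# The line as it looks once the letters are deleted: three leading spaces.
line='   1 2 3 4 5 6 7 8 9'

# Fixed-position approach: keep everything from character 4 onward.
with_cut=$(printf '%s\n' "$line" | cut -c4-)

# Alternative: strip any number of leading spaces, no counting required.
with_sed=$(printf '%s\n' "$line" | sed 's/^ *//')

echo "$with_cut"
echo "$with_sed"
```

Both print the same line. The cut version depends on knowing exactly how many characters to skip, while the sed version adapts to however many spaces are there.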
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4-
1 2 3 4 5 6 7 8 9
Using tr
tr can be used for simpler text substitutions than sed. In the following example we convert the row to a single column by replacing each space with a newline.
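A short sketch of the most common tr forms, on throwaway strings, to show what the -s flag adds:

```shell
# Replace every space with a newline: one item per line.
column=$(printf 'a b c\n' | tr " " "\n")

# -s squeezes repeated output characters, so a run of spaces
# becomes a single newline instead of several empty lines.
squeezed=$(printf 'a   b  c\n' | tr -s " " "\n")

# Character ranges map one-to-one, which is handy for case conversion.
upper=$(printf 'error\n' | tr 'a-z' 'A-Z')

# -d deletes the listed characters instead of replacing them.
digits_only=$(printf 'id=42;\n' | tr -d 'a-z=;')

echo "$squeezed"
echo "$upper"         # ERROR
echo "$digits_only"   # 42
```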
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n"
1
2
3
4
5
6
7
8
9
Using shuf
shuf can be used to randomize the order of input lines coming from stdin.
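Since the output is random, the sketch below only shows the calls and makes no promise about the order. It assumes GNU coreutils, which is where shuf comes from.

```shell
# The numbers 1 to 9, one per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9)

# Lines come back in random order, so the output differs from run to run.
shuffled=$(printf '%s\n' "$numbers" | shuf)

# -n limits the output to that many randomly chosen lines.
sample=$(printf '%s\n' "$numbers" | shuf -n 3)

echo "$shuffled"
echo "$sample"
```

shuf works on whole lines, so input that consists of a single line comes back unchanged.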
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf
1
9
4
7
5
2
8
6
3
Using sort
sort does the opposite of shuf: it sorts its input.
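The -n flag matters as soon as the numbers have more than one digit. A minimal sketch with made-up values:

```shell
# Four invented values, deliberately out of order.
values='10
9
2
33'

# The default comparison is textual, so "10" lands before "2".
as_text=$(printf '%s\n' "$values" | sort)

# -n compares by numeric value instead.
as_numbers=$(printf '%s\n' "$values" | sort -n)

# Adding -r reverses the order: largest first.
descending=$(printf '%s\n' "$values" | sort -nr)

echo "$as_text"      # 10, 2, 33, 9 (one per line)
echo "$as_numbers"   # 2, 9, 10, 33 (one per line)
```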
Example
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n
1
2
3
4
5
6
7
8
9
Using wc
wc can be used to count the number of characters (bytes, with -c) and the number of lines (with -l).
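As a quick sketch of the counting modes on a tiny input: -c counts bytes, which is the same as the character count for plain ASCII text, and every newline counts as one byte.

```shell
# Three digits, each followed by a newline: 6 bytes on 3 lines.
bytes=$(printf '1\n2\n3\n' | wc -c)
lines=$(printf '1\n2\n3\n' | wc -l)

# -w counts whitespace-separated words.
words=$(printf 'disk full error\n' | wc -w)

echo "bytes=$bytes lines=$lines words=$words"
```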
Example: count number of characters
Don't be fooled: the number of characters is not 9 but 18, because of the newline character that follows each number.
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n | wc -c
18
Example: count number of lines
$ cat file.txt | cut -d " " -f1,2,3 | paste -sd " " | sed 's/[A-Z]//g' | cut -c4- | tr -s " " "\n" | shuf | sort -n | wc -l
9
I hope you enjoyed this short article about text analysis tools in Linux. It's essential to know how to analyze log files, and the techniques in this article will be super useful for getting the job done!