How to display only new results that did not exist in the previous grep run
I had a script that did a very basic job: it grepped for the keyword ‘ERROR’ and printed the results to the console. It worked, but I realized it had one flaw: it printed the same matches over and over. That is far from ideal for anything more advanced, such as email alerts, because it would flood the recipients with the same message. There is a neat trick that solves this. Let's see how.
Create the following file and save it as test_text.txt. This is the file we will grep for new matches only.
10:01 a
10:02 b
10:03 d
Now let's grep the file for the “b” character. As expected, grep prints every line that contains it:
$ grep b test_text.txt
10:02 b
Let's now add another line containing the “b” character:
10:01 a
10:02 b
10:03 d
10:04 b
If we grep for the “b” character again, grep correctly returns two lines:
$ grep b test_text.txt
10:02 b
10:04 b
Now let's tweak things a bit with this one-liner. On the first run it seems to do nothing special, since it prints the same two matches:
$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null
10:02 b
10:04 b
But when we re-run the same command, nothing is printed!
$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null
$
Now let's add two new lines to test_text.txt:
10:01 a
10:02 b
10:03 d
10:04 b
10:05 b
10:06 b
If we re-run the command now, it prints only the newly added matches!
$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null
10:05 b
10:06 b
How it works
The tee command does two things: it appends every line it receives to /tmp/ignore_grep and, at the same time, prints it to stdout. grep -vf /tmp/ignore_grep treats the lines stored in /tmp/ignore_grep as patterns and prints only the stdin lines that match none of them, so each run prints only results that were not printed by a previous run. Finally, sort --unique /tmp/ignore_grep -o /tmp/ignore_grep sorts the file in place and removes duplicate lines, which saves disk space and keeps later runs faster.
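The pieces described above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the name new_matches and its three arguments are made up for illustration, and it adds two things the one-liner does not have. It touches the state file first, so that grep -f does not depend on tee having already created it, and it passes -F and -x so that remembered lines are compared as whole fixed strings rather than as regular expressions.

```shell
#!/bin/sh
# Hypothetical helper: print only matches that were not printed by earlier runs.
# Usage: new_matches PATTERN INPUT_FILE STATE_FILE
new_matches() {
    pattern=$1
    input=$2
    state=$3

    # Make sure the state file exists before grep -f tries to read it.
    touch "$state"

    # -v: keep only lines that match none of the remembered lines
    # -x: a remembered line has to match the whole line
    # -F: remembered lines are fixed strings, not regular expressions
    # tee -a prints the new matches and appends them to the state file.
    grep -e "$pattern" "$input" | grep -vxFf "$state" | tee -a "$state"

    # Sort the state file in place and drop duplicate lines.
    sort -u -o "$state" "$state"
}
```

Calling new_matches b test_text.txt /tmp/ignore_grep repeatedly should behave like the one-liner on the sample file: all matches on the first run, nothing on the second, and only the added lines after the file grows.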
Disadvantages
This trick comes with three major disadvantages
- Performance degrades as the file holding the previous results grows bigger and bigger
- If a newly added line is exactly the same as a line from a previous run, it will not be shown. The trick works best with lines that match a pattern but also contain something distinct, such as a timestamp or a request ID, so use it with care!
- If you use the same file in /tmp for different input files, the results become unreliable and things get messed up
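One way to soften the last disadvantage is to derive the name of the state file from the input file, so that every input gets its own list of seen lines. The helper below is a hypothetical sketch: the name state_file_for and the ignore_grep.<checksum> naming scheme are assumptions made for illustration, not part of the original one-liner.

```shell
#!/bin/sh
# Hypothetical helper: print a state file path that is specific to one input file.
# Usage: state_file_for INPUT_FILE
state_file_for() {
    # cksum reads the path from stdin and prints "<checksum> <byte count>";
    # cut keeps only the checksum.
    id=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)

    # Place the file under $TMPDIR, falling back to /tmp.
    printf '%s/ignore_grep.%s\n' "${TMPDIR:-/tmp}" "$id"
}
```

The result can then replace the hard-coded /tmp/ignore_grep in the one-liner, for example via state=$(state_file_for test_text.txt). Note that the checksum is computed from the path exactly as it is typed, so a relative and an absolute path to the same file would get two different state files.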
Conclusion
In this article we saw how to implement a simple mechanism that makes grep print only newly added matches. It's not a very sophisticated method, but it works well in many cases despite its disadvantages.