By Konstantinos Patronas — 27 Jul 2023

How to display only new results that did not exist in the previous grep run

I had a script that did a very basic job: it grepped for the keyword ‘ERROR’ and printed the results to the console. It worked, but I realized it had one flaw: it printed the same matches over and over. That is far from ideal for anything more advanced, such as email alerts, because the recipients would be flooded with the same message. There is a neat trick that solves this. Let's see how.

Create the following file and save it as test_text.txt. This is the file that we will grep for new matches only.

10:01 a 
10:02 b 
10:03 d

Now let's grep the file for the “b” character. It works as expected and prints every line that contains “b”:

$ grep b test_text.txt 
10:02 b

Let's now add another line that contains the “b” character:

10:01 a 
10:02 b 
10:03 d 
10:04 b

If we grep for “b” again, grep correctly returns two lines:

$ grep b test_text.txt 
10:02 b 
10:04 b

Now let's tweak things a bit with the following one-liner. On the first run it seems to change nothing, since it still prints the two matched lines:

$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null 
10:02 b 
10:04 b

But when we re-run the same command, nothing is printed!

$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null 
$

Let's now add two new lines to test_text.txt:

10:01 a 
10:02 b 
10:03 d 
10:04 b 
10:05 b 
10:06 b

If we re-run the command now, it prints only the newly added matches!

$ grep b test_text.txt | grep -vf /tmp/ignore_grep | tee -a /tmp/ignore_grep ; sort --unique /tmp/ignore_grep -o /tmp/ignore_grep 2>/dev/null 
10:05 b 
10:06 b

How it works

The tee command does two things: it appends every match it receives to the file /tmp/ignore_grep and, at the same time, prints it to stdout.

grep -vf /tmp/ignore_grep treats every line of /tmp/ignore_grep as a pattern and prints only those lines of its stdin that match none of them. This way, each run prints only the results that were not recorded by a previous run.

sort --unique /tmp/ignore_grep -o /tmp/ignore_grep sorts the file and removes duplicate lines in place, in order to save disk space and improve performance for the next runs.
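The three steps above can be wrapped in a small shell function. This is only a sketch under a few assumptions: the name grep_new and its argument order are made up for illustration, and the -F and -x options are additions that are not part of the original one-liner. They make grep compare the stored lines as fixed strings against whole lines, instead of interpreting them as regular expressions. The touch call is also an addition, because grep -f fails if the state file does not exist at the moment grep opens it.

```shell
#!/bin/sh
# grep_new PATTERN FILE STATE_FILE   (hypothetical helper, not from the article)
# Prints only the matching lines that are not yet recorded in STATE_FILE.
grep_new() {
    pattern=$1
    input=$2
    state=$3

    # Make sure the state file exists before grep -f tries to read it.
    touch "$state"

    # -F: stored lines are fixed strings, -x: they must match a whole line,
    # -v: invert the match, -f: read the patterns from the state file.
    # tee -a prints the new matches and appends them to the state file.
    grep -- "$pattern" "$input" | grep -Fxvf "$state" | tee -a "$state"

    # Sort the state file and drop duplicate lines in place.
    sort -u -o "$state" "$state"
}
```

With the test file from this article, calling grep_new b test_text.txt /tmp/ignore_grep should behave like the one-liner: the first call prints every match, and later calls print only the matches added since then.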

Disadvantages

This trick comes with three major disadvantages:

Performance degrades as the file holding the previous results grows bigger and bigger.
If a newly added line is exactly the same as a line from a previous run, it will not be shown. The trick works best with lines that match a pattern but also contain something distinct, such as a timestamp or a request ID, so use it with care!
If you use the same file in /tmp for different input files, the results become unreliable, because lines already seen in one input suppress identical lines from another.
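One possible way around the last disadvantage is to derive a separate state file for every pattern and input file. The sketch below is an assumption-laden illustration, not part of the original trick: the function name state_file_for and the ignore_grep.<checksum> naming scheme are hypothetical, and it relies on the standard cksum utility to turn the pattern and path into a short identifier.

```shell
#!/bin/sh
# state_file_for PATTERN FILE   (hypothetical helper, not from the article)
# Prints a state-file path that depends on both the pattern and the input file.
state_file_for() {
    # cksum prints "<checksum> <byte count>" for stdin; keep only the checksum.
    id=$(printf '%s\n%s\n' "$1" "$2" | cksum | cut -d ' ' -f 1)

    # Fall back to /tmp when TMPDIR is not set.
    printf '%s/ignore_grep.%s\n' "${TMPDIR:-/tmp}" "$id"
}
```

The printed path could then be used in place of /tmp/ignore_grep in the one-liner, so that each input file keeps its own history of seen lines.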

Conclusion

In this article we saw how to implement a simple mechanism that makes grep print only newly added matches. It is not a very sophisticated method, but it works well in many cases despite its disadvantages.