Linux: The pee command, redirect stdin to multiple command pipelines


The Linux pee command copies its stdin to two or more command pipelines, which can be super handy in situations where processing the same data in several streams brings a performance benefit.

How to install pee

pee is part of the moreutils package, which is very easy to install using the package manager of your OS.

centos / redhat
sudo yum install moreutils

ubuntu / debian
sudo apt-get install moreutils

Examples

pee and sed.

In this example we echo the string '1 2 3 4 5' to the stdin of pee. pee starts both sed commands and forwards a copy of the input to the stdin of each one, where it is processed in parallel.

echo "1 2 3 4 5" | pee "sed s/\ /+/g" "sed s/\ /*/g"
1+2+3+4+5
1*2*3*4*5
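
To make the "every command gets a full copy of the input" behaviour concrete, here is a minimal sketch that needs no moreutils. pee_sketch is a hypothetical helper written for illustration, not how pee is implemented: it buffers stdin in a temporary file and runs the commands one after another, whereas the real pee feeds them in parallel. The sed expressions are single-quoted here so that the inner shell never treats the * as a glob.

```shell
#!/bin/sh
# pee_sketch: hypothetical, sequential stand-in for pee (illustration only).
# It saves stdin to a temp file, then gives each command the whole file
# on its stdin, so every command sees the complete input.
pee_sketch() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    for cmd in "$@"; do
        sh -c "$cmd" < "$tmp"
    done
    rm -f "$tmp"
}

result=$(echo "1 2 3 4 5" | pee_sketch "sed 's/ /+/g'" "sed 's/ /*/g'")
printf '%s\n' "$result"
```

Because the commands run in sequence, the output order here is fixed; with the real pee the commands run concurrently.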

pee and mail.

In this example the output of the echo command becomes the stdin of two mail commands started by pee; only their recipients and subjects differ. Of course there are much better ways to send mails in parallel, but this is still handy for simple scripts.

echo "This is my email" | pee "mail -s 'this is a test1' user1@example.com" "mail -s 'this is a test2' user2@example.com"

How to process data in parallel with pee.

The following example generates a long column of numbers from 1 to 9.000.000. tr replaces each newline with the “+” symbol, and the sed statement replaces the last character, which is always “+”, with a newline. The output is a very long string of the form “1+2+3+…+n+…+9.000.000”, which is then evaluated by bc, a command line calculator. The time command measures how long the calculation takes: about 11 seconds on my computer. It could be faster, since bc does not use all the cores of my system. Let's see how we can improve this.

$ time seq 9000000 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | bc
40500004500000

real    0m11.849s
user    0m12.308s
sys     0m1.368s
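
The string-building step is easier to follow at a small scale. The sketch below runs the same seq, tr and sed stages on 5 numbers instead of 9.000.000. Two assumptions: sed is GNU sed (it interprets \n in the replacement as a newline), and shell arithmetic stands in for bc so the sketch runs even where bc is not installed. The variable names are made up for illustration.

```shell
#!/bin/sh
# Small-scale version of the pipeline: 5 numbers instead of 9000000.
# Assumes GNU sed, which treats \n in the replacement as a newline.
expr_str=$(seq 5 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/')
printf 'expression: %s\n' "$expr_str"   # expression: 1+2+3+4+5

# Shell arithmetic replaces bc here; the expression is expanded, then evaluated.
sum=$(( $expr_str ))
printf 'sum: %s\n' "$sum"               # sum: 15
```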

In this example we split the numbers from 1 to 9.000.000 into two chunks, one ranging from 1 to 4.500.000 and the second from 4.500.001 to 9.000.000.

time seq 9000000 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | pee "cut -d '+' -f1-4500000 | bc" "cut -d '+' -f4500001-9000000 | bc" | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | bc

Breaking down the one-liner

This part creates a long “+”-separated string of the numbers from 1 to 9.000.000 and feeds it to the stdin of pee.

seq 9000000 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/'

Then the first of the pee commands uses cut to take elements 1 to 4.500.000 of the string, separated by the “+” symbol, and forwards them to the stdin of bc, which does the calculation.

"cut -d '+' -f1-4500000 | bc"

The second part is similar: it again uses cut to take elements 4.500.001 to 9.000.000 of the string, separated by the “+” symbol, and forwards them to the stdin of bc, which does the calculation.

"cut -d '+' -f4500001-9000000 | bc"
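
How cut selects a range of “+”-separated fields can be seen on a short string. This is a small sketch with a hypothetical 6-element expression standing in for the 9.000.000-element one; the variable names are illustrative only.

```shell
#!/bin/sh
# Hypothetical 6-element expression standing in for the real one.
expr_str="1+2+3+4+5+6"

# With -d '+', each number is one field; -f1-3 keeps fields 1 to 3
# and cut joins them again with the same delimiter.
first_half=$(printf '%s\n' "$expr_str" | cut -d '+' -f1-3)
second_half=$(printf '%s\n' "$expr_str" | cut -d '+' -f4-6)

printf '%s\n' "$first_half"    # 1+2+3
printf '%s\n' "$second_half"   # 4+5+6
```

Each half is itself a valid expression, which is why it can be handed straight to a calculator.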

Running the parts of the one-liner explained so far produces two very big numbers: the individual sums computed by the first and the second pee commands.

seq 9000000 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | pee "cut -d '+' -f1-4500000 | bc" "cut -d '+' -f4500001-9000000 | bc"
10125002250000
30375002250000
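
These two partial sums can be cross-checked without summing anything, using the closed form n*(n+1)/2 for the sum of 1 to n. The sketch below is only a sanity check, not part of the pee pipeline; it assumes the shell does 64-bit integer arithmetic, as bash and dash do on 64-bit Linux.

```shell
#!/bin/sh
# Sum of 1..n is n*(n+1)/2; the sum of (half+1)..n is the difference
# between the total and the first half.
n=9000000
half=4500000

first=$(( half * (half + 1) / 2 ))
total=$(( n * (n + 1) / 2 ))
second=$(( total - first ))

printf 'first:  %s\n' "$first"    # first:  10125002250000
printf 'second: %s\n' "$second"   # second: 30375002250000
printf 'total:  %s\n' "$total"    # total:  40500004500000
```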

Adding those two numbers is now easy: we just convert the newlines to the “+” symbol and replace the trailing “+” with a newline before sending them to bc.

$ time seq 9000000 | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | pee "cut -d '+' -f1-4500000 | bc" "cut -d '+' -f4500001-9000000 | bc" | tr -s "\n" "+" | sed 's/.\{1\}$/\n/' | bc
40500004500000

real    0m5.507s
user    0m8.819s
sys     0m2.169s

From a performance perspective, summing all the numbers in two streams took just 5.5 seconds of wall-clock time instead of 11.8, so the calculation ran roughly twice as fast.

I hope you found this article interesting :)