Bash — parallel command execution
There are three ways to execute commands in parallel using bash
- plain bash: no external dependencies to install
- parallel: a dedicated tool for running commands in parallel
- xargs: a Swiss Army knife that every Linux user should know
plain bash
Save the following as configuration.cfg:
/var,*.log,log_res.txt
/var,*.gz,gzip_res.txt
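Save the following as bash_plain_parallel.sh:
#!/bin/bash
# Each line of configuration.cfg has the form: directory,pattern,results_file
# Lines must not contain spaces, because the for loop splits on whitespace.
for n in $(cat ./configuration.cfg)
do
    DIRECTORY=$(echo "$n" | cut -d "," -f 1)
    FILES=$(echo "$n" | cut -d "," -f 2)
    RESULTS=$(echo "$n" | cut -d "," -f 3)
    # Escaped quotes keep the shell from expanding the pattern (e.g. *.log)
    CMD="find \"${DIRECTORY}\" -name \"${FILES}\" > \"${RESULTS}\""
    # Run the command in the background
    eval "${CMD}" &
done
# Wait for all background jobs to finish
wait
echo "Script execution completed"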
Make the file executable:
$ chmod +x bash_plain_parallel.sh
Explaining the script
The script reads configuration.cfg and stores each line in the $n variable. This relies on word splitting, so a configuration line must not contain spaces:
for n in $(cat ./configuration.cfg)
Each configuration line in $n is then split on the "," delimiter:
DIRECTORY=$(echo "$n" | cut -d "," -f 1)
FILES=$(echo "$n" | cut -d "," -f 2)
RESULTS=$(echo "$n" | cut -d "," -f 3)
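Next, we build the command to be executed by concatenating fixed text with the parameters parsed from the configuration file. The escaped quotes keep the shell from expanding the file pattern before find sees it:
CMD="find \"${DIRECTORY}\" -name \"${FILES}\" > \"${RESULTS}\""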
We execute the command with eval. The & at the end of the line tells bash to run the command in the background:
eval "${CMD}" &
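The splitting and eval steps described above can also be written without cut and eval. What follows is a minimal sketch rather than the script from this article: it assumes the same three-field, comma-separated layout, and the run_find_jobs function name, the temporary demo directory, and the sample file names are all made up for illustration.

```shell
#!/bin/bash
# Sketch: read "directory,pattern,results_file" lines and start one find per line.
# run_find_jobs is a hypothetical helper name, not part of the original script.
run_find_jobs() {
    local config=$1 directory pattern results
    # IFS=, makes read split each line on commas; -r keeps backslashes literal.
    while IFS=, read -r directory pattern results; do
        # Skip blank lines.
        [ -n "$directory" ] || continue
        # Quoting "$pattern" hands *.log to find instead of letting the shell expand it.
        find "$directory" -name "$pattern" > "$results" &
    done < "$config"
    # Block until every background find has finished.
    wait
}

# Demo on throwaway data so the sketch can run anywhere (all names are illustrative).
demo=$(mktemp -d)
touch "$demo/a.log" "$demo/b.log" "$demo/c.gz"
printf '%s\n' \
    "$demo,*.log,$demo/log_res.txt" \
    "$demo,*.gz,$demo/gzip_res.txt" > "$demo/configuration.cfg"
run_find_jobs "$demo/configuration.cfg"
```

Because find is called directly, no command string has to be built and re-parsed, so patterns and paths only need ordinary double quotes.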
Finally, we wait for the background commands to complete using the wait command. It blocks until all background processes started from the current bash shell have terminated. Using wait is good practice because it lets you print a message once every background command has finished:
wait
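Called without arguments, wait returns 0 regardless of how the background jobs ended, so failures go unnoticed. One possible way to check each job, shown here as a hedged sketch, is to record $! after every launch and wait on each PID separately. The true and false commands are stand-ins for real work, and the pids and failed variable names are illustrative only.

```shell
#!/bin/bash
# Sketch: collect the PID of each background job, then check how each one exited.
pids=()
failed=0

true &            # stand-in for a command that succeeds
pids+=("$!")      # $! holds the PID of the most recent background job
false &           # stand-in for a command that fails
pids+=("$!")

for pid in "${pids[@]}"; do
    # "wait PID" returns the exit status of that specific job.
    if ! wait "$pid"; then
        failed=$((failed + 1))
    fi
done

echo "Script execution completed, ${failed} job(s) failed"
```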
parallel
parallel (GNU parallel) does what the plain bash solution does, but with more tuning options. It is not included by default and needs to be installed.
Create the following file and save it as parallel_config.cfg. Each line is a command that will be executed; the patterns are quoted so that the shell does not expand them before find runs:
find /var -name '*.gz' > gz_files.txt
find /var -name '*.log' > log_files.txt
The commands can be executed with:
$ parallel -j 2 < parallel_config.cfg
- -j is the maximum number of parallel jobs
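Instead of a file of complete commands, GNU parallel can also take one command template plus a list of arguments after :::, replacing {} with each argument in turn. The sketch below assumes GNU parallel is installed and skips the run otherwise; the demo directory, the sample files, and the have_gnu_parallel variable are invented for illustration.

```shell
#!/bin/bash
# Sketch: one command template, run once per argument, at most 2 jobs at a time.
demo=$(mktemp -d)
touch "$demo/a.gz" "$demo/b.log" "$demo/c.log"

# Detect GNU parallel specifically; other tools named "parallel" behave differently.
have_gnu_parallel=no
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    have_gnu_parallel=yes
fi

if [ "$have_gnu_parallel" = yes ]; then
    # {} is replaced by each argument after ":::" (gz, then log).
    # The inner single quotes keep *.gz and *.log away from shell expansion.
    parallel -j 2 "find '$demo' -name '*.{}' > '$demo/{}_files.txt'" ::: gz log
else
    echo "GNU parallel is not installed; skipping the demo" >&2
fi
```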
xargs
The true power of xargs comes from its ability to parallelize tasks fed to it by another command:
$ find . -name '*.txt' | xargs -P10 -I {} grep -i 'something' {}
xargs reads the file names produced by the find command and runs at most 10 commands at a time, replacing {} in each command with one of the file names read from stdin.
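File names that contain spaces or quotes can trip up the default xargs input parsing. A minimal sketch of one common workaround follows, assuming a find and xargs that support the -print0 and -0 extensions (the GNU and BSD versions do). The demo directory, file names, and the matches variable are made up for illustration.

```shell
#!/bin/bash
# Sketch: search files in parallel while coping with spaces in file names.
demo=$(mktemp -d)
echo "Something here" > "$demo/plain.txt"
echo "something else" > "$demo/with space.txt"
echo "no match in this file" > "$demo/other.txt"

# -print0 and -0 separate file names with NUL bytes instead of whitespace.
# -P 4 runs up to 4 grep processes at once; -n 1 gives each grep one file.
# grep -l prints the name of each matching file; -i ignores case.
# xargs reports a non-zero status when any grep finds nothing, so "|| true"
# keeps a script running under "set -e -o pipefail" from stopping here.
matches=$(find "$demo" -name '*.txt' -print0 |
    xargs -0 -P 4 -n 1 grep -il 'something' | sort) || true

echo "$matches"
```

The output is sorted because jobs running in parallel can finish in any order.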