Linux: sum a column of numbers
A often problem i have to deal is how to sum columns of numeric data piped from other commands or files, lets see how we can deal with…
A common problem
In my work as a system administrator, quite a few times, I needed to sum numbers from the output of a command or a file, there are many ways to do this, and I will explain some of them.
paste and bc
Syntax:<cmd> | paste -s -d"+" | bc -l
Explaination:
cmd: the command that produces numerical output in columns.paste: a file formatting command, in this example -s prints the columnar output in a single line and -d sets a delimiter between the numbers, in this case, the plus sign.bc: is a calculator tool that will evaluate any input coming from the previous command, -l will also calculate float numbers.
Example:shuf -i 1-10 -n 10 | paste -s -d"+" | bc -l
shuf: this command produces numbers, in this case will generate 10 numbers between 1 and 10.shuf -i 1-10 -n 10
5
8
6
2
9
7
3
10
1
4- Then the output will piped to the
pastecommand and will print the numbers in a single line plus sign delimitedpaste -s -d"+"
5+8+6+2+9+7+3+10+1+4 - Then
bcevaluates the piped operationbc -l
55
The awk way
Syntax:<cmd> | awk '{ sum+=$1;print $1} END {print "Sum";print sum}'
Explaination:
cmd: The command that produces numbers.awk: The awk binary.$1: The first column.sum: The variable that we hold the sum of all numbers.END: When the input from cmd completes executes the commands in this block, prints “sum” and then the content of variable sum.
Example:shuf -i 1-100000 -n 10 | awk '{ sum+=$1;print $1} END {print "Sum";print sum}'
51609
52205
9991
56947
21047
18794
43381
33489
65426
73050
Sum
425939
Handling multicolumn output
If the command or the file contains multicolumn output, use awk, is match suited than bash commands
Example
We have file d.txt with the following contenta b c
61566 77924 14269
32516 75846 56557
24716 33232 53126
17449 54899 94879
93435 2724 45232
6342 1962 7194
17334 68481 56236
85183 49609 43453
40851 74123 69602
77608 70007 19688
And the task is to calculate the sum of each columncat d.txt | awk '{ sum1+=$1;sum2+=$2;sum3+=$3;print $1" "$2" "$3} END {print "Sum1 Sum2 Sum3";print sum1" "sum2" "sum3}'
Running this the output isa b c
61566 77924 14269
32516 75846 56557
24716 33232 53126
17449 54899 94879
93435 2724 45232
6342 1962 7194
17334 68481 56236
85183 49609 43453
40851 74123 69602
77608 70007 19688
Sum1 Sum2 Sum3
457000 508807 460236
Explaination of the awk statement{ sum1+=$1;sum2+=$2;sum3+=$3;print $1" "$2" "$3} END {print "Sum1 Sum2 Sum3";print sum1" "sum2" "sum3}'
- sum1 sum2 sum3 are variables that hold the sum
- $1 is the first column, $2 the second and $3 the third
- sum1+= $1 equals to sum1=sum1+$1
When the input of the file or the command is completed, the commands in the END block are executed and print the sum of each column.
I hope you found this article enjoyable and helped you be more productive when dealing with files and reports.