By Konstantinos Patronas — 21 Mar 2022

Linux: sum a column of numbers

A often problem i have to deal is how to sum columns of numeric data piped from other commands or files, lets see how we can deal with…

Photo by Tyler Easton on Unsplash

A common problem

In my work as a system administrator, quite a few times, I needed to sum numbers from the output of a command or a file, there are many ways to do this, and I will explain some of them.

paste and bc

Syntax:<cmd> | paste -s -d"+" | bc -l

Explaination:

cmd: the command that produces numerical output in columns.
paste: a file formatting command, in this example -s prints the columnar output in a single line and -d sets a delimiter between the numbers, in this case, the plus sign.
bc: is a calculator tool that will evaluate any input coming from the previous command, -l will also calculate float numbers.

Example:shuf -i 1-10 -n 10 | paste -s -d"+" | bc -l

shuf : this command produces numbers, in this case will generate 10 numbers between 1 and 10.shuf -i 1-10 -n 10
5
8
6
2
9
7
3
10
1
4
Then the output will piped to the paste command and will print the numbers in a single line plus sign delimitedpaste -s -d"+"
5+8+6+2+9+7+3+10+1+4
Then bc evaluates the piped operationbc -l
55

The awk way

Syntax:<cmd> | awk '{ sum+=$1;print $1} END {print "Sum";print sum}'

Explaination:

cmd: The command that produces numbers.
awk: The awk binary.
$1: The first column.
sum: The variable that we hold the sum of all numbers.
END: When the input from cmd completes executes the commands in this block, prints “sum” and then the content of variable sum.

Example:shuf -i 1-100000 -n 10 | awk '{ sum+=$1;print $1} END {print "Sum";print sum}'
51609
52205
9991
56947
21047
18794
43381
33489
65426
73050
Sum
425939

Handling multicolumn output

If the command or the file contains multicolumn output, use awk, is match suited than bash commands

Example

We have file d.txt with the following contenta b c
61566 77924 14269
32516 75846 56557
24716 33232 53126
17449 54899 94879
93435 2724 45232
6342 1962 7194
17334 68481 56236
85183 49609 43453
40851 74123 69602
77608 70007 19688

And the task is to calculate the sum of each columncat d.txt | awk '{ sum1+=$1;sum2+=$2;sum3+=$3;print $1" "$2" "$3} END {print "Sum1 Sum2 Sum3";print sum1" "sum2" "sum3}'

Running this the output isa b c
61566 77924 14269
32516 75846 56557
24716 33232 53126
17449 54899 94879
93435 2724 45232
6342 1962 7194
17334 68481 56236
85183 49609 43453
40851 74123 69602
77608 70007 19688
Sum1 Sum2 Sum3
457000 508807 460236

Explaination of the awk statement{ sum1+=$1;sum2+=$2;sum3+=$3;print $1" "$2" "$3} END {print "Sum1 Sum2 Sum3";print sum1" "sum2" "sum3}'

sum1 sum2 sum3 are variables that hold the sum
$1 is the first column, $2 the second and $3 the third
sum1+= $1 equals to sum1=sum1+$1

When the input of the file or the command is completed, the commands in the END block are executed and print the sum of each column.

I hope you found this article enjoyable and helped you be more productive when dealing with files and reports.