Learn Go by creating a tool — Part 2
In the previous part (https://medium.com/@kpatronas/learn-go-by-creating-a-tool-part-1-4ce3f9258027) of the series we saw how to create a simple command-line tool in Go that reads a file and calculates its MD5 checksum. In this article we will make two enhancements to the program:
- Read input from stdin when no filename is given
- Introduce the buffer parameter, which lets the tool read and hash the input in chunks of a specific size instead of loading the whole file or stream into memory
The full program with the enhancements is the following:
package main

import (
	"crypto/md5"
	"flag"
	"fmt"
	"io"
	"os"
)

func main() {
	filename := flag.String("filename", "", "the name of the file to calculate the hash for")
	buffer_size := flag.Int("buffer", 64, "the size of the buffer to use for reading the file in KB")
	flag.Parse()

	// Pick the input source: stdin when no filename is given, the file otherwise.
	var input io.Reader
	if len(*filename) == 0 {
		input = os.Stdin
	} else {
		file, err := os.Open(*filename)
		if err != nil {
			fmt.Println("Error opening file:", err)
			os.Exit(1)
		}
		defer file.Close()
		input = file
	}

	// Feed the input to the hasher through a buffer of buffer_size KB.
	hasher := md5.New()
	bufferBytes := make([]byte, 1024*(*buffer_size))
	if _, err := io.CopyBuffer(hasher, input, bufferBytes); err != nil {
		fmt.Println("Error calculating hash:", err)
		os.Exit(1)
	}
	hash := hasher.Sum(nil)

	// Report the checksum, naming the source it was computed from.
	if len(*filename) == 0 {
		fmt.Printf("MD5 hash of standard input: %x\n", hash)
	} else {
		fmt.Printf("MD5 hash of file %s: %x\n", *filename, hash)
	}
}

Let's break down the enhancements and understand how they work.
var input io.Reader
if len(*filename) == 0 {
	input = os.Stdin
} else {
	file, err := os.Open(*filename)
	if err != nil {
		fmt.Println("Error opening file:", err)
		os.Exit(1)
	}
	defer file.Close()
	input = file
}

This code block has some changes:
- First we declare a variable named input of the interface type io.Reader. Both os.Stdin and the file returned by os.Open implement io.Reader.
- Next we check whether a filename has been given using len(*filename). If the length is zero, no filename was given, so we assign os.Stdin (the terminal or a pipe) to input.
- In the else branch (the filename length is greater than zero) we open the file and assign file to input.
- With this if statement, input holds either the opened file or stdin, which makes the source of the data transparent to the rest of the program, including the hash calculation.
Next is this part of the code:
bufferBytes := make([]byte, 1024*(*buffer_size))
if _, err := io.CopyBuffer(hasher, input, bufferBytes); err != nil {
	fmt.Println("Error calculating hash:", err)
	os.Exit(1)
}

- bufferBytes is a byte slice (the buffer) that controls how much data is read at a time from the input file or stdin. Its size is 1024 * buffer_size bytes; for example, 1024 * 64 is 65536 bytes, or 64 KB.
- Size matters: a very small or very large buffer_size affects how the tool performs. With a very small value (a few KB) the program uses very little memory but may sacrifice performance, because it needs many small reads from the input. With a very large value such as 100 MB there is a risk of memory pressure, especially on a machine that is doing other things at the same time. Choose the buffer size in relation to the file or stream size and the available resources.
- One caveat: the io.CopyBuffer documentation states that the buffer is not used when the source implements io.WriterTo or the destination implements io.ReaderFrom. Depending on the Go version and the concrete types involved, the buffer flag may therefore have less effect than expected. The documentation also states that io.CopyBuffer panics if it is given a non-nil buffer of zero length, so a buffer value of 0 will crash the program.
Finally, we have some changes here:
if len(*filename) == 0 {
	fmt.Printf("MD5 hash of standard input: %x\n", hash)
} else {
	fmt.Printf("MD5 hash of file %s: %x\n", *filename, hash)
}

These changes are purely informative: if no filename is given, the program reports the MD5 checksum of standard input; otherwise it prints the checksum together with the filename.
Conclusion
In this part of the series we made some enhancements to our program, such as reading from stdin and controlling the buffer size. In the next articles we will see how to add other enhancements, like parallel processing of files and an adaptive buffer size.