Learn Go by creating a tool — Part 2

Photo by Timo Volz on Unsplash

In the previous part https://medium.com/@kpatronas/learn-go-by-creating-a-tool-part-1-4ce3f9258027 of the series we saw how to create a simple command line tool with Go that reads a file and calculates its MD5 checksum. In this article we will make the following enhancements to our program:

  • Read the input from stdin when no filename is given
  • Introduce the buffer parameter, which lets the tool read the data and calculate the MD5 checksum in chunks of a specific size instead of loading the whole file/stream in memory

The full program with the enhancements is the following:

package main 
 
import ( 
 "crypto/md5" 
 "flag" 
 "fmt" 
 "io" 
 "os" 
) 
 
func main() { 
 filename := flag.String("filename", "", "the name of the file to calculate the hash for") 
 buffer_size := flag.Int("buffer", 64, "the size of the buffer to use for reading the file in KB") 
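 flag.Parse()
 // A zero or negative size would make the buffer allocation or io.CopyBuffer panic.
 if *buffer_size <= 0 {
  fmt.Println("Error: buffer must be greater than zero")
  os.Exit(1)
 }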
 
 var input io.Reader 
 if len(*filename) == 0 { 
  input = os.Stdin 
 } else { 
  file, err := os.Open(*filename) 
  if err != nil { 
   fmt.Println("Error opening file:", err) 
   os.Exit(1) 
  } 
  defer file.Close() 
  input = file 
 } 
 
 hasher := md5.New() 
 bufferBytes := make([]byte, 1024*(*buffer_size)) 
 if _, err := io.CopyBuffer(hasher, input, bufferBytes); err != nil { 
  fmt.Println("Error calculating hash:", err) 
  os.Exit(1) 
 } 
 hash := hasher.Sum(nil) 
 if len(*filename) == 0 { 
  fmt.Printf("MD5 hash of standard input: %x\n", hash) 
 } else { 
  fmt.Printf("MD5 hash of file %s: %x\n", *filename, hash) 
 } 
}

Let's break down the enhancements and understand how they work.

var input io.Reader
if len(*filename) == 0 {
 input = os.Stdin
} else {
 file, err := os.Open(*filename)
 if err != nil {
  fmt.Println("Error opening file:", err)
  os.Exit(1)
 }
 defer file.Close()
 input = file
}

This code block has some changes:

  • First we declare a variable named input of the interface type io.Reader. Both os.Stdin and the file returned by os.Open are of type *os.File, which implements io.Reader.
  • Next we check whether a filename was given using len(*filename). If the length is zero, no filename was given, so input is set to os.Stdin (the terminal or a pipe).
  • In the else branch (the filename length is greater than zero) we open the file and assign it to input.
  • With this if statement, input holds either the opened file or stdin, so the rest of the program can calculate the hash without knowing where the data comes from.

Next is this part of the code:

bufferBytes := make([]byte, 1024*(*buffer_size))
if _, err := io.CopyBuffer(hasher, input, bufferBytes); err != nil {
 fmt.Println("Error calculating hash:", err)
 os.Exit(1)
}

  • bufferBytes is a byte slice (buffer) that io.CopyBuffer uses to stage the data it copies from the input file or stdin to the hasher, so it is intended to control how much data is read at a time. Its size is 1024 * buffer_size bytes; for example, 1024 * 64 is 65536 bytes, or 64KB.
  • Size matters: a very small or very big value of buffer_size affects how our tool performs. If the value is very small (a few KB), the program uses very little memory but may sacrifice performance, because it needs many small reads from the file. If the value is very big, like 100MB, there is a danger of memory pressure, especially on a machine that is doing other things at the same time. The buffer size should be chosen according to the file/stream size and the available resources.

Finally, we have some changes here:

if len(*filename) == 0 {
 fmt.Printf("MD5 hash of standard input: %x\n", hash)
} else {
 fmt.Printf("MD5 hash of file %s: %x\n", *filename, hash)
}

The changes here are informative: if no filename is given, we print only the MD5 checksum of stdin; otherwise we print the checksum together with the filename.

Conclusion

In this part of the series we made some enhancements to our program, such as reading from stdin and controlling the buffer size. In the next articles we will see how we can add other enhancements, like parallel processing of files and adaptive buffer size.