How to create a directory that will automatically zip files

When managing large file systems, automating routine tasks like file compression can improve storage efficiency and reduce manual overhead. One such automation combines inotify, the Linux kernel's filesystem event-notification interface, with gzip, a widely used compression tool. This article shows how to automatically compress files in a directory as they are created, walking through several real-world cases, including limiting the number of parallel gzip processes.

Prerequisites

Before diving into the examples, ensure that the following tools are installed on your system:

  • inotify-tools: A command-line utility that provides a mechanism for monitoring file system events.
  • gzip: A widely-used file compression utility.

You can install these tools on Debian-based systems using:

sudo apt-get install inotify-tools gzip

For other distributions, substitute the appropriate package manager (e.g., yum or dnf).
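A quick sanity check can confirm the installation before wiring up any watchers. This is a minimal sketch; it only reports whether each tool is reachable on the PATH:

```shell
#!/bin/sh
# Report whether each required tool is installed and on the PATH.
for tool in inotifywait gzip; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: missing" >&2
    fi
done
```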

Understanding the inotify Command

The inotifywait command allows you to monitor a directory for file events. Specifically, it can track actions like file creation, deletion, modification, and more. When combined with gzip, we can automatically compress new files as they are created.
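The event stream is easy to inspect by hand. The sketch below (using /tmp/inotify-demo as an assumed scratch directory) creates a file while inotifywait is watching; without -m, inotifywait exits after reporting the first event, and -t 5 caps the wait at five seconds:

```shell
#!/bin/bash
demo=/tmp/inotify-demo
mkdir -p "$demo"
rm -f "$demo/hello.txt"          # ensure the touch below is a fresh CREATE event

if command -v inotifywait >/dev/null 2>&1; then
    # %e = event name, %f = file name; exits after the first event (no -m).
    inotifywait -t 5 -e create --format '%e %f' "$demo" &
    sleep 1                      # give the watcher a moment to establish watches
    touch "$demo/hello.txt"
    wait                         # typically prints: CREATE hello.txt
else
    echo "inotify-tools not installed; skipping demo"
fi
echo "demo finished"
```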

Case 1: Compress Files from /src to /dst

The simplest automation scenario involves watching a source directory (/src) and compressing new files directly into a different destination directory (/dst).

The Script

inotifywait -m -e create --format '%f' /src | while read -r FILE; do 
    [[ "$FILE" != *.gz ]] && gzip -c "/src/$FILE" > "/dst/${FILE}.gz" && rm "/src/$FILE" 
done

Breakdown:

  1. inotifywait -m -e create --format '%f' /src: This command monitors /src for newly created files (-e create event) and outputs the file names.
  2. gzip -c "/src/$FILE" > "/dst/${FILE}.gz": This compresses the file from /src and writes the compressed .gz version to /dst.
  3. rm "/src/$FILE": Once the compressed copy has been written to /dst, the original uncompressed file in /src is deleted to save space.

Use Case:

  • This setup is useful when you want to keep the original source directory (/src) clean while storing compressed files in a separate directory (/dst).
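One practical caveat worth noting: the create event fires the moment a file appears, which can be before the writer has finished with it. Watching close_write instead means only fully written files get compressed, and moved_to additionally catches files renamed into the watched directory. A sketch of this variant, wrapped as a function for reuse (watch_and_compress is a hypothetical name, not part of the original script):

```shell
#!/bin/bash
# Compress fully written (or moved-in) files from $1 into $2 as .gz.
watch_and_compress() {
    local src=$1 dst=$2
    inotifywait -m -e close_write -e moved_to --format '%f' "$src" |
        while read -r file; do
            [[ "$file" != *.gz ]] &&
                gzip -c "$src/$file" > "$dst/${file}.gz" &&
                rm "$src/$file"
        done
}
# Usage: watch_and_compress /src /dst
```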

Case 2: Compress Files In-Place (from /src to /src)

Sometimes, you may want to compress files in the same directory where they are created. In this case, the .gz files will be stored in the same /src directory, and any files that are already compressed will be ignored.

The Script

inotifywait -m -e create --format '%f' /src | while read -r FILE; do 
    [[ "$FILE" != *.gz ]] && gzip "/src/$FILE" 
done

Breakdown:

  1. [[ "$FILE" != *.gz ]]: Skips files that already end in .gz. This guard is essential here: gzip writes its .gz output into /src, which itself triggers a new create event, and without the check the script would attempt to re-compress its own output.
  2. gzip "/src/$FILE": Compresses the new file in place; gzip replaces the original with a .gz version, so no manual cleanup is needed.

Use Case:

  • This scenario is ideal when files are created in /src, and you want to compress them without moving them to another location. It helps save disk space within the same folder.

Case 3: Limit the Number of Parallel gzip Processes

In cases where large numbers of files are created simultaneously, running too many gzip processes in parallel can oversaturate system resources, leading to performance degradation. To handle this, we can limit the number of concurrent gzip processes to, say, 8.

The Script

inotifywait -m -e create --format '%f' /src | while read -r FILE; do 
    while (( $(jobs -rp | wc -l) >= 8 )); do wait -n; done 
    [[ "$FILE" != *.gz ]] && gzip "/src/$FILE" & 
done

Breakdown:

  1. Background Process (&): The gzip command runs in the background for each file, allowing multiple files to be compressed in parallel.
  2. Job Control (jobs -rp | wc -l): jobs -rp prints the process IDs of currently running background jobs (ignoring already-finished ones), so wc -l gives an accurate count of active gzip processes.
  3. Limiting Parallel Jobs (while (( $(jobs -rp | wc -l) >= 8 ))): Before launching a new gzip, the loop checks the count; if 8 or more are running, it waits for at least one to finish (wait -n, available in bash 4.3 and later) before starting the next one.

Use Case:

  • This setup is useful when handling directories with a high volume of new file creation. By limiting the number of parallel gzip processes, you can avoid overloading the CPU and memory, maintaining system stability while ensuring compression tasks are still completed in a timely manner.
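An alternative way to cap concurrency (a sketch, not the article's script) is xargs, whose -P flag bounds the number of parallel processes. The standalone demo below compresses a batch of scratch files in /tmp/xargs-gzip-demo (an assumed path); with the watcher, you would pipe inotifywait's --format '%w%f' output through grep --line-buffered -v '\.gz$' into the same xargs stage:

```shell
#!/bin/bash
demo=/tmp/xargs-gzip-demo
rm -rf "$demo" && mkdir -p "$demo"
for i in 1 2 3 4; do echo "data $i" > "$demo/file$i.txt"; done

# -P 8: at most eight gzip processes at a time; -I{} passes one path per process.
printf '%s\n' "$demo"/*.txt | xargs -P 8 -I{} gzip '{}'

ls "$demo"   # the four .txt files have been replaced by .gz versions
```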

Additional Considerations

1. File Types to Ignore

In some cases, you may want to ignore specific file types other than .gz files. For example, to exclude image files, modify the script like this:

[[ "$FILE" != *.gz && "$FILE" != *.jpg && "$FILE" != *.png ]]
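As the exclusion list grows, a case statement keeps it readable. A small sketch (should_compress is a hypothetical helper name, and the extension list is illustrative):

```shell
#!/bin/bash
# Return success (0) only for files worth compressing.
should_compress() {
    case "$1" in
        *.gz|*.jpg|*.png) return 1 ;;  # already-compressed formats: skip
        *)                return 0 ;;
    esac
}

should_compress "report.txt" && echo "compress report.txt"
should_compress "photo.jpg"  || echo "skip photo.jpg"
```

Inside the watcher loop, the growing [[ … ]] chain would then become should_compress "$FILE" && gzip "/src/$FILE".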

2. Recursive Directory Monitoring

If you need to watch all subdirectories of /src as well, add the -r flag to inotifywait and use the %w%f format so each event reports the file's full path:

inotifywait -m -r -e create --format '%w%f' /src | while read -r FILE; do 
    [[ "$FILE" != *.gz ]] && gzip "$FILE" 
done

3. Compression Level

You can adjust gzip's compression level with the -N flag, where N ranges from 1 (fastest, least compression) to 9 (slowest, best compression); the default is 6:

gzip -9 "/src/$FILE"
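The trade-off is easy to measure on sample data. The sketch below (using /tmp/gzip-level-demo as an assumed scratch directory) compresses the same file at both extremes:

```shell
#!/bin/bash
demo=/tmp/gzip-level-demo
rm -rf "$demo" && mkdir -p "$demo"

# Repetitive sample data; real-world ratios depend heavily on content.
yes "the quick brown fox jumps over the lazy dog" | head -n 10000 > "$demo/sample.txt"

gzip -1 -c "$demo/sample.txt" > "$demo/fast.gz"   # fastest
gzip -9 -c "$demo/sample.txt" > "$demo/best.gz"   # best compression
ls -l "$demo"/fast.gz "$demo"/best.gz             # best.gz is usually no larger than fast.gz
```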

Conclusion

By combining inotify and gzip, you can create a powerful automation system for file compression, saving space and reducing manual workload. Whether you need to compress files to a separate directory or handle compression in-place, these scripts offer flexibility for a wide range of use cases. The addition of parallel process limitations ensures your system won't be overwhelmed by excessive file creation and compression tasks.

With these scripts in hand, you can now handle various file compression scenarios efficiently on your system.