How to create a directory that will automatically zip files
When managing large file systems, automating routine tasks like file compression can help improve storage efficiency and reduce manual overhead. One such automation can be built with inotify, the Linux kernel's filesystem-event notification interface, and gzip, a widely used compression tool. This article shows how to automate the compression of files in a directory using inotify and gzip, walking through several real-world cases, including limiting the number of parallel gzip processes.
Prerequisites
Before diving into the examples, ensure that the following tools are installed on your system:
- inotify-tools: A command-line utility that provides a mechanism for monitoring file system events.
- gzip: A widely used file compression utility.
You can install these tools on Debian-based systems using:
```shell
sudo apt-get install inotify-tools gzip
```
For other distributions, replace the package manager with the appropriate one (e.g., yum or dnf).
Understanding the inotifywait Command
The `inotifywait` command allows you to monitor a directory for file events. Specifically, it can track actions like file creation, deletion, and modification. When combined with `gzip`, we can automatically compress new files as they are created. One caveat: the `create` event fires the moment a file appears, possibly before the writer has finished; for files that are written gradually, the `close_write` event is a safer trigger, since it fires only once the writer has closed the file.
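Since `inotifywait` simply emits one filename per line, the downstream loop can be exercised without inotify-tools at all by simulating that output with `printf`. The filenames below are made up for the demo:

```shell
# Simulate inotifywait's one-filename-per-line output with printf,
# then apply the same .gz filter the scripts below use.
printf '%s\n' report.txt archive.gz photo.raw | while read -r FILE; do
  [[ "$FILE" != *.gz ]] && echo "would compress: $FILE"
done
# → would compress: report.txt
# → would compress: photo.raw
```

This makes it easy to test your filtering logic before wiring it up to a live watch.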
Case 1: Compress Files from /src to /dst
The simplest automation scenario involves watching a source directory (/src) and compressing new files directly into a different destination directory (/dst).
The Script
```shell
inotifywait -m -e create --format '%f' /src | while read -r FILE; do
  [[ "$FILE" != *.gz ]] && gzip -c "/src/$FILE" > "/dst/${FILE}.gz" && rm "/src/$FILE"
done
```
Breakdown:
- `inotifywait -m -e create --format '%f' /src`: monitors `/src` for newly created files (the `-e create` event); `-m` keeps the watch running indefinitely, and `--format '%f'` prints just the filename.
- `gzip -c "/src/$FILE" > "/dst/${FILE}.gz"`: compresses the file from `/src` and writes the compressed `.gz` version to `/dst`.
- `rm "/src/$FILE"`: once the file is compressed and moved, the original uncompressed file in `/src` is deleted to save space.
Use Case:
- This setup is useful when you want to keep the original source directory (`/src`) clean while storing compressed files in a separate directory (`/dst`).
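The compress-and-move step can also be verified in isolation, without a watcher. This sketch uses temporary directories standing in for `/src` and `/dst`, and a hypothetical file named `note.txt`:

```shell
SRC=$(mktemp -d)   # stand-in for /src
DST=$(mktemp -d)   # stand-in for /dst
echo "hello world" > "$SRC/note.txt"

# The same step the watcher loop runs for each new file:
FILE=note.txt
gzip -c "$SRC/$FILE" > "$DST/${FILE}.gz" && rm "$SRC/$FILE"

# The original is gone, and the compressed copy round-trips intact:
[ ! -e "$SRC/note.txt" ] && gunzip -c "$DST/note.txt.gz"   # → hello world
rm -rf "$SRC" "$DST"
```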
Case 2: Compress Files In-Place (from /src to /src)
Sometimes, you may want to compress files in the same directory where they are created. In this case, the .gz files will be stored in the same /src directory, and any files that are already compressed will be ignored.
The Script
```shell
inotifywait -m -e create --format '%f' /src | while read -r FILE; do
  [[ "$FILE" != *.gz ]] && gzip "/src/$FILE"
done
```
Breakdown:
- `[[ "$FILE" != *.gz ]]`: skips files that already end in `.gz`. This guard is essential here: `gzip` writes its output into the watched directory, which triggers another `create` event, and without the check the script would try to compress its own output.
- `gzip "/src/$FILE"`: compresses the new file in the same directory, replacing `file` with `file.gz`.
Use Case:
- This scenario is ideal when files are created in `/src` and you want to compress them without moving them to another location, saving disk space within the same folder.
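The in-place variant can be tried the same way. The loop below simulates two `create` events — one for a fresh `app.log` (a made-up name) and one for the `.gz` file that compressing it produces — to show the guard doing its job:

```shell
SRC=$(mktemp -d)   # stand-in for /src
echo "log line" > "$SRC/app.log"

for FILE in app.log app.log.gz; do   # two simulated create events
  # The guard skips the second event (gzip's own output), which is
  # what prevents the watch loop from chasing its own tail.
  if [[ "$FILE" != *.gz ]]; then gzip "$SRC/$FILE"; fi
done

ls "$SRC"   # → app.log.gz
rm -rf "$SRC"
```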
Case 3: Limit the Number of Parallel gzip Processes
In cases where large numbers of files are created simultaneously, running too many gzip processes in parallel can oversaturate system resources, leading to performance degradation. To handle this, we can limit the number of concurrent gzip processes to, say, 8.
The Script
```shell
inotifywait -m -e create --format '%f' /src | while read -r FILE; do
  [[ "$FILE" != *.gz ]] && gzip "/src/$FILE" &
  while (( $(jobs | wc -l) >= 8 )); do wait -n; done
done
```
Breakdown:
- Background process (`&`): the `gzip` command runs in the background for each file, allowing multiple files to be compressed in parallel.
- Job control (`jobs | wc -l`): counts the number of background jobs (active `gzip` processes).
- Limiting parallel jobs (`while (( $(jobs | wc -l) >= 8 ))`): ensures that no more than 8 `gzip` processes run at the same time. If 8 or more are running, the script waits for at least one to finish (`wait -n`, available in bash 4.3 and later) before starting a new one.
Use Case:
- This setup is useful when handling directories with a high volume of new file creation. By limiting the number of parallel `gzip` processes, you avoid overloading the CPU and memory, maintaining system stability while still completing compression tasks in a timely manner.
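The throttling pattern itself can be sketched without inotify or gzip at all. In this demo, short `sleep` commands stand in for compression jobs, the limit is lowered to 3 to keep the run quick, and `jobs -r` (running jobs only) is used as a slightly stricter variant of plain `jobs` so that already-finished jobs are never counted. It requires bash 4.3+ for `wait -n`:

```shell
LIMIT=3   # the article uses 8; 3 keeps the demo quick
for i in 1 2 3 4 5 6; do
  sleep 0.2 &   # placeholder for: gzip "/src/$FILE" &
  # Same throttle as the watcher loop: once $LIMIT jobs are running,
  # block until at least one finishes before launching the next.
  while (( $(jobs -r | wc -l) >= LIMIT )); do wait -n; done
done
wait   # drain the remaining jobs
echo "all jobs finished"   # → all jobs finished
```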
Additional Considerations
1. File Types to Ignore
In some cases, you may want to ignore specific file types other than .gz files. For example, to exclude image files, modify the script like this:
```shell
[[ "$FILE" != *.gz && "$FILE" != *.jpg && "$FILE" != *.png ]]
```
2. Recursive Directory Monitoring
If you need to watch all subdirectories of /src as well, you can add the -r flag to inotifywait:
```shell
inotifywait -m -r -e create --format '%w%f' /src | while read -r FILE; do
  [[ "$FILE" != *.gz ]] && gzip "$FILE"
done
```
Note that `--format '%w%f'` now prints the watched directory plus the filename, so `$FILE` is a full path and `gzip "$FILE"` works for files in any subdirectory.
3. Compression Level
You can adjust the compression level of `gzip` using the `-N` option, where N ranges from 1 (fastest, least compression) to 9 (slowest, best compression):
```shell
gzip -9 "/src/$FILE"
```
Conclusion
By combining inotify and gzip, you can create a powerful automation system for file compression, saving space and reducing manual workload. Whether you need to compress files to a separate directory or handle compression in-place, these scripts offer flexibility for a wide range of use cases. The addition of parallel process limitations ensures your system won't be overwhelmed by excessive file creation and compression tasks.
With these scripts in hand, you can now handle various file compression scenarios efficiently on your system.