Summarize File Sizes by Directory and File Name Pattern in Linux
Analyzing file sizes in Linux can be challenging, especially when you want to group them by directory and filter them using a specific pattern. This article introduces a Bash function, summarize_sizes, that calculates the total size of files matching a given name pattern, grouped by directory, and outputs the results in a human-readable format.
The Problem
You may encounter scenarios where you need to:
- Locate all files matching a specific pattern (e.g., *.log).
- Compute the total size of these files for each directory.
- Display the results in a format like KB, MB, or GB for better readability.
Doing this manually by combining commands like find, du, and awk can be tedious. That’s where the summarize_sizes function proves useful.
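For comparison, a manual pass over a single directory might look like the sketch below. The helper name dir_total_bytes is hypothetical, and it assumes GNU du (for -b) and a find that supports -maxdepth. It would have to be repeated for every directory of interest, which is the repetition the function removes.

```shell
# dir_total_bytes is a hypothetical helper for illustration only.
# Prints the combined size, in bytes, of the files matching a pattern
# directly inside one directory (no recursion).
dir_total_bytes() {
    local dir=$1
    local pattern=$2
    find "$dir" -maxdepth 1 -type f -name "$pattern" -exec du -b {} + |
    awk '{ total += $1 } END { print total + 0 }'
}

# Example call (the path is an assumption):
# dir_total_bytes ./z/log "*.log"
```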
The Solution: summarize_sizes Function
The summarize_sizes Bash function simplifies the process. Here’s the full code:
summarize_sizes() {
    local start_location=$1
    local pattern=$2
    # du -b prints "<bytes><TAB><path>" for every matching file (GNU du)
    find "$start_location" -type f -name "$pattern" -exec du -b {} + |
    # Swap each file path for its directory; tab-separated fields keep paths with spaces intact
    while IFS=$'\t' read -r size filepath; do
        printf '%s\t%s\n' "$(dirname "$filepath")" "$size"
    done |
    # Sum the bytes per directory, then scale each total to the largest fitting unit
    awk -F'\t' '{
        dir[$1] += $2
    } END {
        for (d in dir) {
            size = dir[d];
            unit = "B";
            if (size >= 1024) { size /= 1024; unit = "KB"; }
            if (size >= 1024) { size /= 1024; unit = "MB"; }
            if (size >= 1024) { size /= 1024; unit = "GB"; }
            printf "%s %.2f %s\n", d, size, unit;
        }
    }' | sort
}
How It Works
- find: Searches for files matching the given pattern, starting from the specified directory.
- du -b: Reports the apparent size of each file in bytes (-b is a GNU du option).
- awk: Groups the files by directory and aggregates their sizes.
- Size Conversion: Totals are automatically converted into KB, MB, or GB (in steps of 1024) for better readability.
- sort: Sorts the output by directory for easy interpretation.
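The size conversion step can be tried in isolation. The sketch below wraps the same cascading divide-by-1024 logic in a helper called human_size, which is a hypothetical name introduced here for illustration and not part of the original function.

```shell
# human_size is a hypothetical helper, not part of summarize_sizes.
# It applies the same cascading divide-by-1024 logic to one byte count.
human_size() {
    awk -v size="$1" 'BEGIN {
        unit = "B"
        if (size >= 1024) { size /= 1024; unit = "KB" }
        if (size >= 1024) { size /= 1024; unit = "MB" }
        if (size >= 1024) { size /= 1024; unit = "GB" }
        printf "%.2f %s\n", size, unit
    }'
}

human_size 390        # prints: 390.00 B
human_size 3334024    # prints: 3.18 MB
```

Each check runs only if the previous division left a value of at least 1024, so a total below 1024 bytes stays in B and anything of 1024 GB or more is still reported in GB.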
How to Use the Function
- Copy the function code into your terminal, or add it to your ~/.bashrc file to make it persistent.
- Use the function with the desired starting location and file name pattern:
summarize_sizes /path/to/start "*.log"
Example
Input:
Suppose you have the following files (path, then size in bytes):
./z/log/stats.7.log 1048506
./z/log/stats.8.log 1048566
./z/log/stats.9.log 1048536
./z/log/stats.log 188416
./a/logs/021121.log 130
./a/logs/022421.log 130
./a/logs/030121.log 130
Command:
summarize_sizes ./ "*.log"
Output:
./a/logs 390.00 B
./z/log 3.18 MB
Benefits
- Simplifies analysis of file sizes by directory and pattern.
- Supports human-readable size formats.
- Easily customizable for different patterns and starting directories.
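As one possible customization, the sketch below lists the largest directories first instead of sorting by name. The function name summarize_sizes_by_size is hypothetical, and the code assumes GNU du (for -b) and file paths that contain no tabs or newlines.

```shell
# summarize_sizes_by_size is a hypothetical variant for illustration.
# Assumes GNU du (-b) and paths without tabs or newlines.
summarize_sizes_by_size() {
    local start_location=$1
    local pattern=$2
    find "$start_location" -type f -name "$pattern" -exec du -b {} + |
    awk -F'\t' '{
        path = $2
        # Drop the file name to get the directory; fall back for bare names and "/"
        if (!sub("/[^/]*$", "", path)) path = "."
        if (path == "") path = "/"
        total[path] += $1
    } END {
        # Emit "<bytes><TAB><directory>" so the next stage can sort numerically
        for (d in total) printf "%.0f\t%s\n", total[d], d
    }' |
    sort -k1,1nr |    # order by raw byte count, largest first
    awk -F'\t' '{
        size = $1 + 0; unit = "B"
        if (size >= 1024) { size /= 1024; unit = "KB" }
        if (size >= 1024) { size /= 1024; unit = "MB" }
        if (size >= 1024) { size /= 1024; unit = "GB" }
        printf "%s %.2f %s\n", $2, size, unit
    }'
}
```

Sorting happens on the raw byte totals before unit conversion, because converted values such as 3.18 MB and 390.00 B would not order correctly as plain numbers.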
Conclusion
The summarize_sizes function is a versatile tool for summarizing file sizes by directory and file name pattern. Whether you're managing logs, large datasets, or backups, this function streamlines the process and saves time. Try it out and enhance your Linux toolkit!