Linux Storage Testing: Generate Random-Sized Files in Parallel with This Bash Script

Ever needed to generate a batch of random-sized files for testing or development purposes? Whether you’re stress-testing file systems, simulating large-scale file storage, or experimenting with file uploads, this Bash script can be your go-to solution. In this article, we’ll walk you through a script that creates files with random sizes within a defined range, lets you customize the naming pattern for each file, and now supports parallel file creation for improved performance.

The Script

Here’s the complete Bash script for creating random-sized files:

#!/bin/bash 
 
# Function to display usage 
usage() { 
    echo "Usage: $0 <number_of_files> <min_size_in_bytes> <max_size_in_bytes> <file_name_pattern> [parallel_processes]" 
    exit 1 
} 
 
# Check if the correct number of arguments is provided 
if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then 
    usage 
fi 
 
# Read input arguments 
number_of_files=$1 
min_size=$2 
max_size=$3 
file_name_pattern=$4 
parallel_processes=${5:-1} # Default to 1 if not provided 
 
# Validate inputs 
if ! [[ "$number_of_files" =~ ^[0-9]+$ ]] || ! [[ "$min_size" =~ ^[0-9]+$ ]] || ! [[ "$max_size" =~ ^[0-9]+$ ]] || ! [[ "$parallel_processes" =~ ^[0-9]+$ ]]; then 
    echo "Error: number_of_files, sizes, and parallel_processes must be non-negative integers."
    usage 
fi 
 
if [ "$min_size" -gt "$max_size" ]; then 
    echo "Error: Minimum size cannot be greater than maximum size." 
    usage 
fi 
 
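# Combine two 15-bit $RANDOM draws into one 30-bit value. Multiplying two
# draws skews the result (the product is odd only 25% of the time).
# Covers ranges up to 2^30 bytes.
get_random_size() {
    local range=$((max_size - min_size + 1))
    echo $(( ((RANDOM << 15) | RANDOM) % range + min_size ))
}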
 
# Function to create a single file 
create_file() { 
    local index=$1 
    local size=$(get_random_size) 
    local filename="${file_name_pattern}_$index" 
    head -c "$size" /dev/urandom > "$filename" 
    echo "Created $filename with size $size bytes" 
} 
 
# Export function and variables for parallel execution 
export -f create_file 
export -f get_random_size 
export min_size max_size file_name_pattern 
 
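# Create files in parallel: one index per invocation (-n 1), up to
# $parallel_processes at a time (-P). -n and -I cannot be combined,
# so the index is passed as $1; -r skips the run on empty input.
seq 1 "$number_of_files" | xargs -r -n 1 -P "$parallel_processes" bash -c 'create_file "$1"' _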
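
The last line of the script relies on a pattern that is easier to see in isolation: export a function, then let xargs start child bash processes that call it. The following is a minimal sketch of that pattern only; greet is a hypothetical stand-in for create_file, and the item count and process count are arbitrary.

```shell
#!/bin/bash
# Minimal sketch of the "exported function + xargs -P" pattern.
# greet is a hypothetical stand-in for create_file.
greet() {
    echo "worker got item $1"
}

# Child bash processes only see functions that were exported.
export -f greet

# -n 1 passes one item per invocation, -P 2 runs two at a time.
# "_" fills $0 of the inline script so the item arrives as $1.
# sort makes the order stable, since parallel workers finish in any order.
output=$(seq 1 4 | xargs -n 1 -P 2 bash -c 'greet "$1"' _ | sort)
echo "$output"
```

Without export -f, each child bash would fail with "command not found", because functions are not inherited by a freshly started shell.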

How it works

Parameters Explained

The script accepts four required parameters and one optional parameter:
1. number_of_files: Number of files you want to generate.
2. min_size: Minimum size of each file in bytes.
3. max_size: Maximum size of each file in bytes.
4. file_name_pattern: The base name for your files; an underscore and a sequence number are appended to make each name unique.
5. parallel_processes (optional): Number of parallel processes to use for file creation. Defaults to 1 if not specified.

Example Usage

Save the script as create_files.sh, make it executable, and run it:

chmod +x ./create_files.sh
./create_files.sh 10 100 1000 testfile 4

This generates 10 files named testfile_1, testfile_2, …, testfile_10 in the current directory, each between 100 and 1000 bytes, using up to 4 parallel processes.
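
After a run like this, it can be worth confirming that every file really falls inside the requested range. The sketch below shows one way to do that; check_sizes is a hypothetical helper that is not part of the original script, and the demo uses two hand-made files in a scratch directory rather than real generator output.

```shell
#!/bin/bash
# Sketch: check that every file matching <pattern>_* has a size in [min, max].
# check_sizes is a hypothetical helper, not part of the original script.
check_sizes() {
    local min=$1 max=$2 pattern=$3
    local file size status=0
    for file in "${pattern}"_*; do
        [ -e "$file" ] || continue            # the glob matched nothing
        size=$(( $(wc -c < "$file") ))        # byte count, whitespace stripped
        if [ "$size" -lt "$min" ] || [ "$size" -gt "$max" ]; then
            echo "OUT OF RANGE: $file ($size bytes)"
            status=1
        fi
    done
    return "$status"
}

# Demo in a scratch directory with two hand-made files.
workdir=$(mktemp -d)
head -c 150 /dev/urandom > "$workdir/testfile_1"
head -c 999 /dev/urandom > "$workdir/testfile_2"

if check_sizes 100 1000 "$workdir/testfile"; then
    echo "all sizes within range"
fi
rm -rf "$workdir"
```

The helper returns a non-zero status as soon as any file is out of range, so it can also gate a larger test script.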

Key Features

  • Random Size Generation: Values from Bash's built-in $RANDOM variable are combined and reduced modulo the range to pick a size between min_size and max_size.
  • Binary Content: Files are filled with random bytes read from /dev/urandom.
  • Customizable Names: Each file is named <file_name_pattern>_<index>, so names are unique within a run.
  • Parallel Processing: xargs -P runs several file-creation processes at once, which can shorten large runs; the gain depends on the storage device and the number of CPU cores.
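
The random-size step can be sketched on its own. Bash's $RANDOM yields only 15 bits (0 to 32767), so one way to cover larger ranges is to join two draws with a shift and a bitwise OR. This is a minimal sketch under that assumption, valid for ranges up to 2^30 bytes; random_in_range is a hypothetical name, and the modulo step still carries a slight bias when the range does not divide 2^30 evenly.

```shell
#!/bin/bash
# Sketch: draw a size in [min, max] from a 30-bit random value.
# random_in_range is a hypothetical name used for illustration.
random_in_range() {
    local min=$1 max=$2
    local range=$((max - min + 1))
    # $RANDOM yields 15 bits (0..32767); two draws give 30 bits (0..1073741823).
    local value=$(( (RANDOM << 15) | RANDOM ))
    echo $(( value % range + min ))
}

# Print five sample sizes between 500 and 1500 bytes.
for i in 1 2 3 4 5; do
    random_in_range 500 1500
done
```

Each call prints one integer; the values differ from run to run.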

Real-World Applications

Here’s how you can use this script:
1. File System Testing: Stress-test file systems or storage services with varying file sizes.
2. Upload Simulations: Simulate user-uploaded files of different sizes to test server performance.
3. Backup Strategies: Verify backup and restore processes by generating diverse file sets.
4. Data Handling Experiments: Test scripts and applications that must handle files of varying sizes.
5. Parallel Performance Testing: Evaluate how systems handle multiple simultaneous file operations.
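
For the backup use case, random files are handy because any corruption changes their checksum. The sketch below records checksums before a simulated backup and verifies them afterwards. It assumes GNU coreutils (sha256sum); the directories, the file names, and manifest.sha256 are hypothetical, and a plain cp stands in for a real backup and restore cycle.

```shell
#!/bin/bash
# Sketch: record checksums before a backup and verify them after a restore.
# Directory names and manifest.sha256 are hypothetical.
src=$(mktemp -d)
restored=$(mktemp -d)

# Stand-ins for files produced by the generator script.
head -c 700 /dev/urandom > "$src/samplefile_1"
head -c 1200 /dev/urandom > "$src/samplefile_2"

# Record a checksum for every generated file.
(cd "$src" && sha256sum samplefile_* > manifest.sha256)

# Simulate a backup/restore cycle with a plain copy.
cp "$src"/* "$restored"/

# Verify the restored copies against the manifest.
if (cd "$restored" && sha256sum --quiet -c manifest.sha256); then
    verify_status="intact"
else
    verify_status="corrupted"
fi
echo "Restore check: $verify_status"
```

In a real test, the cp line would be replaced by the backup tool under test, and the manifest would be stored alongside the backed-up files.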

Output Example

If you run the command:

./create_files.sh 5 500 1500 samplefile 2

You’ll see output like this (the sizes are random, and with more than one process the lines may appear in a different order):

Created samplefile_1 with size 1264 bytes 
Created samplefile_2 with size 832 bytes 
Created samplefile_3 with size 1492 bytes 
Created samplefile_4 with size 754 bytes 
Created samplefile_5 with size 1386 bytes

Conclusion

Whether you’re a developer, a systems engineer, or someone looking to experiment with random file generation, this script is a versatile tool to add to your toolkit. The optional parallel mode helps with large runs, and the size range lets you shape the data set to fit your test. Modify it as needed and adapt it to your workflows. Happy scripting!