Migrate a Running Linux Process to Another Machine Using CRIU

Introduction

Process migration — the ability to move a running process from one machine to another — is a powerful capability in Linux. It enables load balancing, minimizes downtime, and enhances fault tolerance. This guide demonstrates how to achieve process migration using CRIU (Checkpoint/Restore In Userspace) with a simple example program written in C.

We’ll walk through installing CRIU, creating a program that updates a counter every second, checkpointing the program’s state, and restoring it, either on the same or another machine. Special attention is given to handling terminal-based processes using the --shell-job option.

Installing CRIU on Ubuntu

CRIU is available via Ubuntu’s repositories, but the latest version can be accessed by adding its official PPA.

Step 1: Install add-apt-repository (if missing)

If add-apt-repository is not installed, install it using:

sudo apt update 
sudo apt install software-properties-common -y

Step 2: Add the CRIU PPA

Add the CRIU PPA to ensure you get the latest version:

sudo add-apt-repository ppa:criu/ppa

Step 3: Update the Package List

Refresh the list of available packages:

sudo apt update

Step 4: Install CRIU

Install CRIU using:

sudo apt install criu -y

Step 5: Verify the Installation

Check if CRIU is installed correctly by running:

criu --version

You should see the installed version, e.g., Version: 3.17.

Step 6: Test Compatibility

Verify that your system supports CRIU:

sudo criu check

If everything is compatible, you’ll see:

Looks good.
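The verification steps above can be bundled into a small preflight function. This is only a sketch: the names have_cmd, criu_preflight and the CRIU_BIN variable are made up for this example and are not part of CRIU. It assumes that sudo is available, as in the steps above.

```shell
#!/bin/sh
# Preflight sketch. The function names and the CRIU_BIN variable are
# illustrative helpers, not part of CRIU itself.

# Succeeds if the named command can be found by the shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Returns 0 only if CRIU is installed and `criu check` passes.
criu_preflight() {
    _criu=${CRIU_BIN:-criu}
    if ! have_cmd "$_criu"; then
        echo "$_criu: not installed"
        return 1
    fi
    "$_criu" --version
    # `criu check` needs root and prints "Looks good." on success.
    if sudo "$_criu" check; then
        echo "$_criu: ready"
    else
        echo "$_criu: kernel or configuration check failed"
        return 1
    fi
}
```

After sourcing the file, run criu_preflight before attempting a checkpoint. A non-zero exit status means either the package is missing or the kernel check failed.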

Creating the Example Program

We’ll create a simple program that updates a counter every second and prints the current time. This program serves as an ideal example for demonstrating CRIU’s checkpoint and restore features.

C Code: Counter Updating Every Second

Save the following code as demo.c:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <time.h>

// Written by the signal handler and read by the main loop.
// volatile sig_atomic_t is the type C guarantees is safe for this.
static volatile sig_atomic_t running = 1;
static volatile sig_atomic_t last_signal = 0;

// Handle termination signals. printf() is not async-signal-safe,
// so the handler only records the signal; main() reports it.
static void handle_signal(int sig) {
    last_signal = sig;
    running = 0; // Stop the loop
}

int main(void) {
    // Register signal handlers
    signal(SIGTERM, handle_signal);
    signal(SIGINT, handle_signal);

    printf("Process started with PID: %d\n", (int)getpid());
    printf("Updating every second...\n");

    // Counter variable
    unsigned long long counter = 0;

    // Update the counter every second until a signal arrives
    while (running) {
        time_t now = time(NULL);
        printf("Counter: %llu | Time: %s", counter++, ctime(&now));
        fflush(stdout); // Ensure the output is printed immediately
        sleep(1);       // Wait for 1 second (returns early on a signal)
    }

    if (last_signal)
        printf("\nProcess received signal %d, exiting gracefully.\n", (int)last_signal);
    printf("Process stopped. Final counter value: %llu\n", counter);
    return 0;
}

Step 1: Compile and Run the Program

1. Compile the program:

gcc demo.c -o demo

2. Run it:

./demo

You’ll see output similar to this:

Process started with PID: 252114 
Updating every second... 
Counter: 0 | Time: Tue Nov 19 18:50:10 2024 
Counter: 1 | Time: Tue Nov 19 18:50:11 2024 
Counter: 2 | Time: Tue Nov 19 18:50:12 2024 
Counter: 3 | Time: Tue Nov 19 18:50:13 2024 
Counter: 4 | Time: Tue Nov 19 18:50:14 2024 
Counter: 5 | Time: Tue Nov 19 18:50:15 2024

Step 2: Checkpoint the Process

Processes attached to a terminal (like our example) need the --shell-job option so that CRIU can handle the terminal session. From a second terminal, run the following one-liner. It creates a directory named after the PID printed in Step 1 (the number will be different on your machine), dumps the complete state of that process into the directory, and then kills the process. CRIU normally needs root privileges, hence the sudo.

mkdir -p 252114 && sudo criu dump -t 252114 -D ./252114 --shell-job

Back in the terminal where the process was running, we can see that the last counter value printed before the process was killed is 9:

Counter: 6 | Time: Tue Nov 19 18:50:16 2024 
Counter: 7 | Time: Tue Nov 19 18:50:17 2024 
Counter: 8 | Time: Tue Nov 19 18:50:18 2024 
Counter: 9 | Time: Tue Nov 19 18:50:19 2024 
Killed
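Copying the PID by hand is error-prone, so the one-liner can be generated instead. The sketch below is an assumption-laden helper, not something CRIU ships: checkpoint_cmd is a hypothetical name, and it only prints the command so that you can inspect it before running it. The criu options it emits (dump, -t, -D, --shell-job, --leave-running) are real; --leave-running keeps the process alive after the dump instead of killing it.

```shell
#!/bin/sh
# Sketch of a checkpoint helper. `checkpoint_cmd` is a hypothetical name.

# Print the command that checkpoints PID $1 into the directory ./$1.
# Pass "keep" as $2 to leave the process running after the dump.
checkpoint_cmd() {
    _pid=$1
    # Refuse anything that is not a plain decimal PID.
    case $_pid in
        ''|*[!0-9]*)
            echo "usage: checkpoint_cmd PID [keep]" >&2
            return 2
            ;;
    esac
    _extra=""
    if [ "${2:-}" = "keep" ]; then
        _extra=" --leave-running"
    fi
    echo "mkdir -p ./$_pid && sudo criu dump -t $_pid -D ./$_pid --shell-job$_extra"
}
```

For example, checkpoint_cmd "$(pgrep -n -x demo)" prints the command for the most recently started process named exactly demo, assuming pgrep from procps is installed.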

Step 3: Restore the Process

1. Restore on the same machine: Use CRIU to restore the process:

sudo criu restore -D ./252114/ --shell-job
Counter: 10 | Time: Tue Nov 19 18:54:59 2024
Counter: 11 | Time: Tue Nov 19 18:55:00 2024 
Counter: 12 | Time: Tue Nov 19 18:55:01 2024 
Counter: 13 | Time: Tue Nov 19 18:55:02 2024

As we can see, the program resumed exactly where it left off: the counter continues from 10. Note, however, the jump in the timestamps, because the process was restored several minutes after it was checkpointed. CRIU restores the state of the process, not the wall-clock time. I have not verified this, but I suspect that programs which rely on elapsed time in their calculations (timeouts, rate computations, and so on) may crash or, worse, silently produce wrong results after a restore, so use it with caution.
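To get a feel for how large the time jump will be, you can measure how old a checkpoint is before restoring it. The following is a rough sketch: checkpoint_age is a hypothetical helper, it relies on GNU stat (stat -c %Y, as found on Ubuntu), and it uses the modification time of the checkpoint directory as an approximation of when the dump finished.

```shell
#!/bin/sh
# Sketch: approximate age of a checkpoint, in seconds.
# `checkpoint_age` is a hypothetical helper; it requires GNU stat.

# Print the number of seconds since path $1 was last modified.
checkpoint_age() {
    _now=$(date +%s)
    # %Y is the modification time in seconds since the epoch.
    _then=$(stat -c %Y "$1") || return 1
    echo $((_now - _then))
}
```

Running checkpoint_age ./252114 just before the restore gives roughly the number of seconds that the restored process will perceive as a sudden jump in wall-clock time.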

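2. Restore on a remote machine: Transfer the checkpoint directory to the remote machine using scp or rsync, then run the same restore command there. The demo executable must also be present at the same path on the remote machine, because CRIU reopens the files that the process had mapped. Keep the following in mind: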

  • System Compatibility: Both machines must have compatible architectures, libraries, and kernel versions.
  • Open Resources: Processes with active network connections or file dependencies may require additional options (e.g., --tcp-established).
  • Terminal Dependencies: Processes interacting with terminals must use the --shell-job option.
  • Downtime: The checkpoint and restore processes introduce some downtime, especially for large memory states.
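The transfer-and-restore sequence can be sketched as a short list of commands. Everything below is an assumption for illustration: migrate_cmds is a hypothetical helper, alice@remote-host and the paths are placeholders, and the function only prints the commands rather than running them. It uses absolute paths so that the checkpoint directory and the executable end up at the same location on both machines, and ssh -t so that the restored process gets a terminal for --shell-job.

```shell
#!/bin/sh
# Sketch of a remote restore. `migrate_cmds` is a hypothetical helper
# that prints the commands instead of executing them.

# $1: absolute path of the checkpoint directory
# $2: absolute path of the checkpointed executable
# $3: ssh destination, e.g. user@remote-host (placeholder)
migrate_cmds() {
    _dir=$1
    _bin=$2
    _remote=$3
    # Both paths must be absolute so they match on the remote side.
    case $_dir in /*) ;; *) echo "checkpoint dir must be absolute" >&2; return 2 ;; esac
    case $_bin in /*) ;; *) echo "executable path must be absolute" >&2; return 2 ;; esac
    if [ -z "$_remote" ]; then
        echo "missing ssh destination" >&2
        return 2
    fi
    # Recreate the directory that holds the executable on the remote machine.
    echo "ssh $_remote mkdir -p $(dirname "$_bin")"
    # Copy the executable, then the checkpoint images.
    echo "rsync -a $_bin $_remote:$_bin"
    echo "rsync -a $_dir/ $_remote:$_dir/"
    # Restore on the remote machine with a terminal attached.
    echo "ssh -t $_remote sudo criu restore -D $_dir --shell-job"
}
```

For example, migrate_cmds /home/alice/252114 /home/alice/demo alice@remote-host prints four commands to run in order. If the image files turn out to be readable only by root, the rsync step may need sudo or adjusted permissions.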

Conclusion

CRIU simplifies the task of checkpointing and restoring Linux processes, enabling powerful use cases like live migration, fault recovery, and system maintenance. In this guide, we demonstrated how to checkpoint and restore a simple C program, both on the same and on a remote machine.

While CRIU is robust, it requires caution when handling processes that depend on system time or external resources. With proper configuration and understanding of its options, CRIU can be a game-changer for developers and system administrators seeking flexible process management solutions.

Explore CRIU and experience the power of live process migration today!