Memory Disaggregation Concepts in Practice on Servers
Memory disaggregation changes how servers use and share memory: it lets one machine use memory that lives on another machine in addition to its local RAM. This borrowed capacity is often called remote memory or far memory. This memory disaggregation server tutorial from PerLod Hosting will teach you how to deploy it on Linux servers with three practical methods:
- Method A: Use NVMe over TCP.
- Method B: Use NBD.
- Method C: Use RDMA-based remote RAM.
In simple words, memory disaggregation separates CPU from RAM, which means one computer can access and use memory that is physically installed on another computer over a high-speed network. The result is better memory utilization, fewer out-of-memory crashes, and easier right-sizing of servers.
Now, let’s dive into the guide steps and learn how to deploy memory disaggregation in Linux servers.
Memory Disaggregation Server Tutorial
To start the guide steps, you need two Linux servers running Ubuntu 24.04. It is recommended to use a 10, 25, or 40 GbE network connection; a 1 GbE connection can be used for testing purposes. You must ensure that you have root SSH access to both machines and that their system clocks are synchronized using chrony.
Note: If you plan to use jumbo frames, make sure the MTU is properly configured on both servers.
In this setup, we use one machine as a memory server (target) and the other as a compute server (host).
Method A: Use NVMe over TCP as Remote Swap for Memory Disaggregation
The first method is to use NVMe over TCP as remote swap, which is fast, stable, and supported by the mainline Linux kernel. It turns a remote NVMe device into extra swap space, giving your server more headroom during memory pressure.
Set up Memory Server
First, you must set up the target machine or memory server. Install the required tools with the following command:
sudo apt update
sudo apt install nvmetcli nvme-cli -y
Then, you must create a backing block device. In this example, we create a file-backed block device, which means we are using a regular file instead of a physical device to simulate a storage device.
sudo mkdir -p /opt/nvmeof
sudo fallocate -l 200G /opt/nvmeof/backing.img
sudo losetup -fP /opt/nvmeof/backing.img
LOOP=$(losetup -j /opt/nvmeof/backing.img | awk -F: '{print $1}')
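Optionally, confirm that the loop device was created and points at the backing file (the exact device name, such as /dev/loop0, may differ on your system):
echo "$LOOP"     # the loop device path, e.g. /dev/loop0
lsblk "$LOOP"    # the 200G block device should be listed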
Now, write the NVMe over Fabrics (NVMe-oF) target configuration that shares the virtual block device over the network using TCP. The configuration is saved to /etc/nvme/nvmet/config.json and loaded in a later step:
sudo mkdir -p /etc/nvme/nvmet
sudo tee /etc/nvme/nvmet/config.json >/dev/null <<'EOF'
{
  "hosts": [],
  "ports": [
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "0.0.0.0",
        "trsvcid": "4420",
        "trtype": "tcp"
      },
      "portid": 1,
      "referrals": [],
      "subsystems": [
        "nqn.2025-10.io.lab:nvme-swap"
      ]
    }
  ],
  "subsystems": [
    {
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "path": "__LOOP__"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "nqn.2025-10.io.lab:nvme-swap"
    }
  ]
}
EOF
Next, update the NVMe-oF configuration file by replacing the __LOOP__ placeholder with the actual loop device path, using the following command:
sudo sed -i "s|__LOOP__|$LOOP|g" /etc/nvme/nvmet/config.json
This ensures that the NVMe target knows which backing device to use.
Load the nvmet kernel modules and clear any NVMe target configuration currently loaded in the kernel with the commands below:
sudo modprobe -a nvmet nvmet-tcp
sudo nvmetcli clear
Finally, load the new configuration and list the active NVMe target configuration so you can verify that the port and subsystem were set up correctly:
sudo nvmetcli restore /etc/nvme/nvmet/config.json
sudo nvmetcli ls
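Optionally, you can also confirm that the target is actually listening on TCP port 4420 and check the kernel log for nvmet messages:
ss -ltnp | grep 4420
sudo dmesg | grep -i nvmet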
Note: To make your configuration persistent, keep /etc/nvme/nvmet/config.json and reapply it on boot, for example with a systemd unit that runs nvmetcli restore.
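As a minimal sketch of that approach (assuming the configuration stays in /etc/nvme/nvmet/config.json; the unit name nvmet-restore.service and the binary paths are our own choices, so adjust them for your system), you could create a one-shot systemd unit that reloads the target configuration at boot:
sudo tee /etc/systemd/system/nvmet-restore.service >/dev/null <<'EOF'
[Unit]
Description=Restore NVMe-oF target configuration
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
# Load the target modules, recreate the file-backed loop device, then restore the config.
# Note: the loop device name must match the "path" stored in config.json.
ExecStartPre=/usr/sbin/modprobe -a nvmet nvmet-tcp
ExecStartPre=/usr/sbin/losetup -fP /opt/nvmeof/backing.img
ExecStart=/usr/bin/nvmetcli restore /etc/nvme/nvmet/config.json
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvmet-restore.service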
Set up Host Server
The next step is to set up the host or compute server. First, install the NVMe tools with the following command:
sudo apt update
sudo apt install nvme-cli -y
Then, load the NVMe/TCP initiator module, discover the target (memory server), and connect to it. To do this, run the following commands:
sudo modprobe nvme-tcp
sudo nvme discover -t tcp -a TARGET_IP -s 4420
sudo nvme connect -t tcp -a TARGET_IP -s 4420 -n nqn.2025-10.io.lab:nvme-swap
Be sure to match the NQN and port defined in the target’s configuration.
Verify that the remote NVMe device appears on the host:
lsblk | grep nvme
If the connection was successful, you should see a device such as /dev/nvme0n1, which represents the remote NVMe storage from the target.
Next, you must prepare the device as swap space and activate it. To do this, you can use the following commands:
sudo mkswap /dev/nvme0n1
sudo swapon /dev/nvme0n1
To make the swap space persistent, use the command below:
echo '/dev/nvme0n1 none swap defaults,pri=5 0 0' | sudo tee -a /etc/fstab
The “pri=5” option sets the priority of this swap space. It is useful if you have multiple swap devices.
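One caveat: device names like /dev/nvme0n1 are not guaranteed to stay the same across reconnects. For a more robust /etc/fstab entry, you can reference the device by its persistent ID instead (the exact ID string on your system will differ; the entry below is only an illustration):
ls -l /dev/disk/by-id/ | grep nvme
# Example fstab entry using the persistent name instead of /dev/nvme0n1:
# /dev/disk/by-id/nvme-<your-device-id> none swap defaults,pri=5 0 0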
Verify the configuration with the commands below:
swapon --show
nvme list
Enable zswap Compression
It is recommended to enable zswap, a compressed cache for swap pages. Instead of writing data directly to the swap device, the kernel compresses pages and keeps them in RAM for as long as possible, which reduces I/O to the remote device and improves performance.
First, enable zswap at runtime with the command below:
echo 1 | sudo tee /sys/module/zswap/parameters/enabled
Then, set the compression algorithm and memory limit with the commands below:
echo zstd | sudo tee /sys/module/zswap/parameters/compressor
echo 20 | sudo tee /sys/module/zswap/parameters/max_pool_percent
In these commands, zstd is chosen as the compressor because it offers a good balance between compression ratio and speed, and max_pool_percent=20 limits zswap’s memory usage to 20% of system RAM, which helps prevent resource contention.
To make these settings persistent, you can add them to the kernel boot parameters using GRUB. To do this, run the commands below:
sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=20 /' /etc/default/grub
sudo update-grub
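After the next reboot, you can confirm that the persistent zswap settings took effect:
cat /proc/cmdline | tr ' ' '\n' | grep zswap    # parameters passed at boot
grep -r . /sys/module/zswap/parameters/         # current runtime values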
Test Performance and Behavior of NVMe Swap Device
At this point, you can confirm that zswap and the NVMe swap device are working correctly under memory pressure.
First, install the stress-ng tool, which is used to simulate high system load and memory usage, with the command below:
sudo apt install stress-ng -y
Then, generate memory pressure to observe how zswap behaves:
stress-ng --vm 4 --vm-bytes 85% -t 120s --metrics-brief
Next, you can use the following commands in another terminal to monitor swap and paging activity:
vmstat 1
cat /proc/swaps
dmesg | tail
Note: You can also test your application’s performance under memory pressure or run a storage benchmark like fio on /dev/nvme0n1 to verify network and NVMe throughput.
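To see how much data zswap is actually holding during the test, its counters are exposed through debugfs (assuming debugfs is mounted at /sys/kernel/debug, which is the default on Ubuntu):
sudo grep -r . /sys/kernel/debug/zswap/    # stored_pages, pool_total_size, written_back_pages, etc.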
Method B: Use Network Block Device (NBD) as Remote Swap for Memory Disaggregation
Another option in this memory disaggregation server tutorial is NBD. It is simpler than NVMe over TCP and available everywhere, but slower under heavy workloads, which makes it a good fit for testing or small deployments.
Set up NBD Server (Memory Server)
The first step is to set up the memory server. From your target server, install NBD with the command below:
sudo apt update
sudo apt install nbd-server -y
Then, create the backing file with the following commands:
sudo mkdir -p /opt/nbd
sudo fallocate -l 100G /opt/nbd/swap.img
Next, you must configure your NBD server. To do this, run the command below:
sudo tee /etc/nbd-server/config >/dev/null <<'EOF'
[generic]
user = nbd
group = nbd
listenaddr = 0.0.0.0
# nbd-server listens on TCP port 10809 by default

[swap-export]
exportname = /opt/nbd/swap.img
EOF
Enable the NBD server on your target machine with the command below:
sudo systemctl enable --now nbd-server
Also, you can verify its listening port with:
ss -ltnp | grep 10809
Set up NBD Client (Compute Server)
From your host server, you must install the NBD client and connect to your NBD server (target machine). To do this, run the commands below:
sudo apt update
sudo apt install nbd-client -y
sudo modprobe nbd max_part=0
sudo nbd-client TARGET_IP 10809 /dev/nbd0 -N swap-export -swap    # -N selects the named export; -swap prepares the client for swap use
Then, use it as a swap with the commands below:
sudo mkswap /dev/nbd0
sudo swapon /dev/nbd0
echo '/dev/nbd0 none swap defaults,pri=5 0 0' | sudo tee -a /etc/fstab
To verify your configuration, you can use the command below:
swapon --show
Tip: You can also combine zswap with NBD, exactly as described in Method A.
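One more practical note for Method B: the fstab entry above only activates the swap if /dev/nbd0 already exists, and nbd-client does not reconnect automatically at boot. A minimal sketch of one way to handle this with a one-shot systemd unit (the unit name nbd-swap.service, the TARGET_IP placeholder, and the binary paths are assumptions; adjust them for your system):
sudo tee /etc/systemd/system/nbd-swap.service >/dev/null <<'EOF'
[Unit]
Description=Attach remote NBD device and enable it as swap
After=network-online.target
Wants=network-online.target
Before=swap.target

[Service]
Type=oneshot
# Load the nbd module, connect to the named export, then enable it as swap.
ExecStartPre=/usr/sbin/modprobe nbd max_part=0
ExecStart=/usr/sbin/nbd-client TARGET_IP /dev/nbd0 -N swap-export -swap -persist
ExecStartPost=/usr/sbin/swapon -p 5 /dev/nbd0
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nbd-swap.service
If you rely on a unit like this, you can drop the /etc/fstab line for /dev/nbd0 so the device is not activated twice.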
Method C: True Memory Disaggregation with RDMA
There is another approach to memory disaggregation: RDMA-based remote RAM. If you want true memory disaggregation, not just remote storage, you can use one-sided RDMA with a remote-paging system. This allows a host to use memory on a remote server as if it were local RAM. It is an advanced method and best suited to research environments.
Two notable research projects provide this functionality:
1. Infiniswap:
- Provides transparent remote memory paging over RDMA.
- Paper: NSDI 2017. Code is available online.
- Requires RDMA NICs (or Soft-RoCE for lab setups), custom kernel modules, and careful tuning.
2. Fastswap:
- Provides access to far-memory via RDMA with kernel patches.
- Research code targeting specific kernel versions.
Warning: Both projects are experimental research software. They often require older or patched kernels, manual builds, and may be unstable.
RDMA Setup Using Soft-RoCE
If you don’t have hardware RDMA NICs, you can emulate RDMA using Soft-RoCE (RXE) over standard Ethernet. This works for testing or lab experiments.
From both servers (memory and host), install RDMA userland tools with the command below:
sudo apt update
sudo apt install rdma-core -y
Then, configure RXE on top of your Ethernet NIC (replace ens192 with your interface name):
sudo apt install perftest -y
sudo rdma link add rxe0 type rxe netdev ens192
sudo rdma link
Notes:
- Ensure the RXE device (rxe0) is UP.
- Tooling varies by distribution and version: older releases use rxe_cfg, while newer rdma-core versions use the rdma command.
Then, from memory and host servers, you must test RDMA connectivity. To do this, run the commands below:
ib_write_lat -d rxe0 -F &            # on the target
ib_write_lat -d rxe0 -F TARGET_IP    # on the host
This measures the RDMA latency and verifies that the path is functional.
Infiniswap Example Workflow
You must keep in mind that the exact steps depend on the repo and kernel version. Always check the README and open issues before proceeding.
High-level outline:
First, you must get the source code on both servers:
git clone https://github.com/SymbioticLab/Infiniswap.git
cd Infiniswap
Then, build the kernel modules and user tools following the repository instructions. You may need matching kernel headers and specific kernel configuration options.
Next, start the memory server daemon on the target server. Then, on the host server, load the client module so that the kernel swap subsystem pages memory out to remote RAM over RDMA instead of to local disk.
Finally, enable swap devices using Infiniswap tools. Verify them and monitor stats through debugfs or sysfs paths as described in the README.
Operational Tuning and Verification for Memory Disaggregation
After configuring remote swap or remote memory, you can tune the system for performance and reliability.
Tips for NUMA-aware workload placement:
- For memory-intensive workloads, try to keep them on the NUMA node (or server) whose local RAM they use; see the numactl example after this list.
- Treat remote swap or remote RAM as a temporary source, not the primary memory source.
- In Kubernetes, use soft memory limits, QoS settings, and node-affinity for RAM-hungry pods to avoid excessive remote memory usage.
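As a simple illustration of NUMA-aware placement with numactl (the node number and the ./my-app binary are placeholders):
numactl --hardware                               # list NUMA nodes and their memory
numactl --cpunodebind=0 --membind=0 ./my-app     # run the workload with CPU and RAM from node 0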
Tips for System Tuning (sysctl):
Adjust kernel settings carefully to manage swap behavior:
echo 'vm.swappiness=20' | sudo tee /etc/sysctl.d/99-swap.conf
echo 'vm.page-cluster=0' | sudo tee -a /etc/sysctl.d/99-swap.conf
sudo sysctl --system
- vm.swappiness=20: Kernel prefers RAM over swap; swap is used only when necessary.
- vm.page-cluster=0: Swaps pages one at a time instead of in clusters, which lowers latency for memory-heavy workloads.
Tips for measuring memory and swap usage:
Check baseline usage with the following commands:
free -h
cat /proc/swaps
vmstat 1
Generate memory pressure. For example:
stress-ng --vm 6 --vm-bytes 90% --vm-method all --verify -t 180s --metrics-brief
Monitor application latency, p99 tail latencies, and system metrics while the test runs.
Troubleshooting and Best Practices For Memory Disaggregation
When you use remote swap or disaggregated memory, you may face some common issues such as network congestion, packet loss, MTU mismatches, and target server failures. Here are the best practices to prevent these from happening:
Link congestion: Try to use zswap to reduce I/O pressure, lower swappiness, and prefer NVMe/TCP over NBD for heavy loads.
Packet loss or MTU mismatch: Verify the interface MTU settings and disable TSO/GRO if issues arise (see the example after this list).
Target server down: Keep a small local swap as a fallback with lower priority:
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon --priority 1 /swapfile
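For the MTU and offload checks mentioned above, a quick way to inspect and adjust the interface (ens192 is a placeholder; changing offload settings affects throughput, so test before and after):
ip link show ens192 | grep mtu                   # MTU must match on both servers
ethtool -k ens192 | grep -E 'tcp-segmentation-offload|generic-receive-offload'
sudo ethtool -K ens192 tso off gro off           # disable TSO/GRO only if problems persist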
Security tips:
- Isolate NVMe/TCP or NBD traffic on a private VLAN.
- Use firewall rules to limit which hosts can connect (see the UFW sketch below).
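A minimal sketch of such a rule set with UFW (the compute host address 10.0.0.20 is an example, and the ports match the NVMe/TCP and NBD ports used earlier):
sudo ufw allow from 10.0.0.20 to any port 4420 proto tcp     # NVMe/TCP target
sudo ufw allow from 10.0.0.20 to any port 10809 proto tcp    # NBD server
sudo ufw deny 4420/tcp
sudo ufw deny 10809/tcp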
Which Method Should You Choose For Memory Disaggregation?
Most teams prefer Method A, NVMe over TCP with zswap. It offers the best balance of performance, simplicity, and upstream support.
Method B, which is NBD, is best for a quick lab with minimal dependencies. NBD is suitable for light or medium workloads.
For deep research and RDMA labs, it is recommended to use Method C (Infiniswap or Fastswap). It is closest to true disaggregated memory, but requires kernel modifications, debugging, and careful setup.
FAQs
How is disaggregated memory different from traditional swap?
Traditional swap uses local disks, which are much slower than RAM. Disaggregated memory can reside on remote RAM via RDMA or very fast NVMe/TCP.
Which disaggregated memory implementation method is recommended for production?
For most production environments, NVMe-over-TCP with zswap is practical, stable, and upstream-supported.
How do I monitor the performance of disaggregated memory?
You can use vmstat, swapon --show, iostat -x 1, and application-level latency metrics.
Conclusion
Memory disaggregation changes the way servers handle big workloads by separating CPU from memory. Using tools like NVMe over TCP, NBD, or experimental RDMA solutions like Infiniswap and FastSwap, you can use more memory than what is physically in a server while keeping performance reasonable.
For most users, the easiest and most reliable choice is NVMe over TCP with zswap, which is simple, stable, and supported upstream.
Memory disaggregation is a powerful way to expand memory capacity without overbuilding servers.
We hope you enjoyed this memory disaggregation server tutorial. Subscribe to our X and Facebook channels to get the latest articles and updates.
For further reading:
Optimizing NVMe Performance in Linux
Strategies for encrypted backup VPS
Sustainable Cooling Techniques for Home Servers or Small Data Centers