Memory Disaggregation Server Tutorial: Concepts in Practice on Servers

Memory disaggregation changes how servers use and share memory: it lets one machine use memory that lives on another machine instead of relying only on its local RAM, an approach often called remote memory or far memory. This memory disaggregation server tutorial from PerLod Hosting shows how to deploy it on Linux servers with three practical methods:

  • Method A: Use NVMe over TCP.
  • Method B: Use NBD.
  • Method C: Use RDMA-based remote RAM.

In simple words, memory disaggregation decouples compute from memory: one computer can access and use memory that is physically installed in another computer over a high-speed network. The result is better memory utilization, fewer out-of-memory crashes, and easier right-sizing of servers.

Now, let’s dive into the guide steps and learn how to deploy memory disaggregation in Linux servers.

Memory Disaggregation Server Tutorial

To follow this guide, you need two Linux servers running Ubuntu 24.04. A 10, 25, or 40 GbE network connection is recommended; a 1 GbE connection can be used for testing purposes. Make sure you have root SSH access to both machines and that their system clocks are synchronized with chrony.

Note: If you plan to use jumbo frames, make sure the MTU is properly configured on both servers.
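For example, you might set an MTU of 9000 on both servers and confirm it took effect (ens192 is a placeholder; substitute your own interface name):

sudo ip link set dev ens192 mtu 9000
ip link show ens192 | grep mtu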

In this setup, we use one machine as a memory server (target) and the other as a compute server (host).

Method A: Use NVMe over TCP as Remote Swap for Memory Disaggregation

The first method is to use NVMe over TCP as remote swap, which is fast, stable, and supported by the mainline Linux kernel. It turns a remote NVMe device into extra swap space, giving your server room to page out under memory pressure.

Set up Memory Server

First, you must set up the target machine or memory server. Install the required tools with the following command:

sudo apt update
sudo apt install nvmetcli nvme-cli -y

Then, create a backing block device. In this example, we use a file-backed loop device, meaning a regular file stands in for a physical disk.

sudo mkdir -p /opt/nvmeof
sudo fallocate -l 200G /opt/nvmeof/backing.img
sudo losetup -fP /opt/nvmeof/backing.img
LOOP=$(losetup -j /opt/nvmeof/backing.img | awk -F: '{print $1}')
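You can quickly confirm that the loop device exists and that the $LOOP variable points to it (the device name, such as /dev/loop0, may differ on your system):

echo "$LOOP"
lsblk "$LOOP"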

Now you can configure the NVMe over Fabrics (NVMe-oF) target to share the virtual block device over the network using TCP. If the nvmet and nvmet-tcp kernel modules are not already loaded, load them first, then write the target configuration to the file that will be restored below:

sudo modprobe nvmet
sudo modprobe nvmet-tcp
sudo mkdir -p /etc/nvme/nvmet
sudo tee /etc/nvme/nvmet/config.json >/dev/null <<'EOF'
{
  "ports": {
    "1": {
      "addr": {
        "trtype": "tcp",
        "traddr": "0.0.0.0",
        "trsvcid": "4420"
      }
    }
  },
  "subsystems": {
    "nqn.2025-10.io.lab:nvme-swap": {
      "nqn": "nqn.2025-10.io.lab:nvme-swap",
      "namespaces": {
        "1": {
          "device": {
            "path": "__LOOP__"
          }
        }
      },
      "allowed_hosts": []
    }
  },
  "hosts": {}
}
EOF

Next, update the NVMe-oF configuration file by replacing the __LOOP__ placeholder with the actual loop device path:

sudo sed -i "s|__LOOP__|$LOOP|g" /etc/nvme/nvmet/config.json

This ensures that the NVMe target knows which backing device to use.

Clear any existing NVMe target configuration currently loaded in the kernel with the command below:

sudo nvmetcli clear

Finally, load the new configuration and list the active NVMe target configuration so you can verify that the port and subsystem were set up correctly:

sudo nvmetcli restore /etc/nvme/nvmet/config.json
sudo nvmetcli ls

Note: To make the configuration persistent, keep /etc/nvme/nvmet/config.json and reapply it on boot, for example with a systemd unit that runs nvmetcli restore, as sketched below.
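A minimal sketch of such a unit, assuming the configuration stays at /etc/nvme/nvmet/config.json; the unit name, the tool paths, and the loop-device ExecStartPre step are examples you should adjust to your setup:

sudo tee /etc/systemd/system/nvmet-restore.service >/dev/null <<'EOF'
[Unit]
Description=Restore NVMe-oF target configuration
After=network-online.target

[Service]
Type=oneshot
# Load the target modules and recreate the file-backed loop device first
# (drop the losetup step if you export a real disk)
ExecStartPre=/usr/sbin/modprobe nvmet
ExecStartPre=/usr/sbin/modprobe nvmet-tcp
ExecStartPre=/usr/sbin/losetup -fP /opt/nvmeof/backing.img
# Adjust the path if 'command -v nvmetcli' reports a different location
ExecStart=/usr/bin/nvmetcli restore /etc/nvme/nvmet/config.json
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvmet-restore.service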

Set up Host Server

The next step is to set up the host or compute server. First, install the NVMe tools with the following command:

sudo apt update
sudo apt install nvme-cli -y

Then, you must discover the target or memory server and connect to it. To do this, run the following commands:

sudo nvme discover -t tcp -a TARGET_IP -s 4420
sudo nvme connect  -t tcp -a TARGET_IP -s 4420 -n nqn.2025-10.io.lab:nvme-swap

Be sure to match the subsystem NQN and port defined in the target’s configuration.

Verify that the remote NVMe device is visible on the host:

lsblk | grep nvme

If the connection was successful, you should see a device such as /dev/nvme0n1, which represents the remote NVMe storage from the target.

Next, you must prepare the device as swap space and activate it. To do this, you can use the following commands:

sudo mkswap /dev/nvme0n1
sudo swapon /dev/nvme0n1

To make the swap space persistent, use the command below:

echo '/dev/nvme0n1 none swap defaults,pri=5 0 0' | sudo tee -a /etc/fstab

The “pri=5” option sets the priority of this swap space. It is useful if you have multiple swap devices.
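Because NVMe device names can change between reboots, you may prefer to reference the swap area by the UUID that mkswap assigned instead of the device path used above; a sketch, assuming /dev/nvme0n1 is still the remote device:

UUID=$(sudo blkid -s UUID -o value /dev/nvme0n1)
echo "UUID=$UUID none swap defaults,pri=5 0 0" | sudo tee -a /etc/fstab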

Verify the configuration with the commands below:

swapon --show
nvme list

Enable zswap Compression

It is recommended to enable zswap, a compressed cache for swap pages. Instead of writing pages directly to the swap device, zswap compresses them and keeps them in RAM as long as possible, which reduces I/O to the remote device and improves performance.

First, enable zswap at runtime with the command below:

echo 1 | sudo tee /sys/module/zswap/parameters/enabled

Then, set the compression algorithm and memory limit with the commands below:

echo zstd | sudo tee /sys/module/zswap/parameters/compressor
echo 20 | sudo tee /sys/module/zswap/parameters/max_pool_percent

Here, zstd is chosen as the compressor because it offers a good balance between compression ratio and speed, and max_pool_percent=20 limits zswap’s pool to 20% of system RAM, which helps prevent resource contention.
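You can confirm that the parameters took effect and, later, watch zswap activity through its statistics (the debugfs path below assumes debugfs is mounted, which it is by default on Ubuntu):

grep -H . /sys/module/zswap/parameters/*
sudo grep -H . /sys/kernel/debug/zswap/*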

To make these settings persistent, you can add them to the kernel boot parameters using GRUB. To do this, run the commands below:

sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=20 /' /etc/default/grub
sudo update-grub

Test Performance and Behavior of NVMe Swap Device

At this point, you can confirm that zswap and the NVMe swap device are working correctly under memory pressure.

First, install the stress-ng tool, which is used to simulate high system load and memory usage, with the command below:

sudo apt install stress-ng -y

Then, generate memory pressure to observe how zswap behaves:

stress-ng --vm 4 --vm-bytes 85% -t 120s --metrics-brief

Next, you can use the following commands in another terminal to monitor swap and paging activity:

vmstat 1
cat /proc/swaps
dmesg | tail

Note: You can also test your application’s performance under memory pressure or run a storage benchmark like fio on /dev/nvme0n1 to verify network and NVMe throughput.
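A minimal fio read-only test sketch for the remote device (the job name and parameters are only examples; use --readonly or run it before enabling swap on the device, since writing to an active swap device would corrupt it):

sudo apt install fio -y
sudo fio --name=remote-nvme-randread --filename=/dev/nvme0n1 --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 --runtime=60 --time_based --readonly --group_reporting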

Method B: Use Network Block Device (NBD) as Remote Swap for Memory Disaggregation

Another option in this memory disaggregation server tutorial is NBD. It is simpler than NVMe over TCP and available almost everywhere, but slower under heavy workloads, which makes it best suited for testing or small deployments.

Set up NBD Server (Memory Server)

The first step is to set up the memory server. From your target server, install NBD with the command below:

sudo apt update
sudo apt install nbd-server -y

Then, create the backing file and make sure the nbd service user (configured below) can read and write it:

sudo mkdir -p /opt/nbd
sudo fallocate -l 100G /opt/nbd/swap.img
sudo chown nbd:nbd /opt/nbd/swap.img

Next, you must configure your NBD server. To do this, run the command below:

sudo tee /etc/nbd-server/config >/dev/null <<'EOF'
[generic]
user = nbd
group = nbd
listenaddr = 0.0.0.0

[swap-export]
exportname = /opt/nbd/swap.img
port = 10809
EOF

Enable the NBD server on your target machine with the command below:

sudo systemctl enable --now nbd-server

Also, you can verify its listening port with:

ss -ltnp | grep 10809

Set up NBD Client (Compute Server)

From your host server, you must install the NBD client and connect to your NBD server (target machine). To do this, run the commands below:

sudo apt update
sudo apt install nbd-client -y

sudo modprobe nbd max_part=0
sudo nbd-client TARGET_IP 10809 /dev/nbd0
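Note: Newer nbd-server releases expect clients to request a named (new-style) export, so if connecting by port alone fails, connect by the export name defined on the server instead, for example:

sudo nbd-client -N swap-export TARGET_IP /dev/nbd0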

Then, use it as a swap with the commands below:

sudo mkswap /dev/nbd0
sudo swapon /dev/nbd0
echo '/dev/nbd0 none swap defaults,pri=5 0 0' | sudo tee -a /etc/fstab

To verify your configuration, you can use the command below:

swapon --show
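To reconnect automatically at boot, the nbd-client package provides an /etc/nbdtab file used by its nbd@.service systemd template; a sketch of an entry for this export (the field layout and unit name may vary by version, so check man 5 nbdtab):

echo 'nbd0 TARGET_IP swap-export' | sudo tee -a /etc/nbdtab
sudo systemctl enable nbd@nbd0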

Tip: You can also combine zswap with NBD, just as in Method A.

Method C: True Memory Disaggregation with RDMA

The third option in this memory disaggregation server tutorial is RDMA-based remote RAM. If you want true memory disaggregation, not just remote storage, you can use one-sided RDMA with a remote-paging system. This allows a host system to use memory from a remote server as if it were local RAM. It is an advanced method and best suited for research environments.

Two notable research projects provide this functionality:

1. Infiniswap:

  • Provides transparent remote memory paging over RDMA.
  • Paper: NSDI 2017. Code is available online.
  • Requires RDMA NICs (or Soft-RoCE for lab setups), custom kernel modules, and careful tuning.

2. Fastswap:

  • Provides access to far-memory via RDMA with kernel patches.
  • Research code targeting specific kernel versions.

Warning: Both projects are experimental research software. They often require older or patched kernels, manual builds, and may be unstable.

RDMA Setup Using Soft-RoCE

If you don’t have hardware RDMA NICs, you can emulate RDMA using Soft-RoCE (RXE) over standard Ethernet. This works for testing or lab experiments.

From both servers (memory and host), install RDMA userland tools with the command below:

sudo apt update
sudo apt install rdma-core -y

Then, configure RXE on top of your Ethernet NIC, replacing ens192 with your interface name:

sudo apt install perftest -y
sudo rdma link add rxe0 type rxe netdev ens192
sudo rdma link

Notes:

  • Ensure the RXE device (rxe0) is UP.
  • Tooling varies by distribution and version: older releases shipped the rxe_cfg script, while current rdma-core uses the rdma command shown above.
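Note that rdma link add is not persistent across reboots. One simple way to recreate the Soft-RoCE device at boot is a small oneshot unit; this is only a sketch, assuming the ens192 interface and the rdma binary location on your system:

sudo tee /etc/systemd/system/rxe0.service >/dev/null <<'EOF'
[Unit]
Description=Create Soft-RoCE (RXE) device on ens192
After=network-online.target

[Service]
Type=oneshot
# Adjust the path if 'command -v rdma' reports a different location
ExecStart=/usr/sbin/rdma link add rxe0 type rxe netdev ens192
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable rxe0.service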

Then, from memory and host servers, you must test RDMA connectivity. To do this, run the commands below:

ib_write_lat -d rxe0 -F              # run on the target (memory server) first
ib_write_lat -d rxe0 -F TARGET_IP    # then run on the host (compute server)

This measures the RDMA latency and verifies that the path is functional.

Infiniswap Example Workflow

You must keep in mind that the exact steps depend on the repo and kernel version. Always check the README and open issues before proceeding.

High-level outline:

First, you must get the source code on both servers:

git clone https://github.com/SymbioticLab/Infiniswap.git
cd Infiniswap

Then, build the kernel modules and user tools following the repository instructions. You may need matching kernel headers and specific kernel configuration options.

Next, start the memory server daemon on the target server. Then, on the host server, load the client module so that the kernel swap subsystem pages memory out to remote RAM over RDMA instead of to local disk.

Finally, enable swap devices using Infiniswap tools. Verify them and monitor stats through debugfs or sysfs paths as described in the README.

Operational Tuning and Verification for Memory Disaggregation

After configuring remote swap or remote memory, you can tune the system for performance and reliability.

Tips for NUMA-aware workload placement:

  • For memory-intensive workloads, keep them on the NUMA node or server closest to their local RAM.
  • Treat remote swap or remote RAM as a temporary source, not the primary memory source.
  • In Kubernetes, use soft memory limits, QoS settings, and node-affinity for RAM-hungry pods to avoid excessive remote memory usage.

Tips for System Tuning (sysctl):

Adjust kernel settings carefully to manage swap behavior:

echo 'vm.swappiness=20' | sudo tee /etc/sysctl.d/99-swap.conf
echo 'vm.page-cluster=0' | sudo tee -a /etc/sysctl.d/99-swap.conf
sudo sysctl --system

  • vm.swappiness=20: The kernel prefers RAM over swap; swap is used only when necessary.
  • vm.page-cluster=0: Reduces the number of pages read from swap at once, improving latency for memory-heavy workloads.

Tips for measuring memory and swap usage:

Check baseline usage with the following commands:

free -h
cat /proc/swaps
vmstat 1

Generate memory pressure. For example:

stress-ng --vm 6 --vm-bytes 90% --vm-method all --verify -t 180s --metrics-brief

Monitor application latency, p99 tail latencies, and system metrics.

Troubleshooting and Best Practices For Memory Disaggregation

When you use remote swap or disaggregated memory, you may face some common issues such as network congestion, packet loss, MTU mismatches, and target server failures. Here are the best practices to prevent these from happening:

Link congestion: Try to use zswap to reduce I/O pressure, lower swappiness, and prefer NVMe/TCP over NBD for heavy loads.

Packet loss or MTU mismatch: Verify the interface MTU settings, and disable TSO/GRO if issues arise.

Target server down: Keep a small local swap as a fallback with lower priority:

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon --priority 1 /swapfile
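To keep this fallback across reboots, you can also add it to /etc/fstab with the same lower priority:

echo '/swapfile none swap defaults,pri=1 0 0' | sudo tee -a /etc/fstab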

Security tips:

  • Isolate NVMe/TCP or NBD traffic on a private VLAN.
  • Use firewall rules to limit which hosts can connect, as shown below.
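For example, with ufw you could allow only the compute server (HOST_IP is a placeholder) to reach the NVMe/TCP and NBD ports and deny everyone else:

sudo ufw allow from HOST_IP to any port 4420 proto tcp
sudo ufw allow from HOST_IP to any port 10809 proto tcp
sudo ufw deny 4420/tcp
sudo ufw deny 10809/tcp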

Which Method Should You Choose For Memory Disaggregation?

Most teams prefer Method A, NVMe over TCP with zswap. It offers the best balance of performance, simplicity, and upstream support.

Method B (NBD) is best for quick labs with minimal dependencies; it is suitable for light or medium workloads.

For deep research and RDMA labs, Method C (Infiniswap or Fastswap) is the closest to true disaggregated memory, but it requires kernel modifications, debugging, and careful setup.

FAQs

How is disaggregated memory different from traditional swap?

Traditional swap uses local disks, which are much slower than RAM. Disaggregated memory can reside on remote RAM via RDMA or very fast NVMe/TCP.

Which disaggregated memory implementation method is recommended for production?

For most production environments, NVMe-over-TCP with zswap is practical, stable, and upstream-supported.

How do I monitor the performance of disaggregated memory?

You can use vmstat, swapon --show, iostat -x 1, and application-level latency metrics.

Conclusion

Memory disaggregation changes the way servers handle big workloads by separating CPU from memory. Using tools like NVMe over TCP, NBD, or experimental RDMA solutions like Infiniswap and FastSwap, you can use more memory than what is physically in a server while keeping performance reasonable.

For most users, the easiest and most reliable choice is NVMe over TCP with zswap, which is simple, stable, and supported upstream.

Memory disaggregation is a powerful way to expand memory capacity without overbuilding servers.

We hope you enjoy this memory disaggregation server tutorial. Subscribe to our X and Facebook channels to get the latest articles and updates.

For further reading:

Optimizing NVMe Performance in Linux

Strategies for encrypted backup VPS

Sustainable Cooling Techniques for Home Servers or Small Data Centers
