Fix IRQ Imbalance Latency on Linux: IRQ Affinity and NIC Tuning
IRQ imbalance latency is a common performance issue on Linux systems, especially those handling high network throughput or real-time workloads. When hardware devices like network interface cards (NICs) need CPU attention, they generate interrupt requests. If these interrupts aren’t distributed properly across CPU cores, some cores become overloaded while others sit idle, causing random latency spikes that hurt application performance.
This guide from PerLod Hosting explains how IRQ (Interrupt Request) imbalance causes latency issues and provides comprehensive solutions using IRQ affinity tuning and NIC optimization techniques.
Common Causes of IRQ Imbalance Latency
IRQ imbalance happens when one or a few CPU cores handle most hardware interrupts instead of sharing the work across all cores. Knowing why this happens makes it easier to apply the right solution.
How Interrupts Work in Linux:
When a NIC (network card) receives packets, it interrupts the CPU to get the kernel’s attention. After that quick interrupt, Linux does the heavier packet processing in a separate step called a SoftIRQ. This happens in two main stages:
- Hard IRQ (fast step): A very short handler runs just to acknowledge the interrupt and queue the real work.
- SoftIRQ (processing step): Linux processes the packets, including NET_RX for receive and NET_TX for transmit, usually through NAPI polling.
If most NIC interrupts land on one core, often CPU0, that core becomes a bottleneck. Even if interrupts are spread out, a bad spread can still hurt because it increases cache misses, memory contention, and context switching, which leads to random latency spikes when the CPU spends too much time handling interrupts instead of running your applications.
Primary Causes of IRQ Imbalance Include:
Single Core Overload: If interrupts mostly go to CPU0, that core gets overloaded while other cores sit idle, which slows down your applications.
IRQbalance Conflicts: The irqbalance daemon automatically moves interrupts between CPU cores. That can override your manual IRQ pinning and cause inconsistent behavior and random latency spikes.
NUMA Topology Mismatch: On dual-socket (NUMA) servers, the NIC is local to one CPU socket. If its interrupts run on cores from the other socket, the packets must cross between sockets, which adds extra delay and can seriously reduce performance.
Inadequate NIC Queue Distribution: Modern NICs can spread incoming traffic across multiple receive queues (RSS), and each queue can be handled by a different CPU core. If you don’t have enough queues or RSS isn’t set up right, the interrupt load cannot be properly distributed.
Cache Contention and Misses: If interrupts keep bouncing between CPU cores, the CPU can’t reuse cached data. That forces extra memory loads and can slow down both packet processing and your applications.
Interrupt Coalescing Issues: NICs can group packets and send fewer interrupts. If this is tuned badly, you either get too many interrupts (wasting CPU) or too much waiting (adding latency).
Signs of Latency Spikes Include:
When IRQs are imbalanced, performance feels unstable. Apps may randomly slow down, network traffic may drop or time out, and monitoring often shows one CPU core doing most of the SoftIRQ processing while other cores stay mostly idle. Also, packet drops can keep increasing, and real-time workloads may show jitter or missed deadlines.
How to Detect IRQ Imbalance Latency?
Before implementing fixes, you must accurately detect and measure IRQ imbalance. Linux provides several tools to monitor interrupt distribution and identify bottlenecks.
View Interrupt Distribution
The /proc/interrupts file provides real-time interrupt counters for each IRQ and CPU core. To view interrupt distribution, you can run:
cat /proc/interrupts
With this command, you will see a list where the first column is the IRQ number, the middle columns are how many interrupts each CPU core handled, and the last column is the device name.
Find your NIC’s interrupts, often named like eth0, enp1s0, or driver names such as mlx5, then check whether most of the counts are piling up on just one or two CPU columns.
If one CPU’s numbers keep rising fast while other CPUs stay near zero, that usually means an IRQ imbalance.
To continuously monitor interrupt changes, you can use the watch command:
watch -n1 'cat /proc/interrupts | grep -E "CPU|eth0"'
This updates every second and filters for your network interface. You must look for rapidly increasing counters on specific CPUs.
On multi-queue NICs, you’ll see a separate interrupt line for each RX and TX queue, for example, eth0-rx-0, eth0-rx-1, eth0-tx-0, and eth0-tx-1. Ensure that these queue interrupts are spread across different CPU cores rather than being directed to the same cores.
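If eyeballing the per-CPU columns gets tedious, you can total the interrupt counts per CPU for your interface with a one-liner. This is a rough sketch that assumes your interface is named eth0 and reads the CPU count from the header line of /proc/interrupts; adjust the pattern for your NIC name:
awk '/CPU0/ {ncpu=NF} /eth0/ {for (i=2; i<=ncpu+1; i++) total[i-2]+=$i} END {for (c=0; c<ncpu; c++) printf "CPU%d: %d\n", c, total[c]}' /proc/interrupts
A heavily skewed result, where one CPU’s total dwarfs the rest, is a clear sign of imbalance.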
Monitor SoftIRQ Processing
SoftIRQs handle the bulk of network packet processing. You can use the /proc/softirqs file to display the SoftIRQ counters per CPU:
cat /proc/softirqs
Check the NET_RX and NET_TX lines, ensuring the counts are spread across multiple CPU cores. If one core has most of the activity, packet processing is imbalanced.
To monitor SoftIRQ activity in real-time, you can run the following watch command:
watch -n1 'grep -E "CPU|NET_RX|NET_TX" /proc/softirqs'
Also, you can use the mpstat command to see the percentage of CPU time spent in SoftIRQ context:
mpstat -P ALL 1
The %soft column shows how much time each CPU core spends handling SoftIRQs. If one core has a much higher %soft than the others, interrupt processing isn’t balanced and can cause latency spikes.
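To make that comparison easier, you can pull just the %soft column out of mpstat and sort it so the busiest SoftIRQ core appears first. This is a small sketch that assumes the sysstat version of mpstat and its Average output format:
mpstat -P ALL 1 1 | awk '/^Average.*%soft/ {for (i=1; i<=NF; i++) if ($i == "%soft") c = i} /^Average/ && $2 ~ /^[0-9]+$/ {printf "CPU%-3s %s\n", $2, $c}' | sort -k2 -rn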
Identify IRQ Numbers for Network Interfaces
To configure IRQ affinity, you need to identify the specific IRQ numbers associated with your network interface. For interfaces with a single IRQ, you can use the command below:
cat /sys/class/net/eth0/device/irq
Replace eth0 with your interface name. This displays the legacy IRQ number.
However, modern NICs use MSI-X interrupts with multiple IRQs. To find all IRQ numbers for a multi-queue NIC, you can use the following command:
grep eth0 /proc/interrupts | awk '{print $1}' | sed 's/://g'
This displays all IRQ numbers associated with your interface. For more detailed information, you can run:
ls /sys/class/net/eth0/device/msi_irqs/
This directory includes all MSI-X interrupt numbers for the device.
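To see which queue each IRQ number belongs to at a glance, you can print the IRQ next to its queue name (a quick sketch, again assuming eth0):
grep eth0 /proc/interrupts | awk '{print $1, $NF}' | sed 's/://'
The output pairs each IRQ number with a name like eth0-rx-0 or eth0-tx-1, which is handy when you start pinning queues to cores later.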
Check Current IRQ Affinity
To see which CPUs currently handle a specific IRQ, you can use the command below:
cat /proc/irq/125/smp_affinity_list
Remember to replace the 125 with your IRQ number. This command shows the CPU cores assigned to handle that interrupt, for example, 0-3 means CPUs 0 through 3.
Alternatively, you can view the bitmask representation with the command below:
cat /proc/irq/125/smp_affinity
This shows a hexadecimal bitmask where each bit represents a CPU core.
To check affinity for all network-related IRQs at once, you can use the command below:
for irq in $(grep eth0 /proc/interrupts | awk '{print $1}' | sed 's/://g'); do
echo "IRQ $irq: $(cat /proc/irq/$irq/smp_affinity_list)"
done
Detect NUMA Node Locality
On NUMA systems, you can use the command below to find which NUMA node your NIC is attached to:
cat /sys/class/net/eth0/device/numa_node
This returns the NUMA node number, for example, 0 or 1. You should pin IRQs to CPU cores on the same NUMA node for optimal performance.
To list CPUs local to the NIC’s NUMA node, you can run the command below:
cat /sys/class/net/eth0/device/local_cpulist
This shows which CPU cores are directly connected to the NIC’s PCIe bus, which avoids cross-NUMA traffic.
Also, you can use the lscpu command to display NUMA topology:
lscpu | grep NUMA
Or, you can install hwloc for visual topology mapping:
sudo apt install hwloc
lstopo --logical
Monitor Packet Drops and Errors
You can use the command below to check for packet drops; these often mean the network buffers are filling up because interrupts aren’t being handled fast enough.
ethtool -S eth0 | grep -i drop
Look for counters like rx_dropped, rx_missed_errors, or per-queue drop stats. If these numbers aren’t zero and keep going up, Linux isn’t processing incoming packets fast enough, and the receive buffers are overflowing.
Also, check standard interface statistics with the following command:
ip -s link show eth0
The RX errors and dropped counters are quick warning signs that something is wrong with packet receiving, like overload or buffering issues.
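To confirm whether the drop counters are still climbing, you can take two snapshots a few seconds apart and compare them. This is a simple sketch that writes to temporary files; adjust eth0 and the sleep interval to suit your traffic:
ethtool -S eth0 | grep -i drop > /tmp/drops_before
sleep 10
ethtool -S eth0 | grep -i drop > /tmp/drops_after
diff /tmp/drops_before /tmp/drops_after
If diff shows the counters increasing between snapshots, packets are still being dropped and further tuning is needed.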
Use Latency Measurement Tools
To measure latency accurately, run cyclictest from the rt-tests package, which repeatedly schedules timed wakeups and reports the latency it sees:
sudo apt install rt-tests
sudo cyclictest -p 95 -t12 -n -i 200 -d 0
It measures how long real-time threads get delayed on each CPU core. If the max latency number is high, for example, over 100 microseconds, interrupts are likely getting in the way of real-time tasks.
You can also use the perf command to analyze interrupt overhead:
sudo perf record -e 'irq:*' -a -g -- sleep 10
sudo perf report
This captures all IRQ events and shows which interrupts consume the most CPU time.
How to Fix IRQ Imbalance Latency: IRQ and NIC Tuning
Once you have identified an IRQ imbalance, you can implement these solutions to optimize interrupt distribution and reduce latency.
Disable irqbalance Service
You must stop the irqbalance service, which interferes with manual IRQ affinity settings. Check if the irqbalance service is running:
systemctl status irqbalance
If it is activated, use the commands below to stop and disable it:
sudo systemctl stop irqbalance
sudo systemctl disable irqbalance
This prevents irqbalance from automatically redistributing your manually configured IRQ affinities. The disable command ensures it won’t restart after reboot.
Manual IRQ Affinity Configuration
You can manually set IRQ affinity by writing to the smp_affinity_list file. This gives you direct control over which CPU cores handle each interrupt.
To assign IRQ 125 to CPU core 4, you can run the command below:
echo 4 | sudo tee /proc/irq/125/smp_affinity_list
To assign an IRQ to multiple cores, for example, CPUs 0-3, you can run:
echo 0-3 | sudo tee /proc/irq/125/smp_affinity_list
For non-contiguous CPUs, for example, cores 0, 2, 4, 6, you can run the command below:
echo 0,2,4,6 | sudo tee /proc/irq/125/smp_affinity_list
Also, you can use the hexadecimal bitmask format via smp_affinity. To assign IRQ 125 to CPU 4, you can use:
echo 10 | sudo tee /proc/irq/125/smp_affinity
Each bit position represents a CPU core, so CPU 0 = 0x1, CPU 1 = 0x2, CPU 2 = 0x4, CPU 3 = 0x8, CPU 4 = 0x10, and so on.
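If you prefer the bitmask format but don’t want to work out the hex by hand, the shell can compute it for you. The CPU numbers below are just examples:
# Mask for a single CPU (CPU 4): shift 1 left by the CPU number
printf '%x\n' $((1 << 4))    # prints 10
# Mask for CPUs 0, 2, 4, and 6 combined
printf '%x\n' $(( (1 << 0) | (1 << 2) | (1 << 4) | (1 << 6) ))    # prints 55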
Distribute Multi-Queue NIC Interrupts
For multi-queue NICs with separate RX and TX queues, you must distribute interrupts across multiple cores.
First, identify all IRQ numbers for your interface with the command below:
grep eth0 /proc/interrupts
Then, assign each queue to a different core. For example, with four RX queues and four TX queues spread across eight cores, you can run the commands below:
# RX queues
echo 0 | sudo tee /proc/irq/120/smp_affinity_list # eth0-rx-0 -> CPU 0
echo 1 | sudo tee /proc/irq/121/smp_affinity_list # eth0-rx-1 -> CPU 1
echo 2 | sudo tee /proc/irq/122/smp_affinity_list # eth0-rx-2 -> CPU 2
echo 3 | sudo tee /proc/irq/123/smp_affinity_list # eth0-rx-3 -> CPU 3
# TX queues
echo 4 | sudo tee /proc/irq/124/smp_affinity_list # eth0-tx-0 -> CPU 4
echo 5 | sudo tee /proc/irq/125/smp_affinity_list # eth0-tx-1 -> CPU 5
echo 6 | sudo tee /proc/irq/126/smp_affinity_list # eth0-tx-2 -> CPU 6
echo 7 | sudo tee /proc/irq/127/smp_affinity_list # eth0-tx-3 -> CPU 7
This distributes the interrupt load evenly across 8 cores. Remember to adjust based on your system’s core count and workload requirements.
Use set_irq_affinity Scripts
Many NIC vendors provide set_irq_affinity scripts that automate IRQ distribution.
For Intel NICs, you can use:
sudo /usr/local/bin/set_irq_affinity.sh eth0
For Mellanox and NVIDIA NICs, you can use set_irq_affinity_bynode.sh to bind interrupts to a specific NUMA node:
sudo /usr/sbin/set_irq_affinity_bynode.sh 0 eth0
This binds all eth0 interrupts to CPUs on NUMA node 0.
If your driver package doesn’t include these scripts, you can find generic versions online or create your own. A basic script structure looks like this:
#!/bin/bash
# Pin each RX/TX queue IRQ of the given interface to the CPU matching its queue number.
INTERFACE=$1
for IRQ in $(grep "$INTERFACE" /proc/interrupts | awk '{print $1}' | sed 's/://g'); do
    QUEUE_NUM=$(grep -m1 " $IRQ:" /proc/interrupts | grep -oP "$INTERFACE-[rt]x-\K[0-9]+")
    if [ -n "$QUEUE_NUM" ]; then
        echo "$QUEUE_NUM" | sudo tee /proc/irq/"$IRQ"/smp_affinity_list
        echo "IRQ $IRQ -> CPU $QUEUE_NUM"
    fi
done
Save this as set_irq_affinity.sh, make it executable, and run it with your interface name.
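For example:
chmod +x set_irq_affinity.sh
sudo ./set_irq_affinity.sh eth0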
NUMA-Aware IRQ Affinity
On NUMA systems, you must always bind IRQs to the same NUMA node as the NIC to avoid cross-socket memory access.
First, determine the NIC’s NUMA node with the command below:
cat /sys/class/net/eth0/device/numa_node
If the NIC is on NUMA node 0, list CPUs on that node with the command below:
lscpu | grep "NUMA node0 CPU"
For example, if node 0 has CPUs 0-11 and 24-35 with hyperthreading, assign IRQs only to these cores:
echo 0-11 | sudo tee /proc/irq/125/smp_affinity_list
Avoid assigning IRQs to CPUs on other NUMA nodes. This keeps packet buffers in local memory, which reduces access latency.
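If you simply want every queue IRQ of the NIC restricted to its local NUMA node, rather than pinning each queue to a specific core, a short loop can read local_cpulist and apply it to every IRQ. This is a sketch that assumes eth0:
LOCAL_CPUS=$(cat /sys/class/net/eth0/device/local_cpulist)
for IRQ in $(grep eth0 /proc/interrupts | awk '{print $1}' | sed 's/://g'); do
    echo "$LOCAL_CPUS" | sudo tee /proc/irq/"$IRQ"/smp_affinity_list
done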
For optimal performance, you can pin your application threads to the same NUMA node using taskset or numactl:
numactl --cpunodebind=0 --membind=0 ./your_application
This ensures both interrupt processing and application processing happen on the same NUMA node with local memory access.
Persistent IRQ Affinity Configuration
Manual IRQ affinity settings reset after reboot. You must make them persistent by creating a systemd service or using rc.local.
Create a systemd service file /etc/systemd/system/irq-affinity.service and add the following content to it:
[Unit]
Description=Set IRQ Affinity for Network Interfaces
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/set_irq_affinity.sh eth0
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Apply the changes and enable the service with the commands below:
sudo systemctl daemon-reload
sudo systemctl enable irq-affinity.service
sudo systemctl start irq-affinity.service
Alternatively, you can add commands to /etc/rc.local file and ensure it is executable:
#!/bin/bash
# Set IRQ affinity for eth0
for IRQ in $(grep eth0 /proc/interrupts | awk '{print $1}' | sed 's/://g'); do
    QUEUE_NUM=$(grep " $IRQ:" /proc/interrupts | grep -oP "eth0-[rt]x-\K[0-9]+")
    [ -n "$QUEUE_NUM" ] && echo "$QUEUE_NUM" > /proc/irq/"$IRQ"/smp_affinity_list
done
exit 0
NIC Queue Configuration
You must adjust the number of RX and TX queues to match your CPU topology. To view the current queue counts, you can use the command below:
ethtool -l eth0
This displays maximum supported channels and current settings. To set the number of combined RX and TX channels to 8, you can use the command below:
sudo ethtool -L eth0 combined 8
For separate RX and TX queues, you can use this command:
sudo ethtool -L eth0 rx 8 tx 8
Generally, you can set the queue count equal to the number of CPU cores you want to dedicate to network processing.
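For example, to create one combined channel per online CPU core, you can use nproc; just cap the value at the maximum that ethtool -l reports for your NIC:
sudo ethtool -L eth0 combined $(nproc)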
Receive Side Scaling Configuration (RSS)
RSS distributes incoming packets across multiple RX queues using a hash function on packet headers, including source, destination IP, and port. To verify RSS is enabled, you can run the command below:
ethtool -n eth0 rx-flow-hash tcp4
This should display that IP addresses and ports are included in the hash.
If it is not enabled, use the commands below to enable RSS:
sudo ethtool -N eth0 rx-flow-hash tcp4 sdfn
sudo ethtool -N eth0 rx-flow-hash udp4 sdfn
The sdfn flags in the above command include the source IP, destination IP, source port, and destination port in the hash calculation, which ensures good traffic distribution.
Some NICs let you control how RSS spreads traffic across receive queues by changing the RSS mapping table. For deeper RSS tuning, you can check your NIC driver documentation.
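On NICs whose driver exposes the RSS indirection table through ethtool, you can view it and spread it evenly across your RX queues; not every driver supports these subcommands, so treat this as an optional extra step:
# Show the current RSS indirection table and hash key
ethtool -x eth0
# Spread the indirection table evenly across the first 8 RX queues
sudo ethtool -X eth0 equal 8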
Interrupt Coalescing Tuning
Interrupt coalescing controls how many packets or how much time passes before the NIC triggers an interrupt.
To display the current settings, you can run the command below:
ethtool -c eth0
Key parameters include:
- rx-usecs: How long to wait before triggering an RX interrupt.
- rx-frames: How many RX packets to collect before triggering an interrupt.
- tx-usecs: How long to wait before triggering a TX interrupt.
- tx-frames: How many TX packets to collect before triggering an interrupt.
- adaptive-rx and adaptive-tx: Automatically adjust coalescing based on traffic.
For low-latency workloads, you can reduce coalescing to minimize packet processing delay with the command below:
sudo ethtool -C eth0 rx-usecs 0 rx-frames 1
This triggers an interrupt for every packet, which reduces latency but increases CPU overhead.
For high-throughput workloads where latency is less essential, you can use the command below:
sudo ethtool -C eth0 rx-usecs 100 rx-frames 128
This will batch interrupts, which reduces CPU overhead but adds latency.
Many modern NICs support adaptive interrupt coalescing, which automatically adjusts based on traffic patterns. To enable it, you can run the command below:
sudo ethtool -C eth0 adaptive-rx on adaptive-tx on
You can enable this for mixed workloads. During heavy traffic, the NIC groups more packets together before interrupting the CPU; when traffic is low, it sends interrupts sooner to keep latency down.
Ring Buffer Tuning
Ring buffers temporarily store incoming and outgoing packets. Insufficient buffer size causes packet drops under load.
To check current and maximum ring buffer sizes, you can run the following command:
ethtool -g eth0
If current values are below maximum and you’re experiencing drops, you can increase them with:
sudo ethtool -G eth0 rx 4096 tx 4096
Larger buffers reduce drops during traffic bursts but increase memory usage. To monitor packet drop statistics after tuning, you can use the command below:
ethtool -S eth0 | grep -i drop
Drop counters should stop increasing if buffer sizes are adequate.
Application and IRQ Co-location
For better performance, you can run your application threads on the same CPU cores that handle network interrupts. This increases cache efficiency since packet data is already in the core’s cache when the application processes it.
If IRQs are pinned to CPUs 0-7, pin your application to the same cores with the command below:
taskset -c 0-7 ./your_application
Or you can use numactl for more control:
numactl --physcpubind=0-7 --membind=0 ./your_application
This technique, called interrupt and application co-location, minimizes cache misses and memory access latency.
CPU Isolation for Dedicated Workloads
For apps that need very low latency, you can isolate a few CPU cores so the normal Linux scheduler doesn’t run random background tasks on them. That way, those cores are mostly reserved for your application, which helps reduce jitter and random latency spikes.
Edit /etc/default/grub file and add isolcpus to the kernel command line:
GRUB_CMDLINE_LINUX="isolcpus=4-7 nohz_full=4-7 rcu_nocbs=4-7"
This isolates CPUs 4-7 from the general scheduler, stops the periodic timer tick on those cores when only one task is running (nohz_full), and offloads RCU callback processing away from them (rcu_nocbs).
Update GRUB and reboot:
sudo update-grub
sudo reboot
After reboot, pin your latency-sensitive application to isolated cores with the command below:
taskset -c 4-7 ./realtime_application
Also, pin the network IRQs to non-isolated cores, like 0-3, so they don’t interfere with your application.
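Before relying on the isolation, you can confirm that the kernel parameters took effect after the reboot:
cat /proc/cmdline    # should include isolcpus=4-7 nohz_full=4-7 rcu_nocbs=4-7
cat /sys/devices/system/cpu/isolated    # should print 4-7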
Monitor and Validate That IRQs Are Balanced
After you have implemented the IRQ imbalance fixes, you can validate that IRQ distribution is balanced and latency has improved.
To monitor interrupt distribution, you can run the command below:
watch -n1 'cat /proc/interrupts | grep eth0'
Verify that all CPUs show increasing interrupt counts, not just one or two cores.
To check SoftIRQ distribution, run the command below:
watch -n1 'grep -E "CPU|NET_RX" /proc/softirqs'
NET_RX values should increase across multiple cores.
You can monitor CPU usage with mpstat command:
mpstat -P ALL 1 10
The %soft column should be distributed across cores, not concentrated on one.
Measure application latency before and after tuning. For network applications, use tools like iperf3, netperf, or application-specific benchmarks to measure throughput and latency improvements.
Additional Performance Considerations for IRQ Affinity and NIC Tuning
Besides IRQ affinity and NIC tuning, there are a few other system-level optimizations that can improve performance and reduce latency.
1. Kernel Parameters: You can adjust network stack parameters in the /etc/sysctl.conf file:
# Increase maximum socket receive buffer
net.core.rmem_max = 134217728
net.core.rmem_default = 16777216
# Increase maximum socket send buffer
net.core.wmem_max = 134217728
net.core.wmem_default = 16777216
# Increase TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# Increase netdev budget for SoftIRQ processing
net.core.netdev_budget = 600
net.core.netdev_budget_usecs = 8000
# Increase backlog for network device input queue
net.core.netdev_max_backlog = 10000
Apply changes with the command below:
sudo sysctl -p
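You can also test a single value at runtime before committing it to /etc/sysctl.conf, for example:
sudo sysctl -w net.core.netdev_max_backlog=10000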
2. Hardware Considerations: Use a modern NIC that supports MSI-X and has enough hardware queues, because that helps spread interrupt work across CPU cores. For high throughput, enable NIC offload features like TSO, GRO, and checksum offload to reduce CPU load.
Also, make sure the NIC has enough PCIe bandwidth. 10G NICs need at least PCIe Gen2 x8, 40G NICs need PCIe Gen3 x8, and 100G NICs need PCIe Gen3 x16 (or Gen4 x8).
To check PCIe speed and width, you can run the command below:
sudo lspci -vv -s $(lspci | grep Ethernet | awk '{print $1}' | head -1) | grep -i "lnksta"
3. Firmware and Driver Updates: Keep your NIC driver and firmware up to date, because updates often fix bugs and improve interrupt handling and performance.
To see what driver your NIC is using and its version, run the command below:
ethtool -i eth0
If you need high-performance hosting with optimized network configurations, you can check PerLod for Dedicated Servers with expert-tuned network settings.
FAQs
What are the most common signs of IRQ imbalance?
Typical signs of IRQ imbalance include:
– Random slow responses, even when average load looks OK.
– One CPU core much busier than others, especially SoftIRQ.
– Packet drops increasing during traffic bursts.
What’s the difference between IRQs and SoftIRQs?
Hard IRQs are the wake-up signal from hardware, while SoftIRQs do most of the actual packet processing. Imbalance in either can lead to latency issues.
What is IRQ affinity, and why does it help?
IRQ affinity is choosing which CPU cores handle a device’s interrupts. It helps by spreading interrupt work across multiple cores or keeping it on the right cores, which reduces bottlenecks and jitter.
Conclusion
IRQ imbalance latency can seriously affect Linux performance, especially on busy network servers and real-time systems. If you understand how interrupts are spread across CPU cores, watch for imbalance with basic monitoring, and then fix it using IRQ affinity and NIC tuning, you can reduce random latency spikes and get more stable and predictable performance.
Proper IRQ tuning can reduce latency by 50% or more and increase network throughput. Always measure performance before and after each tuning so you can confirm the improvement on your exact hardware and workload.
We hope you enjoy this guide. Subscribe to our X and Facebook channels to get the latest updates and articles.