
Monitoring I/O Latency on Dedicated Servers
When you run apps on dedicated servers, one of the key performance factors to monitor is I/O latency. Input and Output latency refers to the delay between a request to read or write data and the completion of the request. High I/O latency can slow down databases, websites, and other applications, directly affecting user experience. By monitoring I/O latency on dedicated servers, you can identify performance bottlenecks, detect failing hardware early, and ensure the systems are running smoothly.
At PerLod Hosting, you can find reliable dedicated server hosting solutions with monitoring options to ensure consistent performance and minimal downtime.
Let’s start by understanding what I/O latency really means.
Latency is the time from when an I/O request is issued until it completes. Throughput is the amount of work done per second, measured in MB/s or IOPS. High throughput with low latency is the goal. Queue depth is how many I/Os are outstanding at once; a large queue combined with rising latency indicates saturation.
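These three numbers are related by Little's Law: average queue depth ≈ IOPS × average latency (in seconds). A quick sanity check with bc, using purely illustrative numbers:
# Little's Law: avg queue depth = IOPS x avg latency in seconds.
# 2000 IOPS at 4 ms average latency -> about 8 I/Os outstanding on average.
echo "scale=2; 2000 * 4 / 1000" | bc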
By following the steps in this guide, you will learn to:
- Identify devices and media.
- Run a live triage.
- Capture sustained data.
- Validate with eBPF histograms and fio if needed.
- Correlate with the application.
- Fix the root cause.
- Set SLOs and alerts.
Now, follow the steps below to install the required tools on Linux dedicated servers and start monitoring I/O latency.

1. Install Required Tools for Monitoring I/O Latency
Before you can monitor I/O latency, you need the proper tools installed. These utilities allow you to measure disk performance, identify bottlenecks, and even run tests. Depending on your OS, install the monitoring tools.
On Debian/Ubuntu:
sudo apt update
sudo apt install sysstat iotop ioping fio dstat bpfcc-tools linux-headers-$(uname -r) -y
On RHEL/CentOS/Alma/Rocky:
sudo dnf update -y
sudo dnf install epel-release -y
sudo dnf install sysstat iotop ioping fio dstat bcc-tools bpftool \
kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
Note: If bpfcc-tools or bcc-tools are not available, skip the eBPF sections.
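To confirm the core tools are available before moving on, you can print their versions (output varies by distribution):
iostat -V
ioping -v
fio --version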
2. Discover What Storage Devices You Are Measuring
It’s essential to know the type of storage device you are measuring. The following commands show your disks and partitions, whether they are SSD or HDD, and which I/O scheduler is in use.
To display disks, partitions, size, and mountpoints, run:
lsblk -o NAME,TYPE,MODEL,SIZE,ROTA,PKNAME,MOUNTPOINT
- ROTA=0 indicates an SSD or NVMe drive.
- ROTA=1 indicates a spinning HDD.
Check which I/O scheduler is in use (replace sdX with your device name), such as mq-deadline, bfq, or none:
cat /sys/block/sdX/queue/scheduler
#Or you can use:
cat /sys/block/nvme0n1/queue/scheduler
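If the server has several disks, a small convenience loop (optional) prints the active scheduler for every block device at once:
for d in /sys/block/*; do
    echo "$(basename "$d"): $(cat "$d/queue/scheduler" 2>/dev/null)"
done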
Also, you can check filesystems and their backing devices with:
findmnt -D
3. Quick Health Check (60-Second Diagnosis)
When you suspect a problem, a quick 60-second health check can tell you whether the issue is the CPU waiting on I/O, disk saturation, or a single process overloading the system. Run the following commands for these checks.
CPU I/O wait and run queue:
vmstat 1
The wa column is the percentage of CPU time spent waiting on I/O. If r (the run queue) is greater than the number of CPUs, the system is overloaded.
Device latency and saturation:
iostat -x 1
Key columns in the output:
- await: Average latency (ms).
- aqu-sz: Average queue depth (how many ops waiting).
- %util: Device busy time (close to 100% = saturated).
See which processes are causing I/O:
sudo iotop -oPa
- -o: Only processes doing I/O.
- -P: Per process, not per thread.
- -a: Accumulate totals over time.
Per-process detail:
pidstat -d 1
pidstat -d -p <PID> 1 # for a single process
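If you want to capture all of these checks at once for later review, a small wrapper script along the following lines (the file name and sample counts are arbitrary choices) can help:
#!/usr/bin/env bash
# Collect a short I/O triage snapshot into a single timestamped log file.
OUT="io-triage-$(date +%Y%m%d-%H%M%S).log"
{
  echo "== vmstat, 10 samples =="
  vmstat 1 10
  echo "== iostat -x, 10 samples =="
  iostat -x 1 10
  echo "== iotop, 3 batch samples =="
  sudo iotop -boPa -n 3
} | tee "$OUT"
echo "Snapshot saved to $OUT"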
4. Long-Term I/O Tracking
If issues are intermittent, you need ongoing latency monitoring. Tools like sar and iostat can collect statistics continuously, which helps you catch spikes and trends over time.
Collect disk stats every second for an hour with the following sar command:
sar -d 1 3600
Export later to CSV with the command below:
sadf -d /var/log/sysstat/sa$(date +%d) -- -d > disk.csv
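The sadf export reads the daily files written by the sysstat collector; if it is not already collecting, you can usually enable it with (assuming a systemd-based distro):
sudo systemctl enable --now sysstat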
Also, a simple rolling log with the iostat command:
iostat -x 1 | ts '[%Y-%m-%d %H:%M:%S]' | tee io.log
ts (from the moreutils package) adds timestamps to each line.
5. Using Diskstats for Accurate I/O Data
In Linux, you can display raw disk statistics in /proc/diskstats. Reading these values directly gives you precise latency, utilization, and queue depth numbers. It is useful for scripts and automation.
To display raw disk statistics, you can use:
cat /proc/diskstats
The fields include reads and writes completed, time spent on reads and writes (ms), I/Os in progress, and weighted I/O time, from which you can derive latency, utilization, and queue depth. Example formula for average read latency over a sampling interval:
Avg read latency ≈ Δrd_ms / Δrd_ios
A tiny sampler script can automate this. For example:
#!/usr/bin/env bash
# Sample /proc/diskstats twice, one second apart, and derive average read/write
# latency, average queue depth, and utilization for a single device.
DEV=${1:-sda}
# Field layout per device line: major, minor, name, reads completed, reads merged,
# sectors read, ms reading, writes completed, writes merged, sectors written,
# ms writing, I/Os in progress, ms doing I/O, weighted ms doing I/O.
# The trailing "_" absorbs the extra discard/flush fields on newer kernels.
read -r _ _ _ rIO _ _ rMS wIO _ _ wMS inprog ioMS wioMS _ < <(awk -v d="$DEV" '$3==d{print}' /proc/diskstats)
t0=$(date +%s%3N)
sleep 1
read -r _ _ _ rIO2 _ _ rMS2 wIO2 _ _ wMS2 inprog2 ioMS2 wioMS2 _ < <(awk -v d="$DEV" '$3==d{print}' /proc/diskstats)
t1=$(date +%s%3N)
dt=$((t1 - t0))                                   # elapsed wall time in ms
dr=$((rIO2 - rIO)); dw=$((wIO2 - wIO))            # completed reads / writes
drms=$((rMS2 - rMS)); dwms=$((wMS2 - wMS))        # ms spent reading / writing
dwio=$((wioMS2 - wioMS)); dio=$((ioMS2 - ioMS))   # weighted I/O ms, busy ms
rlat=$( [ "$dr" -gt 0 ] && echo "scale=2; $drms/$dr" | bc || echo 0 )
wlat=$( [ "$dw" -gt 0 ] && echo "scale=2; $dwms/$dw" | bc || echo 0 )
qdepth=$(echo "scale=2; $dwio/$dt" | bc)          # average queue depth
util=$(echo "scale=2; 100*$dio/$dt" | bc)         # % of time the device was busy
echo "dev=$DEV rlat_ms=$rlat wlat_ms=$wlat avgq=$qdepth util%=$util"
6. Advanced Latency Tracing with eBPF
eBPF tools allow you to see what’s happening inside the kernel in real time. They can show per-I/O latency histograms and even trace which processes are causing spikes.
Note: Advanced tracing requires a recent kernel, headers, and BCC installed.
For an I/O latency histogram, sampled every second for 10 intervals, run:
sudo /usr/share/bcc/tools/biolatency 1 10
It buckets I/O latency into a histogram.
For a per-I/O trace, you can run:
sudo /usr/share/bcc/tools/biosnoop
It shows process, device, block address, size, and latency.
7. Testing I/O Latency with Benchmarks
To confirm the actual speed of your storage, you can run safe probes with ioping or conduct stress tests with fio. These simulate workloads and reveal whether the hardware meets expectations.
For a quick probe with ioping, you can run:
ioping -c 10 /var
ioping -R -c 10 /var # request (seek) rate test
ioping -D -c 10 /var # direct I/O (bypass page cache)
Reproducible load with fio: Do not run on production volumes without understanding the impact.
For a random 4 KB read test, create a job file named randread.fio with the following content (adjust filename= to the device you want to test):
[global]
ioengine=libaio
iodepth=32
direct=1
time_based=1
runtime=60
numjobs=1
group_reporting=1
[randread]
rw=randread
bs=4k
filename=/dev/nvme0n1
The job is read-only, so it will not modify data on the device; it is suitable for NVMe and SSD drives. Then run the test with fio:
sudo fio randread.fio
Look for:
- clat: Completion latency.
- slat: Submission latency.
- lat percentiles (p50, p99).
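To measure write latency as well, a similar job can target a scratch file instead of a raw device, so no real data is touched; the path and size below are placeholders, and the scratch file should be removed afterwards:
sudo fio --name=randwrite --ioengine=libaio --iodepth=32 --direct=1 \
    --rw=randwrite --bs=4k --size=1G --time_based=1 --runtime=60 \
    --group_reporting --filename=/var/tmp/fio-test.bin
sudo rm /var/tmp/fio-test.bin   # clean up the scratch file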
8. Real-Time I/O Monitoring Dashboards (Prometheus + Grafana)
For long-term monitoring in production, you can use Prometheus metrics and Grafana dashboards. This provides clear visualizations and alerts when latency crosses critical thresholds. With node_exporter, you can collect disk metrics. Here are PromQL query examples:
Avg read latency (ms) per device (5m window):
1000 * sum by (device) (
rate(node_disk_read_time_seconds_total[5m])
)
/
sum by (device) (rate(node_disk_reads_completed_total[5m]))
Avg write latency (ms) per device:
1000 * sum by (device) (
rate(node_disk_write_time_seconds_total[5m])
)
/
sum by (device) (rate(node_disk_writes_completed_total[5m]))
Utilization (%):
100 * rate(node_disk_io_time_seconds_total[5m])
Queue depth:
rate(node_disk_io_time_weighted_seconds_total[5m])
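As a starting point for alerting, the same latency ratio can be compared against a threshold; the 25 ms value below is only an example and should be tuned to your own baseline:
1000 * sum by (device) (rate(node_disk_write_time_seconds_total[5m]))
  / sum by (device) (rate(node_disk_writes_completed_total[5m])) > 25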
Tip: To get detailed information for monitoring Linux servers with Prometheus, you can check this guide on Monitoring Linux host metrics with the Node Exporter.
9. Identifying Root Causes of I/O Problems
High latency can have many causes, from overloaded disks to bad cabling. By matching symptoms, you can discover the root cause quickly.
Here is a quick diagnosis by symptoms:
- High await, high %util, high aqu-sz: Device saturated. Solution: faster disk or workload shaping.
- High await, low %util: Possible device stalls, firmware problems, or cabling issues. Check kernel logs (see the dmesg example after this list).
- Good device metrics, but the app is slow: Filesystem or DB contention.
- High iowait and normal disk latency: Could be network storage (NFS).
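For the kernel log check mentioned above, a simple filter like the following (the grep pattern is only a starting point) often surfaces resets, timeouts, and link errors:
sudo dmesg -T | grep -iE 'ata|nvme|i/o error|reset|timeout'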
10. Filesystem and Mount Options That Influence Latency
Not all latency comes from the disk itself; sometimes, the way your filesystem or mount options are configured can add overhead. Optimizing these settings helps reduce unnecessary I/O and keeps performance consistent.
XFS: The default filesystem on RHEL-family distros. Excellent for parallelism; performance is mostly tied to device speed.
ext4: Safe defaults; don’t disable barriers; schedule fstrim instead of continuous discard (see the timer command after this list).
noatime/relatime: Reduce metadata writes for read-heavy workloads.
Swap thrash: When memory runs low, swapping can overwhelm the disks. Check with:
vmstat 1
cat /proc/pressure/io
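For the ext4 note above, periodic TRIM is usually handled by a systemd timer rather than a continuous-discard mount option; on most systemd-based distros you can enable it with:
sudo systemctl enable --now fstrim.timer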
11. NVMe and SMART Health Checks
Even high-performance disks like SSDs and NVMe drives can experience hidden issues. SMART tools let you check for errors, temperature, and warning signs that may cause latency spikes or future failures.
To do this, you can use the following commands:
sudo smartctl -a /dev/nvme0
sudo smartctl -a /dev/sda
Look for media errors, CRC errors, and temperature warnings. While SMART won’t show per-I/O latency, frequent errors usually explain latency anomalies.
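If the nvme-cli package is installed (it is not part of the step 1 tool list, so treat this as optional), its smart-log view is another way to review NVMe wear and error counters:
sudo nvme smart-log /dev/nvme0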
12. Virtualization and Latency Considerations
If your dedicated server hosts virtual machines (VMs), latency may appear inside a VM even if the physical host looks fine. This happens due to oversubscription, throttling, or noisy neighbors on shared storage.
Always measure latency on both the host and the guest.
- For KVM, watch bdi writeback and virtio queues.
- For VMware, check datastore latency and outstanding I/O requests.
This ensures you know if the problem is with the hardware, hypervisor, or guest.
13. Establishing Baselines and Latency SLOs
You cannot manage what you don’t measure. Before setting expectations, capture a baseline of how your disks perform when idle and under peak load. Then define SLOs (Service Level Objectives) for latency. It is best to:
- Measure idle performance for 10–15 minutes.
- Measure peak load performance for the same duration.
- Track latency at p50 (median), p95, and p99 levels.
- Re-check after any system change (scheduler, firmware, RAID settings, or filesystem tuning).
Example SLO: p99 write latency should remain below 25 ms.
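One way to keep baselines comparable over time is to store fio results as JSON; the file name pattern below is only a suggestion:
sudo fio randread.fio --output-format=json --output="baseline-$(date +%F).json"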
FAQs
What is I/O latency on a dedicated server?
I/O latency is the time it takes for a storage request (read/write) to complete. High latency slows down websites, databases, and applications running on your server.
What is a good I/O latency for SSDs vs HDDs?
SSD/NVMe: <1 ms to a few ms for most operations.
HDDs: 5–15 ms is normal; anything consistently higher can indicate issues.
How do I know if my server has high I/O latency?
Use tools like iostat, vmstat, or iotop to check latency and utilization. If await values are consistently high, or %util is close to 100%, your disks may be saturated.
Conclusion
Monitoring I/O latency on dedicated servers is essential for ensuring stable performance and a smooth user experience. By following the steps above, you will gain full visibility into your storage performance.
Keeping I/O latency low means faster applications, happier users, and reliable infrastructure. If you’re looking for dedicated servers with performance monitoring support built in, PerLod Web Hosting provides optimized hosting solutions designed for speed, stability, and transparency.
We hope you found this guide useful. Subscribe to our X and Facebook channels to get the latest monitoring and performance tips.
For further reading:
Implement QoS at the Hardware Level for VPS