How to Detect and Fix Disk I/O Bottlenecks with iostat and atop
Disk I/O monitoring on Linux means watching how busy your storage is with reads and writes, how many operations happen per second, and whether the disk is overloaded. When disk I/O is slow or saturated, the whole server can feel stuck even if CPU and RAM look fine.
In this guide from PerLod Hosting, we use the iostat and atop tools to find disk I/O bottlenecks on Linux.
iostat is a lightweight tool that gives a clear per-disk view of performance, such as read and write throughput and how busy each disk is. atop is helpful when you want the bigger picture, because it shows overall system load and also breaks disk activity down per process.
Disk I/O Monitoring Linux with iostat
You can use iostat to get a quick and high-level view of disk performance.
iostat is one of the most common Linux tools for disk I/O monitoring because it shows how busy the disks are and how long I/O requests are waiting, but it does not directly tell you which specific process is causing the load.
Install iostat on Linux
iostat is part of the sysstat package, which is available on most Linux distributions. Depending on your OS, use the commands below to install it.
For Debian and Ubuntu, you can use:
sudo apt update
sudo apt install sysstat -y
For RHEL, AlmaLinux, and Rocky Linux, you can use:
sudo dnf update -y
sudo dnf install sysstat -y
sudo systemctl enable --now sysstat
The iostat Command Structure and Options
Run without arguments, iostat prints a single report of averages since the server booted, which is not very useful for real-time monitoring. To get live data, add an interval, and usually a few flags as well.
The syntax of the iostat command looks like this:
iostat [options] [interval] [count]
The essential flags used in the iostat command include:
| Flag | Function | Why use it? |
|---|---|---|
| -x | Extended Statistics | Without this, you only get CPU data and basic per-device transfer rates. This unlocks the detailed disk metrics such as queue size, await, and %util. |
| -z | Omit Zero (Idle) | Hides drives that are doing nothing. Keeps your screen clean if you have many partitions. |
| -d | Disk Only | Hides CPU reports so you can focus entirely on storage. |
| -k or -m | Kilobytes / Megabytes | Displays transfer speeds in KB or MB instead of blocks. Easier for humans to read. |
| -p | Partitions | Shows stats for specific partitions rather than just the whole disk. |
| -t | Timestamp | Prints the time next to each report. Useful for logging to a file. |
For most troubleshooting scenarios, you can use the following command:
iostat -xz 1
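This shows extended stats, hides devices with no activity, and refreshes every second. The first report still covers the time since boot, so judge the disk by the reports that follow it.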
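Since the -t flag is meant for logging, a short capture to a file can be handy when a problem comes and goes. The sketch below is one possible wrapper, not an official sysstat helper: the function name capture_iostat and the log path are made up for illustration, and the -y flag (skip the since-boot report) is assumed to be available, which may not hold on very old sysstat releases.

```shell
# capture_iostat INTERVAL COUNT LOGFILE
# Appends COUNT timestamped extended disk reports, INTERVAL seconds apart,
# to LOGFILE. Hypothetical helper for illustration.
capture_iostat() {
  if ! command -v iostat >/dev/null 2>&1; then
    echo "iostat not found; install the sysstat package first" >&2
    return 127
  fi
  # -d disks only, -x extended stats, -z hide idle devices, -t timestamps,
  # -y skip the first since-boot report (assumes a sysstat version with -y)
  iostat -dxzty "$1" "$2" >> "$3"
}

# Example: twelve 5-second samples (one minute in total)
# capture_iostat 5 12 /tmp/iostat-capture.log
```

Appending with >> keeps earlier captures in the same file, so repeated runs build up a timeline you can compare later.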
Understand the iostat Results
When you run the iostat -x command, you will get a lot of data in your output. Here is an explanation of every column:
| Column | Full Name | Explanation |
|---|---|---|
| r/s | Reads per Second | The number of read requests the disk completes per second. |
| w/s | Writes per Second | The number of write requests the disk completes per second. |
| rkB/s | Read KB per Second | The volume of data being read. High r/s with low rkB/s means the drive is reading many tiny files (random I/O). |
| wkB/s | Write KB per Second | The volume of data being written. |
| rrqm/s | Read Requests Merged | The number of read requests merged per second. When two requests target data sitting next to each other on the disk, Linux “merges” them into one request. A high number here is actually good; it means the OS is optimizing. |
| wrqm/s | Write Requests Merged | Same as above, but for writing. |
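| avgqu-sz (aqu-sz in newer sysstat releases) | Average Queue Size | The average number of requests “waiting in line.” On a single spinning disk, this should stay close to 0; if it is consistently over 1 or 2, the disk has a backlog. SSDs and NVMe drives serve requests in parallel, so somewhat higher values can be normal there. |
| await | Average Wait Time | The average time in milliseconds a request takes, from entering the queue until the disk has finished servicing it. Newer sysstat releases split this into r_await (reads) and w_await (writes). As a rough guide: under 5 ms is great, 10-20 ms indicates heavy load, and over 20 ms points to a serious bottleneck. |
| %util | Utilization % | The percentage of time the disk was busy with at least one request. On a single HDD, values near 100% mean the disk is saturated. SSDs, NVMe drives, and RAID arrays serve requests in parallel, so 100% there does not always mean the device has hit its limit; read it together with await. |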
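To turn the columns above into a quick check, you can filter iostat output for devices that cross a latency or utilization limit. The following is a minimal sketch under a few assumptions: flag_busy_disks is a hypothetical function name, the default limits (20 ms and 90%) are only the rough guide values from the table, and columns are located by header name because the names differ between sysstat versions (for example await versus r_await and w_await).

```shell
# flag_busy_disks [MAX_AWAIT_MS] [MAX_UTIL_PERCENT]
# Reads `iostat -dx` style output on stdin and prints the devices whose
# latency or utilization is above the given limits.
flag_busy_disks() {
  awk -v max_await="${1:-20}" -v max_util="${2:-90}" '
    # Header row: record the position of every column by its name.
    $1 ~ /^Device/ {
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Device row: only lines wide enough to contain the %util column.
    ("%util" in col) && NF >= col["%util"] {
      lat = 0
      if ("await" in col) lat = $(col["await"]) + 0
      if ("r_await" in col && $(col["r_await"]) + 0 > lat) lat = $(col["r_await"]) + 0
      if ("w_await" in col && $(col["w_await"]) + 0 > lat) lat = $(col["w_await"]) + 0
      util = $(col["%util"]) + 0
      if (lat > max_await || util > max_util)
        printf "%s await=%.2f util=%.2f\n", $1, lat, util
    }
  '
}

# Example: watch live and print only the devices that look overloaded
# LC_ALL=C iostat -dxz 1 | flag_busy_disks 20 90
```

Setting LC_ALL=C keeps the decimal separator as a dot, which the numeric comparisons rely on. When read and write latency are reported separately, the sketch uses the larger of the two.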
Deeper Disk I/O Monitoring with atop
As you saw, iostat helps confirm that the disk is the bottleneck. In this part, atop helps you go one level deeper and answer the real troubleshooting question: which process is causing the heavy disk reads and writes?
Install atop on Linux
atop is available in the default repositories on Debian and Ubuntu. On RHEL, AlmaLinux, and Rocky Linux, it comes from the EPEL repository. Depending on your OS, use the commands below to install it.
For Debian and Ubuntu, you can run:
sudo apt update
sudo apt install atop -y
For RHEL, AlmaLinux, and Rocky Linux, you can run:
sudo dnf update -y
sudo dnf install epel-release -y
sudo dnf install atop -y
The Interface and Navigation of atop Command
This section covers the key views and shortcuts, especially the disk view and sorting by disk usage.
Run the command below:
sudo atop
The screen updates every 10 seconds by default.
Once the atop command is running, you can press these keys to change the view:
- d (Disk View): The most important key for this guide. It switches the process list to disk-related columns.
- g (Generic View): Returns to the default view (CPU and Memory heavy).
- m (Memory View): Shows RAM usage details.
- D (Sort by Disk): Press Shift + d. This sorts the process list so the processes using the disk the most jump to the top.
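- a (All or active): Toggles between showing only the processes that were active during the last interval (the default) and showing all processes.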
Understanding the atop Disk Columns
Once you press d for disk view, look at the columns on the right side of the process list, including:
- RDDSK: The amount of data read by this process during the interval.
- WRDSK: The amount of data written by this process.
- WCANCL: Write Cancelled. This happens when a process writes data that is removed again, for example by deleting the file, before it is flushed to the physical disk.
- DSK: This process's share of the total disk activity generated by all processes during the interval.
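If atop is not installed yet, the kernel exposes comparable per-process counters in /proc/<pid>/io, with fields named read_bytes, write_bytes, and cancelled_write_bytes. The sketch below only illustrates where such numbers can be read; it is not atop's own code, the function name proc_io_summary is hypothetical, and the values are cumulative since the process started, whereas atop shows per-interval amounts.

```shell
# proc_io_summary FILE
# Prints the three storage-related counters from a /proc/<pid>/io style file.
proc_io_summary() {
  awk '
    BEGIN { r = 0; w = 0; c = 0 }
    $1 == "read_bytes:"            { r = $2 }
    $1 == "write_bytes:"           { w = $2 }
    $1 == "cancelled_write_bytes:" { c = $2 }
    END {
      print "read_bytes=" r, "write_bytes=" w, "cancelled_write_bytes=" c
    }
  ' "$1"
}

# Example: counters for the current shell
# proc_io_summary "/proc/$$/io"
```

Reading the io file of a process owned by another user normally requires root.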
The atop command colors include:
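- Red lines: Critical resource usage, for example, a disk whose busy percentage is at or above atop's critical threshold (70% for disks by default).
- Cyan lines: Usage that is approaching the critical threshold but has not reached it yet.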
atop Log Replay Mode
atop can also run in the background as a service and save periodic samples to log files, which you can replay later. For example, if your server crashed at 3:00 AM while you were sleeping, atop lets you go back in time and see what was happening.
To replay history, open today’s log file with the command below, or pass a dated file from /var/log/atop/ (for example, atop -r /var/log/atop/atop_YYYYMMDD):
atop -r
To see exactly what the disk usage was at the exact moment the server had issues, you can use the following navigation keys:
- t: Move forward one sample (10 minutes with the default logging interval).
- T (Shift+t): Move backward one sample.
- b: Jump to a specific time. For example, type 14:00 to jump to 2:00 PM.
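When the incident happened on an earlier day, you need the log file for that date. Assuming the common atop_YYYYMMDD naming under /var/log/atop/ (your distribution may use a different directory), a small helper can build the path. The function name atop_log_path is hypothetical.

```shell
# atop_log_path DATE [LOGDIR]
# DATE is YYYY-MM-DD; LOGDIR defaults to /var/log/atop (an assumption).
atop_log_path() {
  printf '%s/atop_%s\n' "${2:-/var/log/atop}" "$(printf '%s' "$1" | sed 's/-//g')"
}

# Example: replay the log recorded on 1 May 2024
# atop -r "$(atop_log_path 2024-05-01)"
```

Once the replay opens, the navigation keys above work the same way as for today's log.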
Real-World Disk I/O Scenarios and Solutions
At this point, you can use iostat and atop together in real troubleshooting by confirming the disk is the bottleneck, then identifying the exact process causing it. High await with %util near 100% usually means the storage can’t keep up, and you need to either reduce I/O or upgrade the storage layer.
Here are three common scenarios you’ll see on production servers with the fastest way to diagnose and fix each one.
Scenario A. High wait time: The website is very slow.
- Run iostat -xz 1 and check await and %util.
- If the await is very high, and the %util is close to 100%, the disk is likely saturated.
- Run atop, press d to focus on disk activity, then sort to find the top disk users.
If the top process is the database, reduce write pressure, add RAM to increase caching, and move the database to faster SSD/NVMe storage.
Scenario B. High latency and low throughput (random I/O): The %util is high, but rkB/s and wkB/s stay low.
This often points to random I/O: lots of small reads and writes scattered across the disk, which increase latency and reduce real throughput, especially on HDDs.
You can move the workload to SSD/NVMe and, where possible, reduce access patterns that touch many small files.
Scenario C. Noisy neighbor (shared storage): Your own read and write rates look low, but await is still high.
In shared environments, storage latency can rise due to other tenants consuming I/O on the same underlying hardware, even if your VM isn’t doing much.
You can ask your provider to investigate the node or migrate you, or move the workload to a dedicated server or isolated storage plan where disk performance isn’t shared.
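The three scenarios can be expressed as a rough decision rule over await, %util, and combined throughput. The sketch below is only an illustration of that reasoning: the function name classify_io is hypothetical, and the cutoffs (20 ms, 90%, and 10240 kB/s as "low throughput") are assumed values you should tune for your own hardware.

```shell
# classify_io AWAIT_MS UTIL_PERCENT TOTAL_KB_PER_S
# TOTAL_KB_PER_S is rkB/s + wkB/s for the device.
classify_io() {
  awk -v await="$1" -v util="$2" -v kbps="$3" 'BEGIN {
    hi_await = 20      # ms, assumed cutoff for high latency
    hi_util  = 90      # percent, assumed cutoff for a busy disk
    low_kbps = 10240   # kB/s, assumed cutoff for low throughput
    if (util >= hi_util && kbps < low_kbps)
      print "B: random I/O suspected"
    else if (util >= hi_util && await >= hi_await)
      print "A: disk saturated"
    else if (await >= hi_await && kbps < low_kbps)
      print "C: possible noisy neighbor"
    else
      print "no scenario matched"
  }'
}

# Example with values read from one iostat report line
# classify_io 45 99 150000
```

Scenario B is tested first because a busy disk with low throughput is the more specific pattern; a busy disk with high latency and healthy throughput falls through to scenario A.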
FAQs
Does running atop or iostat slow down my server?
No. Both tools are extremely lightweight. They read statistics that the Linux kernel is already collecting in the background. Running them, even on a production server with heavy traffic, is completely safe.
Which one is better? iotop or atop?
iotop is simpler to use, but atop is more powerful because it also records history that you can replay later.
How do I fix a high await time without buying new hardware?
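If you cannot upgrade to SSDs, try these fixes:
- Add RAM so more data is served from the page cache instead of the disk.
- Check logs for services that write excessively, and reduce log verbosity where you can.
- Adjust backup schedules so heavy jobs run during off-peak hours.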
Conclusion
Monitoring disk I/O works best as a simple two-step workflow:
- First, use iostat -xz 1 to confirm whether storage is actually the bottleneck by watching the key signals, including %util and high await time. If those numbers stay high, it means the server is stuck waiting for the disk, not just running slow code.
- Second, once you know the disk is under pressure, switch to atop to find the real cause. Press d to focus on disk activity, then press D (Shift + d) to sort processes by disk usage so the top consumer shows up immediately.