Tools for Storage Monitoring: iostat

The world of Linux storage tools for both monitoring and management just stinks. But that doesn't mean that there is absolutely nothing to help with monitoring. One such tool, iostat, can be used to watch what your storage devices are up to.

In general, you get two types of reports with iostat (you can use options to get one report or the other, but the default is both). The first report covers CPU usage and the second is the device utilization report. The CPU report contains the following information:

  • %user: Shows the percentage of CPU utilization that occurred while executing at the user level (this is the application usage).
  • %nice: Shows the percentage of CPU utilization that occurred while executing at the user level with “nice” priority.
  • %system: Shows the percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait: Shows the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %steal: Shows the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
  • %idle: Shows the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

Some of these values should be fairly familiar to you. They are computed as system-wide averages across all processors when your system has more than one core (which is pretty much everything today).
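
If you only want one of the two reports, iostat has options for that. As a quick illustration (these are generic invocations I’m suggesting, not commands taken from the test described later), -c restricts iostat to the CPU report and -d restricts it to the device report:

iostat -c 1 5        # CPU utilization report only, once a second, five reports
iostat -d 1 5        # device utilization report only, once a second, five reports

With neither option you get both reports, which is the default behavior described above.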

The second report covers device utilization, where a device can be a physical device or a partition. If you don’t specify a device on the command line, iostat prints values for all devices (alternatively, you can use “ALL” as the device). Typically the report output includes the following:

  • Device: Device name
  • rrqm/s: The number of read requests merged per second that were queued to the device.
  • wrqm/s: The number of write requests merged per second that were queued to the device.
  • r/s: The number of read requests that were issued to the device per second.
  • w/s: The number of write requests that were issued to the device per second.
  • rMB/s: The number of megabytes read from the device per second.
  • wMB/s: The number of megabytes written to the device per second.
  • avgrq-sz: The average size (in sectors) of the requests that were issued to the device.
  • avgqu-sz: The average queue length of the requests that were issued to the device.
  • await: The average time (milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
  • svctm: The average service time (milliseconds) for I/O requests that were issued to the device.
  • %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

Exactly which of these fields appear depends on the options you use as well as on the device.

It is also important to remember that the first report generated by iostat provides statistics covering the time since the system was booted. All subsequent reports cover the time interval that you specify.
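
As a concrete illustration of how the interval and count arguments interact with that first since-boot report (this is just an example invocation, not one from the test system used later):

iostat -x 5 3        # three extended reports: the first covers the time since boot, the next two each cover a 5-second interval

Newer versions of sysstat also understand a -y option that suppresses the since-boot report entirely, but it may not be present in older releases, so check your man page.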

If you go back to the example, you will see that the first report (the combination of the first CPU report and the first device report) has non-zero values. Remember that this first report computes values covering the time since the system first booted. The reports after the first one show little if any activity on /dev/md0, but you do notice some differences in the CPU reporting.

Let’s take a look at a more interesting example: running iostat while the system is running iozone.

IOstat Example with IOzone

The first example was pretty boring, with nothing happening. Watching iostat while iozone exercises the device shows something much more interesting and helps with understanding iostat’s output.

Before starting iozone I started iostat with the following command:

[laytonjb@test64 IOSTAT]$ iostat -m -x /dev/sdd 1

The options I used are:

  • I’m using extended output (“-x”) to get more detailed statistics.
  • I want the output to be in megabytes (“-m”) rather than blocks.
  • I’m examining the device /dev/sdd, which is an Intel X25-E SSD (it rocks).
  • Finally, the last argument, “1”, tells iostat that I want a report every second, continuing indefinitely until I interrupt it (the “indefinitely” comes from the fact that I didn’t give it a count after the “1”). A variation with an explicit count is shown just after this list.
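
If you would rather have iostat stop on its own, and also stamp each report with the time it was taken, you can add a count and the “-t” option. This is just a variation I’m suggesting, not the command used for the test:

iostat -t -m -x /dev/sdd 1 60        # one report per second, 60 reports, each preceded by a timestamp

Everything else is the same as the command above; only the explicit count and the timestamping differ.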

At first the output is pretty boring since I haven’t started iozone.

Linux 2.6.35-rc4+ (test64) 	11/13/201

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.63    0.13    0.39    9.20    0.00   88.65

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.44     0.06  0.14  0.09     0.00     0.00    25.98     0.00    0.26   0.21   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.25    0.00    0.25    0.00    0.00   98.50

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.25    0.00    0.00   99.50

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Notice that the read throughput (“rMB/s”) is zero, as is the write throughput (“wMB/s”). There are also no read requests (“r/s”) or write requests (“w/s”). Finally, the CPU load is very low (less than 2%) and there are no iowaits happening.

The output below shows what happens when iozone is started.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.25    0.00    0.00   99.50

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    0.75   16.46    0.00   81.80

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00  4.00  1.00     0.27     0.00   110.40     0.01    1.20   1.00   0.50

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.50   24.56    0.00   74.94

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00   13.47   11.72    0.00   74.31

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00    12.00  0.00 13.00     0.00     0.10    15.38     0.01    0.69   0.23   0.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00   26.25   18.50    0.00   54.50

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 55628.00  0.00 301.00     0.00   144.06   980.17    92.88  234.15   2.31  69.50

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00   10.25   40.25    0.00   49.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 50282.00  0.00 392.00     0.00   195.96  1023.80   135.48  345.05   2.55 100.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    6.73   43.14    0.00   49.13

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 47117.00  0.00 390.00     0.00   195.00  1024.00   141.50  350.19   2.57 100.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.25    0.00   10.00   40.25    0.00   48.50

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 54991.00  0.00 418.00     0.00   209.00  1024.00   135.03  339.14   2.39 100.00

At first the CPU usage is very low and there are no iowaits. Then the second report shows some iowaits (16.46%), and the iowait percentage generally keeps climbing from there. You can also see the system CPU utilization increasing steadily (this is a 4-core AMD Opteron system). Part way through the output the write throughput (“wMB/s”) starts at 0.10 MB/s, kicks up to 144.06 MB/s, and continues increasing; during this time the read throughput is zero. This is the beginning of the first iozone test, which is a write throughput test. You also see the number of write requests (“w/s”) increase sharply when the write throughput increases, and the same is true for the average size (in sectors) of the requests to the device (“avgrq-sz”) and the average queue length for /dev/sdd (“avgqu-sz”).
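
As a quick sanity check on those numbers (assuming the traditional 512-byte sector that iostat uses for avgrq-sz): an avgrq-sz of 1,024 sectors is 512 KB per request, and roughly 418 write requests per second at 0.5 MB per request works out to about 209 MB/s, which matches the reported wMB/s in the last interval almost exactly.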

One interesting statistic that I like to examine is “%util”. This will tell me if the underlying device is approaching saturation or 100% utilization. In the case of the write throughput test the %util stays low at first then quickly reaches 100%, indicating, you guessed it, that we have saturated the device.
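
If you want to watch for that condition without scanning every report by eye, a minimal sketch is to filter the device lines with awk. This assumes %util is the last column, as it is in the extended output above, and uses 90% as an arbitrary alert threshold:

iostat -x /dev/sdd 1 | awk '$1 == "sdd" && $NF+0 > 90'

This simply prints the device line whenever %util exceeds the threshold. Depending on your sysstat version you may need to put something like “stdbuf -oL” in front of iostat to keep the pipe from buffering the output.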

We can go forward through the iostat output to find the same kind of output for reading.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    3.74   46.38    0.00   49.88

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 52308.91  0.00 416.83     0.00   208.42  1024.00   141.10  338.68   2.38  99.11

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.25   41.00    0.00   56.75

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00 27560.00  0.00 341.00     0.00   170.01  1021.04   122.06  407.75   2.93 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.50   24.25    0.00   75.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00 230.00 18.00    28.23     7.50   295.10     2.22   38.24   4.02  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.00   23.25    0.00   74.75

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00 766.00  0.00    95.75     0.00   256.00     1.95    2.55   1.31 100.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.75   20.00    0.00   74.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00 1967.00  0.00   245.75     0.00   255.87     1.86    0.95   0.51 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    4.98   20.90    0.00   73.88

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdd               0.00     0.00 1976.00  0.00   246.88     0.00   255.87     1.88    0.95   0.51 100.20

Notice the transition from write throughput (“wMB/s”) to read throughput (“rMB/s”). What is also interesting is that the percentage of iowaits goes down once the read throughput test has started; I believe this is because SSDs have amazing read performance. Also notice that the average queue length for /dev/sdd (“avgqu-sz”) drops when switching over to read testing. Again, I believe this is because of the amazing read performance of SSDs. However, the %util for the device is still 100% during the read testing.
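
The same back-of-the-envelope check from the write phase works here as well (again assuming 512-byte sectors): an avgrq-sz of roughly 256 sectors is about 128 KB per read request, and 1,976 reads per second at 0.125 MB per request comes to about 247 MB/s, which is essentially the reported rMB/s.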

There are many ways you can use iostat for monitoring your storage server. For example, if your storage server’s load starts climbing, you can use iostat to look for the offending device. Use the keyword “ALL” in place of a specific device so that every device shows up in the report, have iostat generate a report every second for a few minutes, and see which device (or devices) has the problem. During this time you should examine the throughput as well as the queue information. I also like to examine the “%iowait” output to see whether the system has a large backlog of I/O operations waiting; coupled with %util, that tells me whether the device is saturated and how much data is sitting in the queue.
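
A minimal sketch of that workflow might look like the following, where the “ALL” keyword, the one-second interval, and the three-minute duration are simply reasonable choices rather than magic values:

iostat -t -m -x ALL 1 180 > iostat_all.log        # extended stats for every device, once a second for three minutes, with timestamps

You can then scan the log for the device whose throughput, queue length, await, or %util stands out from the rest.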

Summary

I think it is fairly obvious that there is a severe lack of good, integrated tools for managing and monitoring storage in Linux. Misery loves company: the same is true of storage in general. Given how fast data is growing, you would think a good set of storage tools would be a given, but that obviously isn’t the case.

While I am ranting about the horrible state of storage tools, I also recognize that there are some tools out there that serve as the proverbial “fingers in the dike” for storage management and monitoring, and we must at least understand how to use them.

iostat is one of the tools that comes in the sysstat package. It can provide useful information about the state of storage servers, both from a CPU perspective and from the perspective of the underlying storage devices. It gives you a reasonable amount of data, including the number of I/O operations and the throughput. It can also report on the state of the I/O queues, which helps you understand whether there is a great deal of data still waiting to be serviced or whether the queues are lightly loaded, indicating that the storage devices are keeping up with the workload.

So when your storage servers are greatly loaded and/or storage performance is suffering, iostat can help you begin to uncover what is happening with your system.

Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Fry’s enjoying the coffee and waiting for sales (but never during working hours).

Comments on "Tools for Storage Monitoring: iostat"

pascoff

Hi there,

One real problem I see with storage monitoring tools is the lack of anything for finding the real “abuser” in your system when you have an I/O bottleneck. For example, with iostat you can see the whole system’s performance go to hell when it is busy, but you cannot see exactly which process is doing the nasty I/O. Of course, there are other tools like atop that can pin down the real “abuser”, but they have limitations of their own; for example, you need a recent kernel version with some patches to be able to use that utility.

Like the author, I wonder why there are no good, proven tools for I/O monitoring the way ps, top, etc. exist for process monitoring…

laytonjb

@pascoff,

Sounds like we belong to the same tribe – the tribe of “we need better tools!”.

While I’m not a “coder” in the literal sense of the word (all my coding was around my research area, which is aeronautics and astronautics), I have started down the path of trying to put something together like you mentioned to find the “abuser”. I think with a combination of SystemTap and some other pieces, a simple tool can be created that monitors the system and, when certain conditions occur, such as a high load, finds the process that is driving the problem. I’m just beginning some research into what pieces go into the tool and how they get glued together.

I’m not sure what to do once you identify the rogue process. I thought about just killing it, but that’s way too draconian. I also thought about trying to throttle the rogue process so that the load (or whatever the measurement is) doesn’t go crazy. But I think for a while the default will be just to inform the admin of the problem and let them determine the course of action.

But this tool will be slow in coming since my day job interferes with real work time.

Thanks for the post and comments!

Jeff

mossholderm

@pascoff, one tool that helps is iotop.

http://guichaz.free.fr/iotop/
=====
iotop does for I/O usage what top(1) does for CPU usage. It watches I/O usage information output by the Linux kernel and displays a table of current I/O usage by processes on the system. It is handy for answering the question “Why is the disk churning so much?”.
=====

wcorey

I think the real challenge is not so much expressing the raw data but expressing information. In other words, depicting when a channel is saturated, when a drive may be failing, or when the queue depth, service time, and I/O volume (in req/sec and bytes/sec) are outside of normal. Implicit in this is that the tool or system has to determine what is ‘normal’. To put this simply, we are awash with data but starved for information.

laytonjb

@wcorey,

Great comment and I agree with you. Determining what is “normal” is one thing but we must also determine what is “acceptable”. In addition, I think we can also include what is “maximum”. So is normal some percentage of the maximum? Can that normal fluctuate?

I personally think we also need to focus on what is acceptable. For example, is it OK for an NFS server to have a very high peak load where the system apparently freezes for a few seconds but then returns to “normal”? That behavior, going beyond the normal levels for a short period of time, might be acceptable. But then again, if it happens too often, it may not be.

Sounds like a machine learning or AI kind of problem to me :) Maybe this is a way for AI to make a come-back! :)

Again – thanks for the insightful comments.

Jeff

shankar_k_e

I feel that you can hook into the SystemTap APIs to figure out what’s going on in the system, or use iotop to figure out which user is hogging the machine.

kappaj

Part of my gripe is the “unknown” when starting out.
I have Sun X4100 boxes with two dual-core CPUs, giving me four cores, a total of 8 GB of memory, and four disks in a RAID 0-1 config.
During heavy loads I have major MySQL transaction issues.
Looking at sar and iostat, I see %iowait values between 15 and 23.
Now, is that good or bad? How do I determine what a good, acceptable, or bad value for %iowait is?
Anybody with some pointers?

