Tools for Storage Monitoring: iostat

The world of Linux storage tools for both monitoring and management just stinks. But that doesn't mean that there is absolutely nothing to help with monitoring. One such tool, iostat, can be used to watch what your storage devices are up to.

Introductory Rant

The complete lack of good tools for managing and monitoring storage has been one of my pet peeves for some time. A recent article by Henry Newman made my simmering pot of disdain come to a complete boil. As Henry eloquently points out, the network world has standards that allow the development of tools everyone can use to monitor and manage their networks. However, as he also points out, nothing like this exists in the storage world. Consequently, storage administrators are left with minimal tools for understanding what is really happening with their storage. Moreover, given today’s world of Storage Area Networks (SANs), NFS, CIFS, iSCSI, and other network storage devices and file systems, one would think that storage tools would be combined with network tools to provide an integrated view of what is happening with the storage solution.

However, the exact opposite is true. We have no real storage tools, just a hodge-podge of utilities that “sort-of” help us. Perhaps even worse, there is no standardization among them, so a tool developed for one platform, one piece of hardware, or one file system is usually worthless elsewhere.

We, the unwashed masses who must deal with storage on a daily basis (and this includes home users, even those who do nothing but read Facebook and check some email), have had to suffer because of the lack of good tools, the lack of people who understand how to configure systems even for home users, and the general lack of focus on storage. As I mentioned in previous articles, the amount of storage is growing at a rate that is difficult to comprehend. This growth ranges from my friend’s daughter, who now has over 20,000 pictures on her laptop and in Facebook, to the gene sequencing researchers who are cranking out terabytes every week for every sequencing instrument, with some institutions having a very large number of these instruments. Does anyone else smell a problem?

Pointing out the problem, while extremely important, is just part of actually solving it. At some point the problem needs to be tackled, wrestled, and otherwise beaten into submission. What I want to do as part of this “solution” process is to discuss existing tools that can help with storage management/monitoring. This article is the first in what is likely to be a series of articles on current storage tools for Linux. This article focuses on iostat.

iostat

iostat is a console (text-based) application that lets you check CPU usage as well as device or partition performance. (Previous iostat coverage can be found in Quick and Dirty MySQL Performance Troubleshooting and Making Sense of System Performance.) The iostat man page says the following about iostat:

iostat - Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.

iostat is part of a larger package of performance monitoring tools called sysstat.

Sebastien Godard is the maintainer of the sysstat package. You should be able to find it for your distribution, but if not, you can get it from the sysstat main page and build it yourself.

Using iostat is very simple and is much like vmstat or other “stat” tools for Linux. The basic command has a few options, followed by the device you want to monitor, followed by two numbers. These two numbers are (1) the time interval, in seconds, between reports from iostat, and (2) the number of reports iostat is to produce. If you leave the second number blank, iostat will continue indefinitely until you hit Ctrl-C (^c) to stop it.
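For readers who launch iostat from a script, the argument order described above can be sketched in Python. This is only an illustration and not part of sysstat: the function name build_iostat_command and its parameters are made up for this example, and the default options are simply the “-x” and “-m” flags used in this article.

```python
def build_iostat_command(device, interval, count=None, options=("-x", "-m")):
    """Assemble an iostat argument list: options, device, interval, optional count.

    Hypothetical helper for illustration only. Leaving count as None mirrors
    leaving the second number blank, so iostat would run until interrupted.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1 second")
    if count is not None and count < 1:
        raise ValueError("count must be at least 1 when given")
    # Order matters: options first, then the device, then the two numbers.
    argv = ["iostat", *options, device, str(interval)]
    if count is not None:
        argv.append(str(count))
    return argv


if __name__ == "__main__":
    # Five reports, one second apart, for /dev/md0.
    print(" ".join(build_iostat_command("/dev/md0", 1, 5)))
    # No count: iostat would keep reporting until stopped with Ctrl-C.
    print(" ".join(build_iostat_command("/dev/md0", 1)))
```

On a machine with sysstat installed, the returned list could be handed to subprocess.run to actually start iostat.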

Here is a simple example of using iostat.

[laytonj@home8 ~]$ iostat -x -m /dev/md0 1 5
Linux 2.6.18-194.el5 (home8)    11/13/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         30.67    1.91    6.49    1.08    0.00   59.84

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  1.89  6.03     0.02     0.02    11.23     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          7.00    0.00    2.00    0.00    0.00   91.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          7.00    0.00    2.00    0.00    0.00   91.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          7.00    0.00    4.00    0.00    0.00   89.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         12.00    0.00    5.00    0.00    0.00   83.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

In this simple example I used the following options:

  • I used the extended output option (“-x”).
  • I chose to have the throughput appear in megabytes per second (“-m”).
  • I chose to have iostat report the statistics for the device /dev/md0.
  • I chose to have iostat report the values at 1-second intervals and do that 5 times.
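If you want to work with these numbers rather than just read them, the extended output shown above can be pulled apart with a few lines of Python. What follows is a minimal sketch, not a robust parser: the function name parse_device_reports is hypothetical, the sample text is copied from the first two reports in the run above, and the column names are taken from whatever “Device” header line appears, since they differ between sysstat versions. Keep in mind that, according to the iostat man page, the first report covers the time since the system was booted, so it is often skipped when looking at current activity.

```python
# First two reports from the example run above, copied verbatim.
SAMPLE = """\
Linux 2.6.18-194.el5 (home8)    11/13/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         30.67    1.91    6.49    1.08    0.00   59.84

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  1.89  6.03     0.02     0.02    11.23     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          7.00    0.00    2.00    0.00    0.00   91.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
"""


def parse_device_reports(text):
    """Return one dict per report, mapping device name -> {column: value}.

    Hypothetical helper: it only looks at the device tables and ignores the
    banner line and the avg-cpu sections.
    """
    reports = []
    columns = None  # column names of the device table currently being read
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]  # header row starts a new report
            reports.append({})
        elif columns is not None and len(fields) == len(columns) + 1:
            values = [float(value) for value in fields[1:]]
            reports[-1][fields[0]] = dict(zip(columns, values))
    return reports


if __name__ == "__main__":
    for number, report in enumerate(parse_device_reports(SAMPLE), start=1):
        for device, stats in report.items():
            print(number, device, stats["r/s"], stats["w/s"], stats["%util"])
```

Reading the column names from the header instead of hard-coding them keeps the sketch usable if your version of iostat prints a different set of columns, though it assumes numbers are printed with a period as the decimal separator.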

Let’s go over the output to understand what iostat is doing.

Comments on "Tools for Storage Monitoring: iostat"

pascoff

Hi there,

One real problem I see with tools for storage monitoring is the lack of any for finding the real “abuser” in your system when you have an I/O bottleneck. For example, with iostat you can see that the whole performance goes to hell on a busy system, but you cannot see exactly which process is doing the nasty I/O. Of course, there are other tools, like atop, that can pin down the real “abuser”, but again there are limitations: for example, you need a recent kernel version with some patches to be able to use this utility.

I wonder, as the author does, why there are no good, proven tools for I/O monitoring the way ps, top, etc. exist for process monitoring.

laytonjb

@pascoff,

Sounds like we belong to the same tribe – the tribe of “we need better tools!”.

While I’m not a “coder” in the literal sense of the word (all my coding was around my research area, which is aeronautics and astronautics), I have started down the path of trying to put together something like you mentioned: finding the “abuser”. I think that with a combination of SystemTap and some other things, a simple tool can be created to monitor the system and, when certain conditions occur, such as a high load, find the process that is driving the problems. I’m just beginning some research into what pieces go into the tool and how they get glued together.

I’m not sure what to do once you identify the rogue process. I thought about just killing the rogue process, but that’s way too draconian. I also thought about trying to throttle the rogue process so that the load (or whatever the measurement is) doesn’t go crazy. But I think for a while the default will be just to inform the admin of the problem and let them determine the course of action.

But this tool will be slow in coming since my day job interferes with real work time.

Thanks for the post and comments!

Jeff

mossholderm

@pascoff, one tool that helps is iotop.

http://guichaz.free.fr/iotop/
=====
iotop does for I/O usage what top(1) does for CPU usage. It watches I/O usage information output by the Linux kernel and displays a table of current I/O usage by processes on the system. It is handy for answering the question “Why is the disk churning so much?”.
=====

wcorey

I think the real challenge is not so much expressing the raw data but expressing information. In other words, depicting when a channel is saturated, perhaps when a drive is failing, when the queue depth is outside of normal and service time is outside of normal and io volume in req/sec and bytes/sec are outside of normal. Implicit in this is for the tool/system to determine what is ‘normal’. To put this simply, we are awash with data but starved for information.

laytonjb

@wcorey,

Great comment and I agree with you. Determining what is “normal” is one thing but we must also determine what is “acceptable”. In addition, I think we can also include what is “maximum”. So is normal some percentage of the maximum? Can that normal fluctuate?

I personally think we also need to focus on what is acceptable. For example, is it OK for an NFS server to have a very high peak load where the system apparently freezes for a few seconds but then returns to “normal”? This behavior, going beyond the normal levels for a short period of time, might be acceptable. But then again, if it happens too often it may not be.

Sounds like a machine learning or AI kind of problem to me :) Maybe this is a way for AI to make a come-back! :)

Again – thanks for the insightful comments.

Jeff

shankar_k_e

I feel that you can hook into the SystemTap APIs to figure out what’s going on in the system, or use iotop to figure out which user is hogging the machine.

kappaj

Part of my gripe is the “unknown” when starting out.
I have Sun X4100 boxes with 2 dual-core CPUs, giving me 4 CPUs, with a total of 8 GB of memory and 4 disks in a RAID 0-1 config.
During heavy loads I have major MySQL transaction issues.
Looking at sar and iostat, I see %iowait values between 15 and 23.
Now is that good or bad? How do I determine what a good/acceptable/bad value for %iowait is?
Anybody with some pointers?

