Tools for Storage Monitoring: nfsiostat

In our never-ending quest for reasonable storage management and monitoring tools, we examine a simple tool in the sysstat collection: nfsiostat. Coupled with iostat, it makes a nice set of tools for monitoring NFS.

I believe I have ranted enough about the abject state of storage management and monitoring tools in Linux, so I don’t think I need to go down that road again. Instead, we should focus on the tools that we do have available and how we can use them while we wait for better tools to be developed (or while we are developing them).

In a previous article, I covered iostat, which is a tool in the sysstat family of monitoring tools. It allows you to monitor the performance of a partition or device as well as CPU usage on a system. In particular it produces the following metrics:


  • %user: The percentage of CPU utilization that occurred while executing at the user level (this is the application usage).
  • %nice: The percentage of CPU utilization that occurred while executing at the user level with “nice” priority.
  • %system: The percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait: The percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %steal: The percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
  • %idle: The percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
  • rrqm/s: The number of read requests merged per second that were queued to the device.
  • wrqm/s: The number of write requests merged per second that were queued to the device.
  • r/s: The number of read requests that were issued to the device per second.
  • w/s: The number of write requests that were issued to the device per second.
  • rMB/s: The number of megabytes read from the device per second.
  • wMB/s: The number of megabytes written to the device per second.
  • avgrq-sz: The average size (in sectors) of the requests that were issued to the device.
  • avgqu-sz: The average queue length of the requests that were issued to the device.
  • await: The average time (milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
  • svctm: The average service time (milliseconds) for I/O requests that were issued to the device.
  • %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

As you can tell, iostat can monitor a wide range of metrics. It also reports CPU usage, which can be of interest as well.

If you are using a system as an NFS server, iostat lets you monitor what is happening on the server pretty easily. But what happens if you see a big increase in the load on the NFS server? Iostat can detect it on the server, but more than likely the load is coming from an NFS client. This means that we need to monitor the clients as well. More specifically, we need to monitor what is happening with NFS mounted file systems on the client. Fortunately, the sysstat family has a tool similar to iostat that can help with this.

nfsiostat

The sysstat family includes a utility called nfsiostat that resembles iostat but lets you monitor read and write usage on NFS mounted file systems. It’s used in much the same way as iostat. The basic command takes a few options followed by two numbers: (1) the time interval, in seconds, between reports from nfsiostat, and (2) the number of reports to generate. If you leave off the second number, nfsiostat will run until you hit Ctrl-C to stop it.

Here is a simple example of using nfsiostat running on an NFS client.

[root@home8 etc]# /usr/local/bin/nfsiostat -k 1
Linux 2.6.18-194.el5 (home8) 12/04/2010 _i686_ (1 CPU)

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 3400.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


In this simple example I used one option and one argument:

  • I chose to have the output appear in kilobytes (“-k”), although you could omit this option and get the output in blocks. Alternatively, you could use “-m” to get the output in megabytes.
  • The trailing “1” tells nfsiostat to report the values at 1 second intervals. Since no count follows it, nfsiostat continues indefinitely.

Let’s go over the output to understand what nfsiostat is doing.

Nfsiostat gives you a number of outputs that appear a little cryptic but are actually pretty easy to follow and are very similar to iostat. The output consists of several columns:


  • Filesystem: Name of the NFS file system mounted. Typically this is the NFS server name followed by the mount point on the NFS client.
  • rBlk_nor/s (rkB_nor/s, rMB_nor/s): This output is the number of blocks (kilobytes, megabytes) read by applications using the NFS mounted file system using the read(2) system call. Remember that a block has 512 bytes.
  • wBlk_nor/s (wkB_nor/s, wMB_nor/s): This output is the number of blocks (kilobytes, megabytes) written by applications using the NFS mounted file system using the write(2) system call. Remember that a block has 512 bytes.
  • rBlk_dir/s (rkB_dir/s, rMB_dir/s): This column lists the number of blocks (kilobytes, megabytes) read from the files that have been opened with the O_DIRECT flag.
  • wBlk_dir/s (wkB_dir/s, wMB_dir/s): This column lists the number of blocks (kilobytes, megabytes) written to the files that have been opened with the O_DIRECT flag.
  • rBlk_svr/s (rkB_svr/s, rMB_svr/s): This column lists the number of blocks (kilobytes, megabytes) read from the NFS server by the NFS client via an NFS READ request.
  • wBlk_svr/s (wkB_svr/s, wMB_svr/s): This column lists the number of blocks (kilobytes, megabytes) written to the NFS server by the NFS client via an NFS WRITE request.
  • ops/s: This column lists the number of operations that were issued to the file system in operations per second.
  • rops/s: This column lists the number of read operations that were issued to the file system in operations per second.
  • wops/s: This column lists the number of write operations that were issued to the file system in operations per second.
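To make the block arithmetic concrete, here is a minimal sketch that converts a rate in 512-byte blocks per second into kilobytes and megabytes per second. The helper name blocks_to_units is hypothetical (it is not part of sysstat), and I am assuming 1 kB = 1024 bytes.

```shell
#!/bin/sh
# Convert a rate given in 512-byte blocks per second to kB/s and MB/s.
# blocks_to_units is a hypothetical helper name, not a sysstat command.
# Assumption: 1 kB = 1024 bytes and 1 MB = 1024 kB.
blocks_to_units() {
    awk -v blocks="$1" 'BEGIN {
        bytes = blocks * 512    # one block is 512 bytes
        printf "%.2f kB/s %.2f MB/s\n", bytes / 1024, bytes / (1024 * 1024)
    }'
}

# 2048 blocks/s is 1,048,576 bytes/s
blocks_to_units 2048
```

In other words, two blocks make one kilobyte, so the block columns are simply twice the kilobyte columns.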

As with iostat, the first report generated by nfsiostat provides statistics covering the time since the system was booted. All subsequent reports cover the time interval that you specify. Basically, you ignore the first report and watch the subsequent ones.
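If you would rather not see the since-boot report at all, a small awk filter can drop it. This is only a sketch: skip_first_report is a hypothetical helper name, and it assumes that every report starts with a line beginning with “Filesystem:”, as in the output shown in this article.

```shell
#!/bin/sh
# Drop the banner and the first (since-boot) report from nfsiostat output
# read on stdin. Assumes each report begins with a "Filesystem:" line.
# skip_first_report is a hypothetical helper name.
skip_first_report() {
    # n counts report headers seen so far; print only from the second on
    awk '/^Filesystem:/ { n++ } n >= 2'
}

# Usage, on a client with sysstat installed:
#   nfsiostat -k 1 | skip_first_report
```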

The above example was uninteresting since it just consisted of zeros. So let’s take a look at something more interesting, running IOzone over NFS.

nfsiostat Example with IOzone

To get non-zero output from nfsiostat, iozone was run on an NFS client using an NFS mounted file system. The details of the configuration are not that important since we’re interested in seeing the output from nfsiostat on the NFS client.

Before starting iozone I started nfsiostat with the following command:

[laytonj@home8 laytonj]$ nfsiostat -k 1


The options I used are very simple:


  • I want the output to be in kilobytes (“-k”) rather than blocks.
  • The last input “1” tells nfsiostat that I want a report every second and I want it to go on indefinitely until I interrupt it (the “indefinitely” term comes from the fact that I didn’t give it a second input after the “1”).

At first the output is pretty boring since I haven’t started iozone as shown below (the first line of output has the values since booting).

[laytonj@home8 laytonj]$ nfsiostat -k 1
Linux 2.6.18-194.el5 (home8) 12/04/2010 _i686_ (1 CPU)

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 31077.64 6615356.45 0.00 0.00 29558.11 6614815.04 56400.00 600.00 20100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Remember to look at the second report, since the first one shows the values since the system booted. Notice that everything is zero, meaning that there is no active usage of the NFS mounted file system on the client.

The nfsiostat output below shows what happens once iozone started running on the client.

...

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 82.62 9830530.47 0.00 0.00 0.00 51328.12 15500.00 0.00 12900.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 13107200.00 0.00 0.00 0.00 870530.47 13700.00 0.00 13700.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 4915200.00 0.00 0.00 0.00 1126400.00 1200.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 24576000.00 0.00 0.00 0.00 19616400.00 21300.00 0.00 21100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1200.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1228800.00 1100.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1228800.00 1200.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1228800.00 1200.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1200.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1228800.00 1100.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1152800.00 1300.00 0.00 1300.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1105200.00 1400.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1208400.00 1100.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1136000.00 1300.00 0.00 1300.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 968400.00 2800.00 0.00 1200.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 11468800.00 0.00 0.00 0.00 229600.00 29800.00 0.00 4100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 14745600.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 3276800.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1100.00 0.00 1100.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 1126400.00 1200.00 0.00 1100.00

...


Notice that there is a considerable flurry of activity at first that is write dominated. This is easily seen by watching the columns under “wkB_nor/s” and “wkB_svr/s” (the number of kilobytes written by applications and the number of kilobytes written to the server, respectively). You can also see this in the column labeled “ops/s”, which is the total number of I/O operations per second, and the column labeled “wops/s”, which is the number of write operations per second. Seeing non-zero write values makes sense since the first test in our iozone run is a sequential write test.
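Scanning long runs of output by eye gets tedious, so here is a rough sketch of an awk filter that pulls out just the write columns and adds the average kilobytes per write operation (wkB_svr/s divided by wops/s). The helper name write_size_per_op is hypothetical, and the field positions assume the nine-column “-k” layout shown above, taken at face value.

```shell
#!/bin/sh
# For each data row of `nfsiostat -k` output on stdin, print the file system,
# the server write rate, and the average kB per write operation.
# Assumed field layout (from the header above): $7 = wkB_svr/s, $10 = wops/s.
# write_size_per_op is a hypothetical helper name.
write_size_per_op() {
    # Skip header lines, the banner (not 10 fields), and rows with no writes
    awk '!/^Filesystem:/ && NF == 10 && $10 > 0 {
        printf "%s %.2f kB/s %.2f kB/op\n", $1, $7, $7 / $10
    }'
}

# Usage, on a client with sysstat installed:
#   nfsiostat -k 1 | write_size_per_op
```

For a row with a wkB_svr/s of 1126400.00 and a wops/s of 1100.00, this works out to 1024.00 kB per write operation.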

Monitoring NFS with iostat and nfsiostat

The whole point in using iostat and nfsiostat is to monitor what’s going on with our storage. In the case of NFS, iostat allows us to monitor the performance of the NFS server down to the individual disks. We use nfsiostat to monitor what’s going on with the NFS clients. Tying everything together is fairly easy but is still a manual process.

You can easily run iostat on the NFS server to monitor the storage devices and see all of the devices and partitions available on the system. For example,

[root@test64 ~]# iostat -hxm 5
Linux 2.6.30 (test64) 12/04/2010

avg-cpu: %user %nice %system %iowait %steal %idle
0.29 0.02 0.62 0.93 0.00 98.14

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
hda 3.40 0.89 4.94 0.98 0.19 0.01 67.06 0.27 45.55 4.44 2.63
hda1 0.60 0.00 0.02 0.00 0.00 0.00 47.63 0.00 10.30 9.35 0.02
hda2 0.09 0.00 0.01 0.00 0.00 0.00 67.29 0.00 10.35 9.97 0.01
hda3 2.69 0.89 4.90 0.98 0.19 0.01 67.12 0.27 45.74 4.45 2.62
sda 0.30 0.00 0.03 0.00 0.00 0.00 29.16 0.00 3.62 3.34 0.01
sda1 0.29 0.00 0.02 0.00 0.00 0.00 36.14 0.00 4.43 3.96 0.01
sdb 0.25 0.00 0.02 0.00 0.00 0.00 33.88 0.00 4.44 4.42 0.01
sdb1 0.23 0.00 0.01 0.00 0.00 0.00 54.96 0.00 3.52 3.52 0.00
sdc 0.07 0.00 0.02 0.00 0.00 0.00 33.07 0.00 5.00 5.00 0.01
sdd 0.09 247.52 0.04 2.97 0.00 0.98 666.97 0.16 52.45 1.79 0.54
sdd1 0.07 247.52 0.03 2.80 0.00 0.98 708.08 0.16 55.69 1.90 0.54


where you can see all of the specific partitions and the associated performance (in this case, since booting). You can narrow this down to the specific devices/partitions that are being used for NFS export to make reading the output a little easier.
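As a sketch of that narrowing, iostat accepts device names on the command line ahead of the interval. For output that has already been saved, a small awk filter can do the same job. The helper name filter_devices is hypothetical, and “sdd” below simply stands in for whichever device you export over NFS.

```shell
#!/bin/sh
# Native approach: name the device(s) before the interval, for example
#   iostat -xm sdd 5
#
# For saved output, keep only the "Device:" header and the named devices.
# filter_devices is a hypothetical helper; its arguments are device names.
filter_devices() {
    # Pad the space-separated list so that "sdd" does not also match "sdd1"
    awk -v want=" $* " '
        /^Device:/ { print; next }
        NF > 0 && index(want, " " $1 " ") > 0 { print }
    '
}

# Usage with a saved report (iostat.log is a hypothetical file name):
#   filter_devices sdd sdd1 < iostat.log
```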

When you see the load on an NFS exported partition on the server climb to a very high number, you can use nfsiostat on the clients to find the offending NFS client (if you don’t already know which one it is). You can easily use “ssh” to examine the NFS load on each client and watch for unusually large read or write values. For example, we could do the following:

[root@test64 ~]# ssh laytonj@192.168.1.8 /usr/local/bin/nfsiostat -k 1
laytonj@192.168.1.8's password:
Linux 2.6.18-194.el5 (home8) 12/04/2010 _i686_ (1 CPU)

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 32133.69 1320613030.27 0.00 0.00 104036.72 1320612745.12 2500500.00 3200.00 1489400.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Filesystem: rkB_nor/s wkB_nor/s rkB_dir/s wkB_dir/s rkB_svr/s wkB_svr/s ops/s rops/s wops/s
192.168.1.65:/mnt/home1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


In this case there is nothing to see, but the process is the same (as long as you can ssh into the client). The downside is that if you have a large number of clients you will have to open a large number of ssh sessions to find the offending client.
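One way to take some of the pain out of this is to loop over the clients from a single shell. The sketch below asks each client for two reports, throws away the first (since-boot) one, and prefixes each remaining data row with the client name. It assumes key-based ssh logins (no password prompts) and that nfsiostat is on the PATH of each client. The function name poll_clients and the client names in the usage line are hypothetical.

```shell
#!/bin/sh
# Print the latest 1-second nfsiostat data rows for each client named as an
# argument. Two reports are requested so the since-boot report can be dropped.
# Assumptions: key-based ssh logins, nfsiostat on each client's PATH.
# poll_clients is a hypothetical helper name.
poll_clients() {
    for host in "$@"; do
        # -n keeps ssh from reading the caller's stdin
        ssh -n "$host" nfsiostat -k 1 2 |
            awk -v host="$host" '
                /^Filesystem:/ { n++; next }          # count and skip headers
                n >= 2 && NF == 10 { print host, $0 } # rows of second report
            '
    done
}

# Usage (hypothetical client names):
#   poll_clients home8 home9 home10
```

The client with the largest values in the read or write columns is the first one to look at. The polling is sequential, so with many clients a full pass takes at least a second per client.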

The combination of iostat and nfsiostat can help monitor NFS storage systems. While it is still a somewhat manual process, the two tools give you some insight into the performance of both the server and the clients, and they can help you track down offending clients that may be pushing the NFS server a bit too hard (e.g. downloading Milli Vanilli MP3s again).
