Making Sense of System Performance

A number of old-fashioned tools can prevent very modern problems

Day in and day out, system administrators are stuck between a rock and a hard place. The rock? Heat from management, senior staff, users, and, well… everybody. The hard place? Hardware failures, service interruptions, virus infestations, memory hogs, runaway processes, zombie processes, and processes that inexplicably die without warning.

Of course, some failures are to be expected, and those are typically tolerated, but a failure that’s perceived as preventable can spawn an outright user revolt. The worst offense? Allowing a system to exhaust its capacity. Ironically, preventive maintenance is often quickly abandoned to deal with the emergency of the day or hour.

However, you can avert a great number of problems if you have the right tools and just a modicum of time (between crises). Here’s a handful of system utilities and a few quick techniques that should make your life easier. Yes, easier.

If you had the cash, you could spend tens of thousands of dollars on state-of-the-art commercial performance monitoring tools. But if your already-stretched-too-thin budget can’t afford another high-end solution, a number of “old school” (command-line and text-based) tools can suffice. Sure, the tools can’t draw pretty graphs, but they can help make your system the picture of health.

The sysstat package, written and regularly updated by Sebastien Godard, is one of the “old school” tools you should pick up and learn. sysstat captures system performance “snapshots” that help you find, diagnose, and repair problems. Table One presents sysstat’s utilities and capabilities.

TABLE ONE: Sysstat tools
Name     Purpose
iostat   Input/output statistics
mpstat   Multi-processor statistics
sar      System activity reporter
sa1      System activity daily data collector
sa2      System activity daily data report writer
sadc     System activity data collector
sadf     System activity data formatter

*iostat, short for “Input/Output Statistics,” reports CPU and disk/partition statistics.

*mpstat is a multi-processor CPU statistics reporting tool. It runs just as well on a uniprocessor machine.

*sar, the system activity reporter, collects, displays, and saves a lengthy list of performance parameters for your system.

*sa1 and sa2 are used in cron entries to periodically collect and summarize performance and activity information. sa1 collects binary data and stores it in the daily data file. sa2 produces a summary of daily activity.

*sadc, the System Activity Data Collector command, is really a backend for sar, and is responsible for creating the binary data files.

*sadf is a utility that formats your sar data in a variety of ways for more organized reporting.

If you don’t have sysstat, you can download the software from its home page located at http://perso.wanadoo.fr/sebastien.godard/. Once downloaded, unwrap the tarball and run:

$ make config

After a short pause, answer the list of questions that appear. If you don’t know how to reply, just accept the defaults. Next, build and install the software:

$ make
$ su -
# make install

If you use RPM to install, the sysstat RPM package creates cron entries for you. If not, the cron entries required to make the activity reporting tools work are shown later in this article.
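Before moving on, it can help to confirm that the install put the tools where your shell can find them. What follows is only a minimal sketch, not something shipped with sysstat: missing_tools is a hypothetical helper name, and the list assumes the four user-facing commands are meant to be on your PATH. The collectors (sa1, sa2, and sadc) are left out because they are often installed in a library directory instead.

```shell
#!/bin/sh
# missing_tools: print each named command that cannot be found on PATH.
# (Hypothetical helper for illustration; not part of sysstat.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the user-facing sysstat commands.
missing=$(missing_tools iostat mpstat sar sadf)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All sysstat commands found."
fi
```

If anything is reported as missing, check the installation prefix you chose during make config.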

Inquiring About I/O

Let’s start with iostat. Run without arguments, it displays I/O statistics for the CPU, devices, and partitions, averaged over the time since the last reboot. iostat yields results similar to Figure One:

FIGURE ONE: Sample results from iostat
# iostat
Linux 2.6.12-1.1376_FC3 (home)  08/20/2006

avg-cpu:  %user   %nice %system  %steal %iowait   %idle
           0.83    1.09    0.40    0.00    0.37   97.30

Device:     tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda        1.25        41.70        45.76   31113603   34146672

The CPU data is a report of activity averaged across all CPUs. The output shows CPU activity information for applications (%user), applications with nice priority (%nice), kernel processes (%system), idle time when the system had an outstanding disk I/O request (%iowait), and idle time when the system did not have an outstanding I/O request (%idle).

The device section reports device activity for transfers per second (tps), blocks read/written per second (Blk_read/s, Blk_wrtn/s), and the total number of blocks read/written (Blk_read, Blk_wrtn, respectively).
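Block counts are easier to reason about once they’re converted to bytes. The sketch below assumes a block is a 512-byte sector, which is what the iostat manual page states for 2.4 and later kernels; check that against your own system before relying on it. blocks_to_kb is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# blocks_to_kb: convert a count (or rate) of blocks to kilobytes.
# Assumes 512-byte blocks, per the iostat man page for 2.4+ kernels.
blocks_to_kb() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}

# Per-second rates for hda, taken from Figure One.
echo "read:  $(blocks_to_kb 41.70) KB/s"   # prints "read:  20.85 KB/s"
echo "write: $(blocks_to_kb 45.76) KB/s"   # prints "write: 22.88 KB/s"
```

Under that assumption, the disk in Figure One is averaging roughly 21 KB/s in reads and 23 KB/s in writes: a very quiet device.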

iostat is a good, quick way to peek at CPU and block device activity. %idle is a good indication of overall CPU usage, and higher values are better. High values in %iowait, however, are less alarming than they once were: modern processors are fast enough to legitimately outrun disk I/O, so some time spent waiting on the disk is common. The %steal parameter measures involuntary wait by a virtual CPU while the hypervisor services another virtual processor. A high %steal number suggests that there is contention for the physical CPU and that a bottleneck may exist.
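If you want to watch one of these values from a script, it’s safer to look the column up by name than by position, since the layout isn’t guaranteed to match Figure One on every version. Here is a minimal sketch of that idea; cpu_field is a hypothetical helper, and it assumes the output has an avg-cpu: header line followed directly by a line of values, as in Figure One.

```shell
#!/bin/sh
# cpu_field NAME: read iostat output on stdin and print the value under
# the avg-cpu column NAME (for example %idle or %iowait).
# (Hypothetical helper for illustration; not part of sysstat.)
cpu_field() {
    awk -v want="$1" '
        # Header line: remember which position holds the wanted column.
        $1 == "avg-cpu:" {
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1   # values line lacks the label
            getvals = 1
            next
        }
        # The line right after the header carries the values.
        getvals {
            if (col) print $col
            exit
        }
    '
}

# Try it on the CPU lines from Figure One.
cpu_field %iowait <<'EOF'
avg-cpu:   %user   %nice   %system %steal  %iowait %idle
          0.83    1.09    0.40    0.00    0.37    97.30
EOF
# prints 0.37
```

On a live system, the same function can read straight from the tool, as in iostat | cpu_field %idle. If the named column doesn’t exist, nothing is printed.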

iostat and mpstat have much in common, but mpstat shows CPU statistics only. You should run mpstat when the results of iostat bear further investigation. The mpstat command yields output like that in Figure Two.

FIGURE TWO: mpstat shows statistics for one or more CPUs
# mpstat
Linux 2.6.12-1.1376_FC3 (home)  09/24/2005
02:57:51 PM   CPU   %user   %nice  %system  %iowait  %irq  %soft   %idle   intr/s
02:57:51 PM   all   0.83    1.47   0.37     0.36     0.03  0.00    96.94   1011.43

mpstat adds three statistics that iostat lacks: CPU time spent servicing hardware interrupts (%irq), CPU time spent servicing software interrupts (%soft), and the total number of interrupts per second (intr/s).
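Since %idle is the headline number, a quick way to summarize mpstat output is to turn it into a “busy” percentage (100 minus %idle) for each CPU row. The following sketch does that with awk; cpu_busy is a hypothetical helper, and it assumes the header row names the CPU and %idle columns and that data rows share the header’s field layout, as they do in Figure Two.

```shell
#!/bin/sh
# cpu_busy: read mpstat output on stdin and print "CPU busy%" for each
# data row, where busy% is 100 minus the %idle column.
# (Hypothetical helper for illustration; not part of sysstat.)
cpu_busy() {
    awk '
        # Until the header row is seen, look for the CPU and %idle columns.
        !idle_col {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu_col = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        # Data rows have the same field layout as the header.
        NF >= idle_col { printf "%s %.2f\n", $cpu_col, 100 - $idle_col }
    '
}

# Try it on the output from Figure Two.
cpu_busy <<'EOF'
Linux 2.6.12-1.1376_FC3 (home)  09/24/2005
02:57:51 PM   CPU   %user   %nice  %system  %iowait  %irq  %soft   %idle   intr/s
02:57:51 PM   all   0.83    1.47   0.37     0.36     0.03  0.00    96.94   1011.43
EOF
# prints "all 3.06"
```

Because the timestamp occupies the same number of fields in the header and in each data row, the lookup works whether or not the time includes an AM/PM field.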

