Day in and day out, system administrators are stuck between a rock and a hard place. The rock? Heat from management, senior staff, users, and, well… everybody. The hard place? Hardware failures, service interruptions, virus infestations, memory hogs, runaway processes, zombie processes, and processes that inexplicably die without warning.
Of course, some failures are to be expected, and those are typically tolerated, but a failure that’s perceived as preventable can spawn an outright user revolt. The worst offense? Allowing a system to exhaust its capacity. Ironically, preventive maintenance is often quickly abandoned to deal with the emergency of the day or hour.
However, you can avert a great number of problems if you have the right tools and just a modicum of time (between crises). Here’s a handful of system utilities and a few quick techniques that should make your life easier. Yes, easier.
If you had the cash, you could spend tens of thousands of dollars on state-of-the-art commercial performance monitoring tools. But if your already-stretched-too-thin budget can’t afford another high-end solution, a number of “old school” tools (command-line and text-based) can suffice. Sure, the tools can’t draw pretty graphs, but the utilities can help make your system the picture of health.
The sysstat package, written and regularly updated by Sebastien Godard, is one of the “old school” tools you should pick up and learn. sysstat captures system performance “snapshots” that help you find, diagnose, and repair problems. Table One presents sysstat’s utilities and capabilities.
sar||System activity reporter
sa1||System activity daily data collector
sa2||System activity daily data report writer
sadc||System activity data collector
sadf||System activity data formatter
*iostat, short for “Input/Output Statistics,” reports CPU and disk/partition statistics.
*mpstat is a multi-processor CPU statistics reporting tool. It also runs on uniprocessor machines.
*sar, the system activity reporter, collects, displays, and saves a lengthy list of performance parameters for your system.
*sa1 and sa2 are used in cron entries to periodically collect and assimilate performance and activity information. sa1 collects and persists binary data in the daily data file. sa2 produces a summary of daily activity.
*sadc, the System Activity Data Collector command, is really a backend for sar, and is responsible for creating the binary data files.
*sadf is a utility to format your sar data in a variety of ways for more organized reporting.
If you don’t have sysstat, you can download the software from its home page located at http://perso.wanadoo.fr/sebastien.godard/. Once downloaded, unwrap the tarball and run:
$ make config
After a short pause, answer the list of questions that appear. If you don’t know how to reply, just accept the defaults. Next, build and install the software:
$ su -
# make install
If you use RPM to install, the sysstat RPM package creates cron entries for you. If not, the cron entries required to make the activity reporting tools work are shown momentarily.
Inquiring About I/O
Let’s start with iostat, which displays I/O statistics for the CPU, devices, and partitions since the last reboot. iostat yields results similar to Figure One:
Linux 2.6.12-1.1376_FC3 (home) 08/20/2006
avg-cpu:   %user   %nice %system  %steal %iowait   %idle
            0.83    1.09    0.40    0.00    0.37   97.30
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
hda         1.25       41.70       45.76  31113603  34146672
The CPU data is a report of activity averaged across all CPUs. The output shows the percentage of CPU time spent on applications (%user), applications with nice priority (%nice), kernel processes (%system), idle time when the system had an outstanding disk I/O request (%iowait), and idle time when the system did not have an outstanding I/O request (%idle). The remaining column, %steal, is discussed below.
The device section reports device activity for transfers per second (tps), blocks read/written per second (Blk_read/s, Blk_wrtn/s), and the total number of blocks read/written (Blk_read, Blk_wrtn, respectively).
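The device columns can also be combined into a rough "average blocks per transfer" figure. The sketch below is only an illustration: the function name avg_request_blocks and the figure_one variable are made up for this example, and the script assumes the column names shown in Figure One (Blk_read/s, Blk_wrtn/s, tps), which other sysstat versions or options may change.

```shell
#!/bin/sh
# avg_request_blocks (hypothetical helper): read iostat output on stdin and
# print "<device> <average blocks per transfer>" for each active device.
# Assumes the column names shown in Figure One.
avg_request_blocks() {
    awk '
        $1 == "Device:" {
            # Remember where each named column sits in the header.
            for (i = 1; i <= NF; i++) col[$i] = i
            in_devices = 1
            next
        }
        NF == 0 { in_devices = 0 }   # a blank line ends the device section
        in_devices && $(col["tps"]) > 0 {
            blocks = $(col["Blk_read/s"]) + $(col["Blk_wrtn/s"])
            printf "%s %.1f\n", $1, blocks / $(col["tps"])
        }
    '
}

# Sample input: the device section of Figure One.
figure_one='Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 1.25 41.70 45.76 31113603 34146672'

printf '%s\n' "$figure_one" | avg_request_blocks   # prints: hda 70.0
```

On a machine with sysstat installed, the same function should work on live data with something like iostat -d | avg_request_blocks, provided the header matches the one assumed here.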
iostat is a good, quick way to peek at CPU and block device performance and activity. %idle is a good indication of overall CPU usage, and higher values are better. High values in %iowait, however, are less alarming than they once were: processors are now so fast that the CPU often legitimately outruns disk I/O, so a high %iowait is common. The %steal parameter is a measure of involuntary wait by the CPU, typically a virtual CPU waiting while the hypervisor services another guest. A high %steal number suggests that there is contention for the CPU and that a bottleneck exists.
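Since %idle is the headline number, it can be handy to turn it into a "busy" percentage. The following is a minimal sketch, not part of sysstat: the function name cpu_busy and the figure_one variable are hypothetical, and the script assumes the avg-cpu header and value lines look like those in Figure One.

```shell
#!/bin/sh
# cpu_busy (hypothetical helper): read iostat output on stdin and print
# 100 - %idle for every CPU report found.
cpu_busy() {
    awk '
        $1 == "avg-cpu:" {
            col = 0
            # The value line has no "avg-cpu:" label, so shift the index by one.
            for (i = 2; i <= NF; i++)
                if ($i == "%idle") col = i - 1
            # The values follow on the next line.
            if ((getline) > 0 && col > 0)
                printf "%.2f\n", 100 - $col
        }
    '
}

# Sample input: the CPU section of Figure One.
figure_one='avg-cpu: %user %nice %system %steal %iowait %idle
0.83 1.09 0.40 0.00 0.37 97.30'

printf '%s\n' "$figure_one" | cpu_busy   # prints: 2.70
```

With sysstat installed, iostat -c | cpu_busy should report the same figure for the live system. Keep in mind that the first iostat report covers the time since the last reboot, so it is a long-term average rather than a picture of the current load.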
iostat and mpstat have much in common, but mpstat shows CPU statistics only. You should run mpstat when the results of iostat bear further investigation. The mpstat command yields output like that in Figure Two.
Linux 2.6.12-1.1376_FC3 (home) 09/24/2005
02:57:51 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
02:57:51 PM  all    0.83    1.47    0.37    0.36    0.03    0.00   96.94   1011.43
The mpstat statistics that differ from iostat are the percentage of CPU time spent servicing hardware interrupts (%irq) and software interrupts (%soft), and the total number of interrupts per second (intr/s).
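To watch just the interrupt-related columns, a short filter can pull them out of the mpstat report. This is a sketch under stated assumptions: the function name irq_share and the figure_two variable are invented for illustration, and the script expects the header layout of Figure Two (including the intr/s column, which not every mpstat version prints). Lines whose layout differs from the header, such as summary lines in interval mode, may not line up.

```shell
#!/bin/sh
# irq_share (hypothetical helper): read mpstat output on stdin and print
# "<cpu> <%irq + %soft> <intr/s>" for each data line after the header.
# Assumes the column names shown in Figure Two.
irq_share() {
    awk '
        {
            # A header line is any line that contains the %irq column name.
            is_header = 0
            for (i = 1; i <= NF; i++)
                if ($i == "%irq") is_header = 1
        }
        is_header {
            for (i = 1; i <= NF; i++) col[$i] = i
            have_cols = 1
            next
        }
        have_cols && NF > 0 {
            printf "%s %.2f %s\n", $(col["CPU"]),
                $(col["%irq"]) + $(col["%soft"]), $(col["intr/s"])
        }
    '
}

# Sample input: Figure Two.
figure_two='Linux 2.6.12-1.1376_FC3 (home) 09/24/2005
02:57:51 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:57:51 PM all 0.83 1.47 0.37 0.36 0.03 0.00 96.94 1011.43'

printf '%s\n' "$figure_two" | irq_share   # prints: all 0.03 1011.43
```

On a multi-processor machine with sysstat installed, mpstat -P ALL | irq_share should give one line per CPU, which makes it easier to spot a single processor that is absorbing most of the interrupts.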