Automated System Reports
The remainder of the sysstat tools automate incremental measurement of system activity and reporting. Recording activity is a concerted effort between the sysstat utilities and the system scheduler, cron.
Via cron, the utilities sa1 and sa2 perform the incremental data collection and report generation. sar and sadf are interactive utilities to view the collected data. sadc is used to manually create a short incremental snapshot.
By default, data is collected on system activity from the following systems and subsystems: CPU, memory, swap, I/O, network, paging, block devices, irqs, queues, kernel tables, processes, and TTYs.
To automate sar reporting, you can use the following cron entries, as suggested by the author of sysstat:
# 8am-7pm activity reports every 10 minutes during weekdays.
0 8-18 * * 1-5 /usr/local/lib/sa/sa1 600 6 &
# 7pm-8am activity reports every hour during weekdays.
0 19-7 * * 1-5 /usr/local/lib/sa/sa1 &
# Activity reports every hour on Saturday and Sunday.
0 * * * 0,6 /usr/local/lib/sa/sa1 &
# Daily summary prepared at 19:05
5 19 * * * /usr/local/lib/sa/sa2 -A &
cron runs crontab entries sequentially (serially) by default; adding the ampersand (&) symbol to the end of each command allows entries to be run in parallel with other entries. (If you need more information about cron entries, see Jerry Peek’s February 2003 “Power Tools” column, “Running Jobs Unattended” (http://www.linux-mag.com/content/view/1273/).)
On the test system, a Fedora Core 3 system, the sysstat package creates its own cron settings under /etc/cron.d/sysstat. You may prefer the following two entries over the previous four; choose one set of entries and do not mix the two.
# run system activity accounting tool every 10 minutes
*/10 * * * * /usr/local/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * /usr/local/lib/sa/sa2 -A
The cron entries generate sar data. To view the reports, simply enter sar at the command line. sar prints a very lengthy list of the performance data collected by the sa1 and sa2 scripts above.
With either set of cron entries, the previous day’s data file is closed each day and a new one is created, named for the new day’s date. For instance, on the 25th of the month, you will see two files in /var/log/sa, one named sa25 and the other sar25. sa25 is binary data, while sar25 is the plain text report. sar is handy, since it keeps a 30-day running history for you; note, however, that the oldest data files are overwritten each day.
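The naming convention lends itself to a small helper that maps a day of the month to its data file. The following is only a minimal sketch: the function name sa_file_for_day and the SA_DIR variable are hypothetical, and the default directory is the /var/log/sa location described in this article, which may differ on other distributions.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the binary sar data file for a
# given day of the month, following the saDD naming convention.
# SA_DIR is an assumed override; it defaults to the directory used in this article.
sa_file_for_day() {
    day=${1#0}    # drop one leading zero so "08" is not read as octal
    printf '%s/sa%02d\n' "${SA_DIR:-/var/log/sa}" "$day"
}
```

With the helper defined, sar -r -f "$(sa_file_for_day 25)" would print the memory statistics recorded on the 25th, provided that file still exists.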
There are switches for sar that yield statistics for memory, CPU, disk, swap, and more. Entering sar alone at the prompt yields CPU data. Refer to the man pages for complete information.
You can use sadf to export sar data in a form that is convenient to insert into a database. Unlike sar, sadf can format its output in several ways. For example:
# sadf -d /var/log/sa/sa21 -- -r -n DEV
The command above reads memory, swap space, and network statistics from the system activity file sa21 and displays them in a semicolon-delimited format that can be ingested by a database. The option -A yields all system activity data for the current day.
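If the target database loader expects commas rather than semicolons, the output can be converted in a pipeline. This is a rough sketch rather than part of sysstat: the function name sadf_to_csv is hypothetical, and it assumes that any header line begins with "#" (some sadf versions print one, others do not) and that no field contains a comma.

```shell
#!/bin/sh
# Hypothetical filter: read semicolon-delimited sadf -d output on stdin
# and write comma-separated lines, skipping "#" header lines.
sadf_to_csv() {
    awk -F';' 'BEGIN { OFS = "," }
        /^#/   { next }               # assumed header/comment line
        NF > 1 { $1 = $1; print }     # reassigning $1 rebuilds the line with OFS
    '
}
```

A possible invocation is sadf -d /var/log/sa/sa21 -- -n DEV | sadf_to_csv > net.csv, where net.csv is just an example file name.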
Another “old” tool still in wide use is vmstat, which reports information about processes, memory, paging, block I/O, traps, and CPU activity. Fortunately, Linux vmstat doesn’t count itself as a running process.
The typical way to invoke vmstat is…
# vmstat 5 5
… where the first 5 is the number of seconds between iterations and the second 5 is the number of iterations. On a relatively sleepy system, vmstat 5 5 produces the output shown in Figure Four.
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0    192 221616  32400 205616    0    0    10    14    1     9  2  0 98  0
 0  0    192 221616  32408 205616    0    0     0    17 1004    16  0  0 99  0
 0  0    192 221616  32416 205616    0    0     0     2 1004    19  0  0 100  0
 0  0    192 221616  32424 205616    0    0     0     2 1003    17  0  0 100  0
 0  0    192 221616  32424 205616    0    0     0     4 1004    18  0  0 100  0
vmstat is a great tool if you’re having performance issues and want to get at the proper subsystem right away. It can tell you where a bottleneck is if you are experiencing problems while you are on the server. For memory-related issues, I use the swap statistics, si/so (“swap in/swap out”). Like the other snapshot tools, vmstat has no history or predictive ability.
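Watching si/so by eye works for a handful of samples; for longer runs, a filter can flag only the samples that show swapping. The sketch below is illustrative: the function name swap_activity is hypothetical, and it assumes the column layout shown in Figure Four, where si is the seventh column and so is the eighth. Other vmstat versions may order their columns differently.

```shell
#!/bin/sh
# Hypothetical filter: read vmstat output on stdin and report every
# sample whose si or so value is nonzero.
# Assumes si is column 7 and so is column 8, as in Figure Four.
swap_activity() {
    awk '$1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {   # header lines do not start with a number
        print "swapping: si=" $7 " so=" $8
    }'
}
```

For example, vmstat 5 5 | swap_activity prints nothing on a system that is not swapping. Keep in mind that the first line vmstat reports is an average since boot, so a nonzero value there may not reflect current activity.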
free is another tool that could be considered old school. Very simply, free shows the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel. The shared memory column should be ignored; it is obsolete. The -m switch displays the report in megabytes, which makes it very clean.
# free -m
             total       used       free     shared    buffers     cached
Mem:           503        295        221          0         33        206
-/+ buffers/cache:         56        460
Swap:          522          0        522
free is a snapshot tool with no history or predictive ability, but it can be useful for assessing memory and swap issues at a glance.
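The “-/+ buffers/cache” line is simple arithmetic on the Mem: line: memory used by applications is used minus buffers minus cached, and memory effectively available is free plus buffers plus cached. The following sketch recomputes it; the function name real_memory_use and its output labels are hypothetical, and the column positions assume the older free layout shown above (newer versions of free print different columns).

```shell
#!/bin/sh
# Hypothetical filter: read free -m output on stdin and recompute the
# "-/+ buffers/cache" figures from the Mem: line.
# Assumed columns: Mem: total used free shared buffers cached
real_memory_use() {
    awk '$1 == "Mem:" {
        used  = $3 - $6 - $7    # used minus buffers and cached
        avail = $4 + $6 + $7    # free plus buffers and cached
        print "used_by_apps=" used " available=" avail
    }'
}
```

Fed the sample report above, this yields 56 and 460, which matches the second line of the free output.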
Finally, there’s top. You may like top, too, because it gives you so much information in real time that you’d be hard pressed to live without it. An example of top can be seen in Figure Six.
The graphic was produced by simply issuing top at the command line. To end top, use the “q” key.
A handy feature of top is that you can kill a process without leaving the utility. If you see the process in the list, use the “k” key to kill, and then type the process ID. If you have sufficient privileges, the process is terminated.
As you can see from Figure Six, top gives you a huge amount of information at a glance in real time. The only issue is that top can add stress to an already stressed system. top, quite often, is the “top” process and certainly has an impact on performance while it is running.
If you have a significant performance issue, use vmstat, free, and the sysstat tools before resorting to top.
When to Cry Wolf
How do you know when you have identified a potentially serious performance issue? The answer is complex, since some systems will just normally run “hot.” For instance, a system whose CPU usage is constantly at 80% does not necessarily have a problem; that may be the normal state for that system. But you won’t know this unless you gather data for a performance baseline.
To find out what is normal for a system, you must do some preparatory work and assemble a performance baseline. Run the tools in this article on a quiescent (quiet) system and redirect the output to files that you keep in a special directory for future reference. To do this, allow your system to operate normally with no users attached but with all normal processes running. Redirect the output to files such as iostat-quiet.txt and mpstat-quiet.txt. Then, allow users to connect to the system and again take a performance snapshot, redirecting the output to files iostat-loaded.txt and mpstat-loaded.txt. The latter set of files will be your performance baseline for a fully loaded, but correctly operating, system.
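The redirection can be wrapped in a small function so that every snapshot follows the same tool-label.txt naming. This is one possible sketch, not a prescribed method: the function name capture and the directory used in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save its output as
# DIR/TOOL-LABEL.txt, for example baseline/iostat-quiet.txt.
# Usage: capture LABEL DIR COMMAND [ARGS...]
capture() {
    label=$1
    dir=$2
    shift 2                               # the remaining arguments are the command to run
    mkdir -p "$dir" || return 1
    "$@" > "$dir/$(basename "$1")-$label.txt" 2>&1
}
```

On the quiet system you might run capture quiet "$HOME/perf-baseline" iostat 5 5 and capture quiet "$HOME/perf-baseline" mpstat 5 5, then repeat with the label loaded once users are connected.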
You now have a point of reference for any future issues that arise. Take snapshots when users begin to complain about performance and compare the statistics with your baselines. This data is priceless when trying to figure out where bottlenecks and trouble spots occur.
These tools are excellent resources for gaining insight into system performance in snapshot format but provide no predictive or long-term historical data. System administrators are well-advised to keep tabs on system performance and to keep some historical records by redirecting the output of these commands to files that are dated for later comparisons.
Kenneth Hess is a Linux fanatic and a freelance technical writer on a variety of open source topics. He can be reached via his website at http://www.kenhess.com.