Customizing top for a Shared Web Server

My Web server for http://perltraining.stonehenge.com is on a nicely configured shared Linux box at a 24x7-manned co-location facility. While I'm not really system administrator for this box, I still want to be sure that my Web things aren't bogging the system down unnecessarily. (If that happens, the other e-commerce users will start rallying to kick me off.) This is especially true as I experiment more with dynamically generated pages and toys for columns like this one.

So, the other day, I found myself invoking the Linux standard top program, taking stabs at configuring it to watch my Web server. But since I’m not the only httpd-something on the machine, I kept seeing other Web servers there, and it messed up my view. Also, I couldn’t tell if the child CGI scripts were expensive or cheap, since they would show up as some other process in the display.

I thought to myself that it’d be nice to have a Perl program that does just what I want top to really do: get the information about the processes that make up my Apache server, and show how CPU-bound and page-fault-bound they are, including any child CGI processes that got launched. To do this, I’d need:

* A way to draw things on the screen repeatedly with minimal refresh. No problem: Perl has the Curses.pm module to do this!

* Some access to the Web server so that I can get the process IDs associated with this Web server. Again, Apache has a mod_status module that can tell me this information.

* A way of getting top or ps information about CPU time and page faults. Upon a little investigation, I found the /proc filesystem, which provided everything I needed, and with no extraordinary privileges required for my program!

So, I decided to use Perl as (once again) the “duct tape of the Internet,” gluing together three separate domains: my xterm window, the Web-server information, and the system-resource values. Even though all three of these reside on a single machine, I use a networking protocol, HTTP, to tie them together. This gives me the power of standard facilities most often used across a network. It took only a little time to throw together the little program in Listing One. Now, mind you, this is merely a proof-of-concept, dashed off in a couple of hours to solve a specific task. You can see, though, how easy it is to use Perl to glue things together into entirely new system-administration tools.

The program goes as follows:

Lines 1 through 3 provide the standard header for most of my programs, where I enable warnings, turn on compiler restrictions, and unbuffer standard output.

Line 5 pulls in the LWP::Simple module, part of the LWP library found in the CPAN (located at http://www.cpan.org, among other places). We’ll be using the get routine from this module to fetch a given URL for its contents.

Line 6 pulls in the Curses module, also found in the CPAN. This module provides Perl access to the Curses library, letting me draw screens with minimal refresh on updates. As the Curses module is a little more persnickety about installation, run make test before make install to ensure that everything is okay.

Lines 8 and 9 provide the only configuration constants for this program. Line 8 needs to map to the mod_status trigger URL for my Web server (which I’ve deliberately mangled here, since I don’t want you-all to be pinging my server). This URL needs to be specifically enabled, as discussed at http://www.apache.org/docs/mod/mod_status.html. This also means that your server has to have mod_status compiled in. This script refreshes its display on a regular schedule, polling its data source every $SLEEPTIME seconds for new information. Line 9 sets this waiting time to 10 seconds.

Lines 11 to 17 give names to the /proc/nnn/stat whitespace-separated fields. Most of the names and definitions came from the proc(5) man page, but I had to get a few by scruffing through the kernel source /usr/src/linux/fs/proc/array.c, which seems to be more up-to-date than the man page (fancy that). For the descriptions of these fields, see the corresponding man page or source.

Lines 19 to 22 define how to show these fields. $SHOWLABELF gives a printf-acceptable field-width definition. $SHOWLABEL generates the column headers on the first line of the screen, and @SHOWFIELDS defines the fields from which the data comes. Note that most of these names are names that match @FIELDS earlier, but a few are uppercase. Those fields are computed fields, some from the Web-server status (like STATUS), and some from other fields (like CPU and PCPU).

Lines 25 to 30 fetch the boot time of this machine. We need this value because the start time of a process is defined in terms of “jiffies” (0.01 seconds each) after the boot time of the machine. To do this, we’ll scruff through the contents of the virtual file called /proc/stat looking for the btime nnnnn entry.
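As a minimal sketch of that lookup, the helper below pulls the boot timestamp out of text shaped like /proc/stat. The sub name boot_time_from and the sample text are illustrative assumptions, not part of Listing One, which reads the real file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the boot time (seconds since the epoch)
# from text laid out like /proc/stat. Returns undef if no btime line exists.
sub boot_time_from {
    my ($text) = @_;
    return $text =~ /^btime\s+(\d+)/m ? $1 : undef;
}

# Made-up sample in the shape of /proc/stat, for illustration only.
my $sample = <<'END';
cpu  4705 150 1120 16250
ctxt 1990473
btime 954201600
processes 2915
END

my $boot = boot_time_from($sample);
print "boot time: $boot\n";
```

Keeping the parsing in a sub that takes a string makes it easy to try against saved copies of the file.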

Line 32 declares the %cpu_history variable, keeping track of the prior total CPU usage for each process so that we can determine how much new CPU has been used, and therefore what percentage of total CPU a process has used.

Lines 34 to 90 form the main display loop of the program. The initscr call in line 34 comes from the Curses library and sets up all the screen-related parameters, as well as erases the screen. Lines 35 to 89 create an infinite loop: Note the redo in line 88.

Lines 36 to 39 fetch the Web-server status. I’m passing in a ?notable query parameter attached to the server-status URL. This causes mod_status to spit the data out in a slightly more parseable format, without a lot of table tags. Line 38 deletes the output up to the individual server details, while line 39 tosses everything after the hr tag. Note that I needed an s and m and i suffix on that substitution, and I realized that I could throw in a harmless o option, so that I could spell the word osmosis. Silly.

Lines 41 and 42 declare two hashes that get cleared out on each iteration. The %info hash holds the information about each process, keyed by process ID (PID) number. The %cpu hash is also keyed by PID number and holds the total CPU usage, as well as the timestamp at which the usage was taken (for percentage calculations).

Lines 44 and 45 figure out how long it’s been since the previous round of this loop. Of course, it’ll be close to $SLEEPTIME, but rather than count on that, we can figure the numbers out accurately. Line 44 saves the current Unix time into $cpu{TIME}, while line 45 computes the difference between this value and the historical value saved on the previous pass. If this number is 0, no percentages are displayed (as noted later).

Line 47 takes the contents returned from the Web server and extracts all the interesting PIDs and their respective Web-server status, as a hash in %http_status. The ugly regular expression is needed because the server returns:

[Ready]

for a ready process, but a bolded:

[<b>DNS Lookup</b>]

for anything that’s not just Ready. The bold tags would get in the way, so I sniff around inside of them.
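To see the pattern from line 47 at work, here is a small sketch run against two made-up lines. The sample text only approximates what mod_status sends back (the real lines carry more fields), so treat its layout as an assumption.

```perl
use strict;
use warnings;

# Made-up lines approximating "?notable" output; the real page carries more
# fields per line, so this layout is an assumption.
my $page = <<'END';
<b>Server 0-0</b> (1201): 0|12|12 [Ready] u.05 s.01
<b>Server 1-0</b> (1202): 1|7|7 [<b>DNS Lookup</b>] u.02 s.00
END

# Same idea as line 47: capture the PID in parentheses, then the text inside
# the square brackets, skipping an optional tag on either side of it.
my %http_status =
    $page =~ /Server \d+-.*?\((\d+)\).*\[(?:<.*?>)?(.*?)(?:<.*?>)?\]/g;

print "$_ => $http_status{$_}\n" for sort { $a <=> $b } keys %http_status;
```

Because the match runs in list context with the g modifier, each line contributes a PID and status pair, which is exactly the shape a hash assignment wants.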

Lines 49 to 69 fetch information about each process from the /proc filesystem. The PIDs of interest are the keys of %http_status, and we transform each of them into the right filename in line 49.

The /proc file system indexes all information by process ID. Thus, for example, the status for process 123 appears in pseudo-file /proc/123/stat. Lines 50 to 53 fetch the contents of that file into $_. Note that if we can’t get to a file, we simply skip it, presuming that the process may have gone away between the time we looked at Apache and the time we are looking at the /proc entry. No big deal to have lost this.

Lines 55 and 56 establish a %fields variable to hold all of the whitespace-delimited values for a given process. The @rest variable is an artifact of debugging; if there’s anything in there, it means my @FIELDS list is wrong, as it originally was when I was looking only at the man page and not the kernel source.
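The hash-slice assignment is easier to see on a small scale. This sketch uses a shortened, hypothetical field list and a made-up input line with one value too many, so the leftover lands in @rest.

```perl
use strict;
use warnings;

# Shortened field list for illustration; Listing One names many more.
my @NAMES = qw(pid comm state ppid);

# Made-up stat-style line with one extra trailing value.
my $line = "1201 (httpd) S 1 99";

# The hash slice takes one value per name; anything left over goes to @rest.
my %fields;
(@fields{@NAMES}, my @rest) = split ' ', $line;

print "pid=$fields{pid} comm=$fields{comm} leftover=@rest\n";
```

An empty @rest means the name list covers every field on the line.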

Lines 57 and 58 extract the PID and store the data into the %info hash keyed by that PID. Line 59 creates the first non-/proc field for this PID as the Web-server status.

First, lines 60 and 61 add together the time consumed by both user and system for both the process and its waited-for children.

This sum, measured in jiffies, goes into $cpu. Next, this value is stored into both the %cpu hash and the %info hash keyed by the PID, in line 62. Finally, lines 63 through 68 compute a percentage (with one digit after the decimal point) if possible, and set up the PCPU pseudo-field to hold that information.
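The percentage arithmetic can be sketched on its own. The sub name pcpu is hypothetical, and the code assumes 100 jiffies per second as described earlier, which is why dividing the jiffy delta by elapsed seconds already yields a percentage.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of one CPU used between two samples.
# Assumes 100 jiffies per second, so (delta jiffies) / (elapsed seconds)
# is already a percentage.
sub pcpu {
    my ($now_jiffies, $prev_jiffies, $seconds) = @_;
    return "??????" unless $seconds and defined $prev_jiffies;
    return sprintf "%5.1f%%", ($now_jiffies - $prev_jiffies) / $seconds;
}

print pcpu(1250, 1200,  10), "\n";   # 50 jiffies (0.5 s of CPU) in 10 seconds
print pcpu(1250, undef, 10), "\n";   # no history yet for this PID
```

The first call prints 5.0% padded to the column width; the second prints the placeholder used when there is no earlier sample.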

Once we’ve processed all the PIDs on this round, line 70 copies the CPU information into %cpu_history to serve as a baseline for the next iteration. If you had any other variables that needed a relative increment rather than an absolute value, you could save them likewise in this part of the program.

Finally, lines 72 to 86 generate the screen, now that we have all the data. Line 72 is a Curses call to “erase” the screen. This doesn’t really send out any characters, but it marks the in-memory version of the screen as all blank. There won’t be any output until line 86, where Curses will compare the resulting in-memory version with the screen view of what was sent last time, and update just those portions that have changed.

Line 73 puts the label at the top of the display, using another Curses routine. As this is a constant string, I could have avoided adding it each time by erasing only the portion of the screen below the first line, but I didn’t care to go through all that work. The 0,0 value here is the upper left corner of the screen, indexed by rows and then columns.

Lines 74 to 84 dump out each of the rows of information, keeping track of each line’s row via the $row variable. The sort expression in lines 76 to 78 selects PIDs in their order by start time, with the PID itself being the tie-breaker if two start times are identical.
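A short sketch of that ordering, using a made-up process table in which two entries share a start time:

```perl
use strict;
use warnings;

# Made-up process table keyed by PID, for illustration only.
my %info = (
    1203 => { pid => 1203, starttime => 5000 },
    1201 => { pid => 1201, starttime => 4000 },
    1202 => { pid => 1202, starttime => 5000 },
);

# Oldest process first; equal start times fall back to numeric PID order.
my @ordered = sort {
    $info{$a}{starttime} <=> $info{$b}{starttime}
        or $info{$a}{pid} <=> $info{$b}{pid}
} keys %info;

print "@ordered\n";
```

The or only evaluates the PID comparison when the start times compare equal, since <=> returns 0 in that case.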

Lines 79 to 82 dump out the information for a particular process. The first element of @SHOWFIELDS is sent through time_convert (I’ll get into this later), while the remaining elements are passed as-is to the sprintf operator.
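The row-building step can be sketched with a three-column layout. The format string, the record, and the convert stand-in are all hypothetical; only the slice over the tail of the field list mirrors the listing.

```perl
use strict;
use warnings;

my $FORMAT = "%8s %5s %-6s";
my @SHOW   = qw(starttime pid STATUS);   # first column gets special handling

# Made-up record; the real ones come from /proc and mod_status.
my %rec = (starttime => 4000, pid => 1201, STATUS => 'Ready');
my $r   = \%rec;

# Stand-in for time_convert: just tags the value so the effect is visible.
sub convert { my ($v) = @_; return "t$v" }

my $row = sprintf $FORMAT,
    convert($r->{starttime}),
    @{$r}{ @SHOW[1 .. $#SHOW] };   # hash slice over the remaining columns

print "$row\n";
```

Slicing from index 1 keeps the column list in one place while still letting the first column go through its own conversion.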

Line 85 moves the virtual cursor to the upper left corner. Line 86 sends the changed characters out to the real terminal, including moving the real cursor to the upper left corner as well. Line 87 sleeps for 10 seconds before starting the whole process all over again.

The endwin in line 90 is never reached. However, you could invoke Curses routines to see if a key was pressed during a sleep period, and use that to exit the program or change some configuration parameters, just like top. In my case, I was done when I was able to hit CTRL-C at the right time to get me out, which this program does.

Lines 92 to 101 convert a starttime value into a human-readable string. For starters, the start time is measured as jiffies past boot time, so we figure out a Unix timestamp value in $when in line 94. Then, if the timestamp is more than 12 hours ago, we’ll use the date, otherwise the hour, minute, and second. Both are derived by looking at the scalar return value of localtime, which generates a nice readable string.
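Here is a sketch of the same conversion written so it can be tried without /proc. The sub name start_label is hypothetical, and it takes the boot time and the current time as arguments rather than using the global and time directly; it assumes 100 jiffies per second.

```perl
use strict;
use warnings;

# Hypothetical variant of time_convert that takes the boot time and "now" as
# arguments so it can be exercised without reading /proc.
# Assumes 100 jiffies per second.
sub start_label {
    my ($jiffies, $boot, $now) = @_;
    my $when   = $boot + $jiffies / 100;
    my $string = localtime $when;          # e.g. "Wed Apr  5 14:03:22 2000"
    return $when < $now - 12 * 60 * 60
        ? substr($string, 4, 7) . substr($string, -4, 4)   # month, day, year
        : substr($string, 11, 8);                           # hh:mm:ss
}

my $now = time;
print start_label(0, $now - 60, $now), "\n";          # started a minute ago
print start_label(0, $now - 3 * 86400, $now), "\n";   # started three days ago
```

Both branches return an 11-character-or-shorter string, which is what lets a single fixed-width column hold either form.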

And there you have it. You can see this “proof of concept” program in action if you build and install LWP and Curses from the CPAN, and then ensure that the $STATUS variable holds a URL that triggers Apache’s mod_status. Of course, the fun really begins when you take the techniques illustrated here and apply them to other things to be watched, such as database servers or mailers.

I hope this has been useful for you. I’ll see you next month.




Listing One: Randal’s Web Server Watcher

1   #!/usr/bin/perl -w
2   use strict;
3   $|++;
4
5   use LWP::Simple;
6   use Curses;
7
8   my $STATUS = "http://localhost/server-status";
9   my $SLEEPTIME = 10;
10
11  my @FIELDS = qw(
12      pid comm state ppid pgrp session tty tpgid flags
13      minflt cminflt majflt cmajflt utime stime cutime cstime
14      counter priority timeout itrealvalue starttime vsize rss rlim
15      startcode endcode startstack kstkesp kstkeip
16      signal blocked sigignore sigcatch wchan nswap cnswap
17  );
18
19  my $SHOWLABELF = "%11s %5s %12s %1s %6s %6s %6s %6s %6s %6s";
20  my $SHOWLABEL = sprintf $SHOWLABELF,
21    qw(START_TIME PID STATUS S MINFLT MAJFLT CPU PCPU RSS NSWAP);
22  my @SHOWFIELDS =
23    qw(starttime pid STATUS state minflt majflt CPU PCPU rss nswap);
24
25  my $BOOTTIME = do {
26    local *FOO;
27    open FOO, "/proc/stat" or die "/proc/stat: $!";
28    local $/;                          # slurp the whole file
29    (<FOO> =~ /btime (\d+)/)[0];
30  };
31
32  my %cpu_history;
33
34  initscr;
35  {
36    $_ = get "$STATUS?notable" or die "no status!";
37
38    s/[\d\D]+Server Details.*\n//;     # drop everything before the details
39    s/^<hr>.*//osmosis; #:-)
40
41    my %info;
42    my %cpu;
43
44    $cpu{TIME} = time;
45    my $seconds = exists $cpu_history{TIME} ? $cpu{TIME} - $cpu_history{TIME} : 0;
46
47    my %http_status = /Server \d+-.*?\((\d+)\).*\[(?:<.*?>)?(.*?)(?:<.*?>)?\]/g;
48
49    for my $file (map "/proc/$_/stat", keys %http_status) {
50      local *FILE;
51      open FILE, $file or next;        # process may already be gone
52      $_ = <FILE>;
53      close FILE;
54
55      my %fields;
56      (@fields{@FIELDS}, my @rest) = split;
57      my $pid = $fields{pid};
58      $info{$pid} = \%fields;
59      $info{$pid}{STATUS} = $http_status{$pid};
60      my $cpu = $fields{utime} + $fields{stime}
61        + $fields{cutime} + $fields{cstime};
62      $cpu{$pid} = $info{$pid}{CPU} = $cpu;
63      if ($seconds and exists $cpu_history{$pid}) {
64        ## delta jiffies over seconds is already percentage!
65        $info{$pid}{PCPU} = sprintf "%5.1f%%", ($cpu - $cpu_history{$pid}) / $seconds;
66      } else {
67        $info{$pid}{PCPU} = "??????";
68      }
69    }
70    %cpu_history = %cpu;               # baseline for the next pass
71
72    erase;
73    addstr(0, 0, $SHOWLABEL);
74    my $row = 1;
75    for my $pid (sort {
76      $info{$a}->{starttime} <=> $info{$b}->{starttime}
77        or $info{$a}->{pid} <=> $info{$b}->{pid}
78    } keys %info) {
79      addstr($row, 0,
80             sprintf($SHOWLABELF,
81                     time_convert($info{$pid}{starttime}),
82                     @{$info{$pid}}{@SHOWFIELDS[1..$#SHOWFIELDS]}));
83      $row++;
84    }
85    move(0, 0);
86    refresh;
87    sleep $SLEEPTIME;
88    redo;
89  }
90  endwin;
91
92  sub time_convert {
93    my $jiffies = shift;
94    my $when = $BOOTTIME + $jiffies/100;
95    my $string = localtime $when;
96    if ($when < time - 12*60*60) {
97      substr($string, 4, 7) . substr($string, -4, 4);
98    } else {
99      substr($string, 11, 8);
100   }
101 }





Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and co-author of Learning Perl and Programming Perl. He can be reached at merlyn@stonehenge.com.
