
Are Your Pipes Clogged?

Recently, I attended a presentation at the Portland Linux Unix Group (http://www.pdxlinux.org/) by Michael Rasmussen. At one point in his talk, Michael mentioned that he needed to analyze the traffic on his company’s Web server and was surprised that many of the commercial and freely available Web traffic tools do not provide satisfactory reports on the amount of traffic, either bytes per second or hits per second, during “bursts” or “spikes” in the load. But, being reasonably fluent in Perl, Michael wrote a quick script to crawl through the text Web log, and got the data he needed.

That got me thinking about how much (or little) traffic I was shooting out of the Web server for www.stonehenge.com. Since I’m already storing the data into a MySQL database (via DBI and mod_perl), I thought it shouldn’t be tough to analyze not only the bytes and hits per second (average and burst) as he had done, but also the CPU usage per second, which I’m also recording for each hit in my log.

To make the result more realistic, I decided to spread out the total bytes and CPU usage over the duration of the hit, which I’m also recording as wall-clock seconds. For example, if 100K has been downloaded over 10 seconds, each second in the span should get 10K of that allocation on average. And because this is a quick-and-dirty program to get some typical stats, I decided to handle this by simply having an array with a separate slot per second, limiting the reasonable size of analysis to just a few days. But that was enough to cover a useful time period, and actually gave me some information about usage that I didn’t have from my other reports.
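In rough Perl, the idea is something like this small sketch (with made-up numbers, not anything from my actual log):

## Made-up example: one hit of 100K that took 10 seconds,
## starting at slot 42 of a per-second array.
my @bytes_per_second;
my ($start, $wall, $bytes) = (42, 10, 100_000);
my $share = $bytes / $wall;        # 10K for each second of this hit
$bytes_per_second[$_] += $share for $start .. $start + $wall - 1;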

So, let’s look at how I did all that in Listing One.




Listing One


1 #!/usr/bin/perl -w
2 use strict;
3 $|++;
4
5 use DBI ();
6 use Statistics::Descriptive;
7
8 my $DSN = 'dbi:mysql:stonehenge_httpd';
9 my $DB_AUTH = 'do_not:try_this_at_home';
10 my $DAYS = 2;
11
12 my $dbh = DBI->connect($DSN, (split ':', $DB_AUTH),
{ RaiseError => 1 });
13 $dbh->do("SET OPTION SQL_BIG_TABLES = 1");
14 ## $dbh->trace(2);
15
16 my $sth = $dbh->prepare(qq{
17 SELECT unix_timestamp(when), wall, bytes, cpuuser
+ cpusys + cpucuser + cpucsys
18 FROM requests
19 WHERE when >= DATE_SUB(NOW(), INTERVAL $DAYS DAY)
20 ORDER BY when
21 });
22
23 my $offset = time - 86400 * $DAYS;
24
25 $sth->execute();
26 $sth->bind_columns(\my($when, $wall, $bytes, $cpu));
27
28 my (@hits, @bytes, @cpu);
29
30 while ($sth->fetch) {
31 $when -= $offset;
32 my $hits = 1 / ($wall + 1);
33 $bytes *= $hits;
34 $cpu *= $hits;
35 for ($when..($when + $wall)) {
36 $hits[$_] += $hits;
37 $bytes[$_] += $bytes;
38 $cpu[$_] += $cpu;
39 }
40 }
41
42 $dbh->disconnect;
43
44 defined $_ or $_ = 0 for @bytes;
45 defined $_ or $_ = 0 for @hits;
46 defined $_ or $_ = 0 for @cpu;
47
48 if (0) { # dump raw data
49 for (0..$#bytes) {
50 printf "%30s %10.1f %4.1f %4.1f\n",
51 scalar(localtime($offset+$_)), $bytes[$_],
$hits[$_], $cpu[$_];
52 }
53 }
54
55 for my $pair ([bytes => \@bytes], [hits => \@hits],
[cpu => \@cpu]) {
56 my $stat = Statistics::Descriptive::Full->new;
57 $stat->add_data(@{$pair->[1]});
58 for my $thing (qw(mean max)) {
59 printf "%4s %5s/sec %10.3f\n", $thing, $pair->[0],
$stat->$thing();
60 }
61 }

Lines 1 through 3 start nearly every program I write, enabling warnings and compiler restrictions, and disabling the normal buffering of standard output.

Line 5 brings in the standard DBI module. For this program, I also had to have the DBD::mysql module installed to access the database. Line 6 pulls in the Statistics::Descriptive module, so I don’t have to remember how to compute the mean and max values, and because I was also considering the standard deviation (but ended up not using it). All these modules are found in the CPAN.

Lines 8 through 10 provide the configuration constants. In particular, the $DSN value defines the DBI connection information, while $DB_AUTH gives the username and password for the particular database. Obviously, these values have been slightly smudged for security reasons. And $DAYS gives the number of days (an integer) that this report covers. We’re building and processing three arrays, each with 86400 (seconds in a day) times this value elements, so don’t get too eager to crank it up to a large value.

Lines 12 and 13 set up the database connection, including enabling automatic “die on error” handling, so I don’t have to check the return status of every database call. I routinely enable MySQL’s big table mode, because I’ve been burned when I didn’t do this in the past.

Line 14 is for debugging. When I’m not sure what I’m passing to DBI, or what values are getting returned, I can uncomment this line to get a tracing of operations.
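If the trace output gets tangled up with the program’s own output, DBI can also send it to a file instead of STDERR; something like this (the filename here is just an example):

## Route level-2 DBI trace output to a file instead of STDERR:
$dbh->trace(2, "/tmp/dbi-trace.log");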

Lines 16 to 21 prepare the SQL query to fetch the data of interest. The return value is a “statement handle” that lets me execute the query and obtain the results. The parameter is a double-quoted string, using the qq{…} notation so I can easily balance the brackets nicely with my text editor. The one value that is interpolated is the number of days to fetch backwards in time. MySQL’s date functions are a bit odd to wrangle, and took me a few tries to get right.
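For what it’s worth, that interpolation could probably have been a bind placeholder instead. Here’s a sketch of that variation, assuming your DBD::mysql is happy with a placeholder inside the INTERVAL expression:

## Variation: pass the day count as a bind value rather than
## interpolating it into the SQL text.
my $sth = $dbh->prepare(qq{
  SELECT unix_timestamp(when), wall, bytes, cpuuser
    + cpusys + cpucuser + cpucsys
  FROM requests
  WHERE when >= DATE_SUB(NOW(), INTERVAL ? DAY)
  ORDER BY when
});
$sth->execute($DAYS);

With that change, the execute() call in line 25 would carry $DAYS as its argument.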

My Web server log is stored in the table requests. In that table, the when column is a date/time stamp of the transaction, wall is the wall-clock seconds (as an integer) from start to end, bytes is the number of bytes returned by the server, and the four cpu values represent the user, system, child user, and child system CPU seconds (as floating point).

The query returns the Unix timestamp value (seconds since the beginning of 1970 in London), the wall clock seconds, the number of bytes, and the sum of the CPU usage in seconds, for all entries that have been posted in the past $DAYS number of days. The result is ordered for no apparent reason, except that during testing I was adding things like LIMIT 10 and wanted a consistently understandable output.

Line 23 computes the Unix timestamp corresponding to the beginning of the period in question. Note that if the clocks of the database server and the machine this process runs on are out of sync, this is probably not going to work. Luckily, I was actually doing this on the very same machine, so there’s no chance that the values were out of sync.

Line 25 starts the database query humming, and may even fetch all the results into a temporary area.

Line 26 declares the result to have four output values for the columns, which will be stored directly into the four named scalars as I fetch each row of the result. This keeps the interface simple, as well as very fast.
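For contrast, the same loop without bind_columns would copy each row into a fresh list on every fetch, roughly like this:

## Equivalent, but a little slower and wordier:
while (my ($when, $wall, $bytes, $cpu) = $sth->fetchrow_array) {
  ## ... same per-hit processing as lines 31 through 39 ...
}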

Line 28 declares the three accumulating arrays, for number of hits, total bytes transferred, and total CPU seconds used. These arrays will grow to be 86400 times $DAYS elements each. Since @bytes and @hits are always integers, I considered playing with a vec()-based data structure, but decided to optimize along those lines only if the straight solution didn’t fit in memory. Never optimize prematurely, especially when you’re facing deadlines.
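For the curious, the vec() idea would have looked something like this sketch, packing each per-second hit counter into 32 bits of a single string instead of an array element:

## Sketch of the vec() alternative (not used in the listing):
my $hit_counts = "";                        # one packed string instead of @hits
my $second = 12_345;                        # some one-second slot in the window
vec($hit_counts, $second, 32) += 1;         # bump that slot's 32-bit counter
my $count = vec($hit_counts, $second, 32);  # and read it back later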

Lines 30 to 40 process the database response data. As each row is returned (indicated by a true value back from fetch), the values are placed directly into the four scalar variables named in line 26. The next step is to apportion the hits, bytes, and CPU seconds across the wall-clock interval. First, line 31 computes the initial array index. Line 32 scales down the one hit we’re handling by 1 more than the wall count. Since we’re counting integer seconds, we’ve actually got a rounding error here, but a kludge of simply flattening out the request over a slightly larger rounded-up time period was satisfactory to me. And the +1 prevented a potential divide-by-zero error nicely.

Lines 33 and 34 adjust the bytes and CPU seconds to their per-second values, making it easier for lines 35 to 38 to accumulate that value across the spread of the range of seconds for this hit. I actually don’t like the repetition of the code: I’m saying the same thing with different variables far too often here, and could probably have factored that out with a bit more effort.
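Had I spent that extra effort, one way to factor it out would be to drive all three accumulators from a hash of array references, so the per-second loop appears only once. A sketch, not what’s in the listing:

## Possible refactoring: one hash holds all three per-second arrays.
my %per_second = (hits => [], bytes => [], cpu => []);
while ($sth->fetch) {
  $when -= $offset;
  my %amount = (hits => 1, bytes => $bytes, cpu => $cpu);
  for my $second ($when .. $when + $wall) {
    $per_second{$_}[$second] += $amount{$_} / ($wall + 1) for keys %amount;
  }
}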

Once the database scan is complete, we disconnect in line 42 to let the server know we’re done there.

Lines 44 through 46 fix the undefs that appear in the arrays of data for the timespans when there’s no activity on my Web server. Leaving these undef values in the array will trigger warnings later, even though they are treated as zero. Right result, but noisy, so a quick search-and-destroy takes care of it.
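Those three nearly identical lines could also have been a single statement, since a for modifier happily aliases $_ across all three arrays at once:

## Same cleanup in a single pass over all three arrays:
defined $_ or $_ = 0 for @bytes, @hits, @cpu;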

Lines 48 to 53 dump out the raw data in a pretty format. Note that the code is inside an if (0) control, so there’s no execution of this code in the final program; while I was testing, this code was handy, and I tend to leave my paint scaffolding erected in case there’s a chance I’ll be doing more painting later. Note the use of the scalar value of localtime() to quickly turn a Unix timestamp into a human-readable string.

The real work happens in the final loop, in lines 55 to 61. This loop is executed three times (and yes, it was originally written as a cut-n-paste nightmare where I had the same thing three times with three different variables). Each time through the loop, $pair is a reference to a two-element array consisting of a label and a reference to the array that holds the data corresponding to the label. Inside the loop, we create a new statistics object (line 56), push the data into the object (line 57), and then compute the mean and the maximum by calling two different methods against the statistics object. Each printf() gets the name of the statistic (mean or max), the label of the item from the outer loop, and the value returned by calling that method on the statistics object, all formatted nicely.
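And since the statistics object is already sitting there, the standard deviation I flirted with earlier would have been one more word in the inner loop (with a wider format to fit the longer label):

## Adding the standard deviation alongside mean and max:
for my $thing (qw(mean max standard_deviation)) {
  printf "%18s %5s/sec %10.3f\n", $thing, $pair->[0], $stat->$thing();
}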

And there you have it. The output for my Web server as I just ran this looks like this:

mean bytes/sec   2014.457
 max bytes/sec 207162.000
mean  hits/sec      0.198
 max  hits/sec     25.524
mean   cpu/sec      0.008
 max   cpu/sec      1.212

At peak, I’m nearly filling a T-1 with the amount of data I’m squirting out, and yet on average, I’m using about 1% of the CPU for Web service. That last value shouldn’t be greater than one: there is obviously some discarded data about the specific allocation of CPU to particular seconds during the day, yielding a slightly bogus max CPU seconds per second.
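(For reference, 207,162 bytes per second times 8 bits per byte is about 1.66 million bits per second, right around the 1.544 Mbit/sec capacity of a T-1.)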

I hope you found it useful to see how to figure out whether your Web server is filling your pipes. Until next time, enjoy!



Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and can be reached at merlyn@stonehenge.com. Code listings for this column can be found at http://www.stonehenge.com/merlyn/LinuxMag/.
