More and more Web-hosting services and ISPs are providing CGI space in addition to customer Web pages, either as a free add-on, or an extra-cost service. And there are even a few free CGI servers out there on the Net. The problem with these services is that the (shared) Web error log is often inaccessible, or at an unknown location. That's fine if your CGI program never commits an error, or if you are using the PSI::ESP module to determine the error text. But most of us will write "blah blah or die blah" in our CGI scripts, expecting to somehow be told what's wrong when it goes wrong.
Some have resorted to dumping the error message to the browser. In fact, during development, there’s nothing wrong with adding:
use CGI::Carp qw(fatalsToBrowser);
to your program, and doing the debugging right in your browser window. (See the CGI::Carp documentation for details.)
But this is a huge security hole if left in production code. While surfing the World Wild Web, I often see error messages that reveal far too much information. I’ve seen program names, user IDs, languages used, pathnames to key files, and even the exact SQL query attempted dumped out in these errors. I have no right (or need) to know that, and a bad guy can use such valuable information to assist him in breaking into the system.
So, if we can’t put it into the browser, and we can’t get to the error log, where else can we put the errors? Why, in e-mail of course!
All we need to do is write a module (let’s call it FatalsToEmail.pm) that we’ll then stick somewhere on the system (like /home/merlyn/lib), and pull in at the top of our CGI script with a use lib line naming that directory, followed by a use FatalsToEmail line.
And then when the CGI script dies, the text of the error message gets sent to me, while the user is told that “something went wrong.” Too cool? Yup, so read on.
The module source is in Listing One. Line 1 sets up the package, which is important because we don’t want any symbols to collide with those of the module’s user. Line 2 enables my favorite compiler restrictions, including requiring me to declare my variables, discouraging the use of symbolic references, and preventing barewords from being treated as quoted strings.
Lines 4 through 10 set up the four configuration variables for the module. The Address provides the e-mail address to which the messages should go, here defaulting to webmaster at the mail host. Speaking of which, Mailhost sets up the mail-delivery host. This doesn’t have to be the final machine on which the mail ends up, but we’ll need a friendly SMTP server somewhere that can handle mail from the script. The localhost default should be fine for most machines, except those that don’t run a mailserver on the Web server.
The Cache and Seconds parameters interact to limit the amount of mail delivered. The default Cache value of undef gives the script the right to deliver a single separate piece of e-mail for each fatal error. This is great for testing or for low-volume sites. But it’d be a potential “denial of service” attack for high-volume sites or malicious users.
So instead, we bunch up the rapidly appearing messages into a cache, guaranteed to be sent no more often than the indicated number of seconds. To get the bunching up, Cache must be set to a filename path that is writable by the user ID executing the CGI program. A typical value might be something like /tmp/merlyn.weberrors.cache. (The actual caching strategy is defined below.) Several programs can share the same cache: the error messages within the cache are prefixed by the filename and line number from which they sprouted.
Lines 12 through 20 handle the configuration of the module. If the use line passes an import list of name/value pairs, such as Address followed by an e-mail address and Cache followed by a filename, then we save Address and Cache to override the default values. The logic in line 15 ensures that someone can use cache or CACHE or even cAcHe for the identifier tag, and we’ll still store it into the right hash slot.
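As a rough illustration of that configuration step, here is a minimal sketch of an import routine that normalizes the tag names. The package name FatalsToEmailSketch, the default values, and the "unknown parameter" check are my assumptions for the example, not the code from the listing.

```perl
package FatalsToEmailSketch;   # hypothetical name, not the real module
use strict;
use warnings;

# The four settings described in the text; the default values are assumptions.
our %config = (
    Address  => 'webmaster',
    Mailhost => 'localhost',
    Cache    => undef,
    Seconds  => 60,
);

# Perl calls this with whatever follows the module name on the "use" line.
sub import {
    my $class = shift;
    while (@_) {
        my ($key, $value) = splice @_, 0, 2;
        $key = ucfirst lc $key;              # cache, CACHE, cAcHe all become Cache
        die "unknown parameter: $key\n" unless exists $config{$key};
        $config{$key} = $value;
    }
    return;
}

package main;

# Calling import directly has the same effect as the "use" line would have.
FatalsToEmailSketch->import(cAcHe => '/tmp/merlyn.weberrors.cache', seconds => 120);
print "Cache is $FatalsToEmailSketch::config{Cache}\n";
```

Normalizing with ucfirst lc means the hash only ever has one spelling of each key, so the rest of the module never has to think about case.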
Line 22 establishes the subroutine in this module as the die handler. From here on out, we’re the ones that will get called on a fatal error.
Lines 24 through 44 define this handler. The text message for the fatal error shows up in $message in line 25. Line 26 gets the current local time of day to label the message consistently. Line 27 extracts information about the filename and line number from which the error message was triggered.
Lines 29 through 31 prefix each line in the message with a unique identifier, consisting of the filename, line number, time of day, and process ID number. This identifier is helpful to group error messages in cache-dumping e-mail, and will also provide the necessary locators to let you fix the problem.
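The prefixing step might look something like the following sketch. The function name prefix_message and the exact layout of the identifier are assumptions; only the four ingredients (filename, line number, time of day, process ID) come from the description above.

```perl
use strict;
use warnings;

# Put "file:line: [time] pid: " in front of every line of $message, so that
# entries from different programs can still be told apart in a shared cache.
sub prefix_message {
    my ($message, $file, $line, $time, $pid) = @_;
    my $prefix = "$file:$line: [$time] $pid: ";
    $message .= "\n" unless $message =~ /\n\z/;   # make sure the last line is terminated
    $message =~ s/^/$prefix/mg;                   # /m lets ^ match at the start of each line
    return $message;
}

# Demonstration with made-up values for the filename and line number.
print prefix_message("first line\nsecond line\n", "demo.cgi", 42, scalar(localtime), $$);
```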
Lines 33 to 39 dump a CGI response. Note that minimal information is provided.
If the CGI program has already sent an HTTP header, the header we print in line 34 will show up as content. There’s nothing much I can do about that from this module, at least not in a CGI environment.
Then there’s the cleanup. Line 41 triggers the e-mail (or caching, if needed). And line 43 executes a die within the die handler. This step is needed so that Perl knows to finish aborting the program. The message shows up on STDERR, which will typically be the real Web error log.
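To make the shape of such a handler concrete, here is a small sketch under some loudly stated assumptions: the @reported array stands in for the e-mail or caching step, the response is collected in a string instead of being printed directly, and the text/plain content type and the wording shown to the visitor are my inventions. The eval is there only so the demonstration survives its own fatal error.

```perl
use strict;
use warnings;

my @reported;          # stands in for the e-mail/cache step (hypothetical)
my $response = '';     # collects what a real handler would print to STDOUT

$SIG{__DIE__} = sub {
    my $message = shift;
    my (undef, $file, $line) = caller;      # where the die came from
    push @reported, "$file:$line: $message";

    # Tell the visitor as little as possible.
    $response .= "Content-type: text/plain\n\n";
    $response .= "Something went wrong. The administrator has been notified.\n";

    die $message;      # die again so Perl finishes aborting; no recursion here,
                       # because the handler is disabled while it runs
};

# Demonstration only: the handler fires even inside an eval.
eval { die "cannot open config file\n" };
my $error = $@;
$SIG{__DIE__} = 'DEFAULT';

print $response;
```

Note that the visitor's copy of the response never includes $message; the detailed text goes only to the reporting step and to STDERR.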
Lines 46 to 89 attempt to phone home with the error message. (Perhaps I should have called this subroutine “e_t”?) Nearly everything is inside a large eval block so that any mistakes will still set up a graceful exit from this program. If something does go wrong, the failure is captured and delivered in lines 86 to 88, including both the original message that was supposed to be mailed and the error that kept it from being mailed.
Lines 50 to 75 handle the cache, if needed, as determined by line 51. If we’re caching, then line 52 attempts to open the cache file for both reading and writing. If that succeeds, it’s time to operate.
Line 53 blocks the process until we’re the only one using the file in this manner. We’ll want to keep the time to a bare minimum from here until we close the filehandle, because we’ve just entered a zone that only one process at a time can be within.
Lines 55 to 62 handle the case where the cache is old enough for us to send. If the file modification time (“mtime”) is more than some number of seconds ago (determined by the Seconds configuration variable), then it’s been a while since we wanted to send some e-mail, and there may in fact be previous contents that we deferred. So lines 57 and 58 grab that.
If there’s more than 8 KB of information in the cache, we send only the first 8 KB with a warning. This keeps the mail message from becoming yet another denial of service attack: filling up our mail spool. This way, we will get at most roughly 8 KB of information every 60 seconds (or whatever Seconds is configured as). The old cached messages are prepended to the current message in line 59.
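The size cap is simple enough to sketch in a few lines. The function name cap_cached and the wording of the truncation warning are assumptions; the 8 KB limit is the figure given above.

```perl
use strict;
use warnings;
use constant LIMIT => 8 * 1024;   # 8 KB, the figure from the text

# Return at most LIMIT characters of cached text, flagging any truncation.
sub cap_cached {
    my ($cached) = @_;
    return $cached if length($cached) <= LIMIT;
    return substr($cached, 0, LIMIT) . "\n[cache truncated]\n";   # hypothetical wording
}

# Old cached text goes in front of the message that triggered this send.
my $to_send = cap_cached("older error\n" x 3) . "newest error\n";
print $to_send;
```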
Lines 60 and 61 remove the cached material, so that we have an empty file that has been modified just now, regardless of whether any prior content was in the file. This is important, because we want the next hit within the cache window to be deferred, repeatedly, until we have another idle period. And finally, line 62 closes the cache, also releasing the lock, since we now have the information we need.
Lines 64 to 67 handle the hits within the cache window, no more than Seconds number of seconds since the previous hit. In this case, we just seek to the end of the file, and dump the message there. We are guaranteed that the message will end in newline, but if I wasn’t sure, I’d add a \n here somewhere, to ensure that each error message has lines that start with the prefix identifier computed earlier. The return in line 68 skips over all the e-mail handling code below, since we won’t be sending any mail on this trip.
Lines 71 to 73 create an initial empty cache file if it does not exist. We’ll treat a nonexisting file as if it was an empty old file, which means it still needs to have a timestamp updated to “right now.” Note the use of the “append” open operation: Since we don’t have a lock, we may be trying to create a file while there’s already someone else who has created the file, gotten a lock, and started writing in the file (it could happen!). Therefore, the best we can do is ask the kernel to “make the file if it doesn’t exist, or be ready to append to it if it does.” Which works here just as we needed it.
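Putting the caching steps together, the strategy could be sketched as a single function that either returns the text to mail right now or tucks the message away and returns nothing. The name cache_or_release is hypothetical, and the explicit utime call is my own belt-and-braces addition to make sure the emptied file is stamped with the current time. The demonstration uses a scratch directory from File::Temp rather than a real cache path.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET SEEK_END);
use File::Temp qw(tempdir);

# Returns the text to mail now, or undef if the message was deferred.
sub cache_or_release {
    my ($cache, $seconds, $message) = @_;

    if (open my $fh, '+<', $cache) {
        flock($fh, LOCK_EX) or die "cannot lock $cache: $!";   # one process at a time
        if (time - (stat $fh)[9] > $seconds) {
            # Quiet for long enough: collect deferred text, then empty the file.
            my $cached = do { local $/; <$fh> };
            $cached = '' unless defined $cached;
            seek($fh, 0, SEEK_SET)  or die "cannot rewind $cache: $!";
            truncate($fh, 0)        or die "cannot truncate $cache: $!";
            my $now = time;
            utime $now, $now, $cache;      # assumption: force mtime to "right now"
            close $fh;                     # also releases the lock
            return $cached . $message;
        }
        # Still inside the window: append and send nothing this time.
        seek($fh, 0, SEEK_END) or die "cannot seek $cache: $!";
        print {$fh} $message;
        close $fh;
        return;
    }

    # No cache file yet: create it in append mode, which cannot clobber a
    # file that another process created in the meantime.
    open my $new, '>>', $cache or die "cannot create $cache: $!";
    close $new;
    return $message;
}

my $dir    = tempdir(CLEANUP => 1);
my $cache  = "$dir/weberrors.cache";
my $first  = cache_or_release($cache, 60, "first error\n");    # no cache yet: send now
my $second = cache_or_release($cache, 60, "second error\n");   # inside the window: deferred
```

Because every deferred write also updates the file's modification time, a steady stream of errors keeps pushing the send time back until there is a quiet period.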
Now for the fun part. Lines 77 to 84 send the e-mail message. First, we try to suck in the Net::SMTP module in line 77. This may not be possible, because the CPAN module may not be installed (it’s part of the libnet bundle from Graham Barr, not part of the core installation).
Because the require directive might fail, it’s inside an eval. If the require succeeds, the block returns 1, which skips our inner die operation; otherwise we’ll abort. The die here is being caught by the outer eval. Wheeee.
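The nested-eval dance is easier to see in isolation. In this sketch the module name No::Such::Module::Hopefully is deliberately bogus (a stand-in so the failure path runs), and the "cannot load mailer" wording is my own.

```perl
use strict;
use warnings;

my $result = eval {
    # Inner eval: returns 1 if the require works, undef (with $@ set) if not.
    eval { require No::Such::Module::Hopefully; 1 }
        or die "cannot load mailer: $@";
    'loaded';    # only reached when the require succeeded
};
my $error = $@;  # the outer eval caught the inner die

print defined $result ? "result: $result\n" : "failed: $error";
```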
Lines 78 and 79 set up a Net::SMTP connection object to the requested Mailhost. If there are any errors, I’m told the error will be in $@, not $!, so I include that here in the die message. Again, this die will be caught by the outer eval block.
Lines 80 and 81 tell the SMTP server what our sender name is, and what the recipient name is. The sender name will be used for error messages from the various mailers along the way, and the recipient name is the ultimate destination.
Here, we’re using the configured e-mail address for both. This could get weird if the address is undeliverable: the final mail host will attempt to bounce the message back to the same address to which it is attempting delivery. Hmm. Not a good idea. But it’s better than any alternative I could think of today.
Lines 82 and 83 provide a subject line and a body for the message. The subject line has nice bright shiny capital letters in it, including the name of the program that triggered the error for easy mail filtering by smart e-mail readers.
Note that most mail servers will also construct a Date and From and To header for us automatically, so I can lean on it to do the job. And finally, line 84 tells the mail server we’re done for this round, and shuts down the connection.
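For reference, the mailing conversation described above might be sketched like this with Net::SMTP. The function name send_report, the 30-second timeout, and the subject wording are assumptions, and the function is only defined here, not called, since calling it needs a reachable SMTP server.

```perl
use strict;
use warnings;

# Mail $body to $address via $mailhost. Returns '' on success, or the reason
# for the failure. Every die below is caught by the surrounding eval.
sub send_report {
    my ($mailhost, $address, $body) = @_;
    my $ok = eval {
        require Net::SMTP;
        my $smtp = Net::SMTP->new($mailhost, Timeout => 30)
            or die "cannot connect to $mailhost: $@\n";   # Net::SMTP reports in $@
        $smtp->mail($address) or die "sender $address rejected\n";
        $smtp->to($address)   or die "recipient $address rejected\n";
        $smtp->data(
            "Subject: CGI FATAL ERROR in $0\n",   # program name, for mail filters
            "\n",                                 # blank line ends the headers
            $body,
        ) or die "message body rejected\n";
        $smtp->quit;
        1;
    };
    return $ok ? '' : "$@";
}
```

A caller would pass the configured Mailhost and Address along with the prefixed message text, and fall back to STDERR if the returned string is not empty.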
That’s it. Put FatalsToEmail.pm some place accessible to your CGI script, add the appropriate use lib line to point at the directory, and you can start getting your errors via a timely e-mail message instead of having to scruff through the old shared Web error log. Until next time, enjoy!
Linux Magazine /
July 2000 / PERL OF WISDOM
Capturing Those CGI Errors as E-mail