Introduction to mod_perl, Part Three

In the final installment of a three-part series, see how to use the mod_perl API to affect the processing of Apache's many phases.

Introduction to mod_perl (part 3)

Last month, I took a much-needed break, giving my space up for matters related to Perl 6. Thank you, guest writers, and thank you, editors!

In the previous two columns, I introduced mod_perl, including fundamental concepts and basic configuration directives, and started describing the callback API objects.

This month, I’ll finish my introduction to mod_perl by talking about the rest of the API, showing some sample code to use in the content and other phases, and then concluding with some pointers to further information.

The Apache request object, often referred to as $r, can be obtained as the first parameter passed in to a handler subroutine, or via the request method of the Apache class. Calling methods against this object generally triggers callbacks to the published Apache API. For example, document_root can get (or even set) the “document root” (the directory to which the root URL is initially mapped). Although you might be tempted to compute a URL relative to this directory, be aware that some directives such as Alias will further modify this mapping.

Another interesting request method is dir_config, which provides access to the PerlSetVar and PerlAddVar values. For example,

    my @items = $r->dir_config->get('SomeKey');

sets @items to the zero or more SomeKey (case insensitive) values seen in the various configuration files appropriate to the current request. This mechanism permits a simple configuration change to modify the behavior of code in a syntax that is config-file compatible.

Handlers can pass notes to each other during the execution of a given request. For example, an error handler can set a note about the reason for a failure, then perform an internal redirect to a different request URL to display the response. The new URL handler can include code to fetch this note as part of the message. Or, a handler can stuff notes that will later be picked up by a custom log description. Perl can get and set these values with the notes method.

Although standard notes are limited to simple text strings, arbitrary Perl references and data structures can be set and fetched with pnotes. An advantage of pnotes over standard global variables is that these notes are guaranteed to be destroyed at the end of a request cycle, making memory management simpler.
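As a rough illustration of the calling convention just described, here is a minimal sketch. It assumes a hypothetical stand-in class, Local::MockRequest (not part of mod_perl), so that the code runs under plain perl; inside Apache, $r would be the real request object, and the key names failure_reason and failure_detail are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the request object (an assumption for this sketch,
# not the real mod_perl class). It only imitates get/set of notes and pnotes.
package Local::MockRequest;

sub new { return bless { notes => {}, pnotes => {} }, shift }

sub notes {
    my ($self, $key, @val) = @_;
    $self->{notes}{$key} = "$val[0]" if @val;    # plain notes hold strings only
    return $self->{notes}{$key};
}

sub pnotes {
    my ($self, $key, @val) = @_;
    $self->{pnotes}{$key} = $val[0] if @val;     # pnotes may hold any Perl value
    return $self->{pnotes}{$key};
}

package main;

my $r = Local::MockRequest->new;

# An early handler records why something failed...
$r->notes(failure_reason => 'quota exceeded');
$r->pnotes(failure_detail => { user => 'fred', limit => 100 });

# ...and a later handler in the same request picks the values up.
my $reason = $r->notes('failure_reason');
my $detail = $r->pnotes('failure_detail');
print "Failed: $reason (user $detail->{user}, limit $detail->{limit})\n";
```

The string goes through notes, while the hash reference has to go through pnotes, which is the distinction drawn above.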

Some of the request object’s methods manage the handlers related to a request. Because the list can be queried and set dynamically, a request could have a modified execution path. The handler method sets the handler for the content phase, while set_handlers, push_handlers and get_handlers manage the remaining phases. One common technique is to define a Perl trans handler that notices that a request must have Perl treatment in one or more of the remaining phases, and then calls set_handlers or push_handlers to affect those phases.
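The trans-handler technique might look something like the following sketch. To keep it runnable under plain perl, it assumes a hypothetical Local::MockRequest class that merely records what was pushed, and a local DECLINED constant standing in for the one exported by Apache::Constants; the /private/ URL prefix and the log_private routine are likewise invented for illustration.

```perl
use strict;
use warnings;

# Stand-in for the DECLINED constant from Apache::Constants (assumption).
use constant DECLINED => -1;

# Hypothetical mock of the request object: it only records pushed handlers.
package Local::MockRequest;

sub new {
    my ($class, $uri) = @_;
    return bless { uri => $uri, handlers => {} }, $class;
}
sub uri { return $_[0]{uri} }
sub push_handlers {
    my ($self, $phase, $code) = @_;
    push @{ $self->{handlers}{$phase} }, $code;
    return;
}
sub get_handlers { return $_[0]{handlers}{ $_[1] } || [] }

package main;

# A log-phase routine we only want for some requests.
sub log_private {
    my $r = shift;
    print STDERR "private hit: ", $r->uri, "\n";
    return DECLINED;
}

# The trans handler looks at the URL and adds the log handler on demand.
sub trans_handler {
    my $r = shift;
    $r->push_handlers(PerlLogHandler => \&log_private)
        if $r->uri =~ m{^/private/};
    return DECLINED;
}

my $private = Local::MockRequest->new('/private/report');
my $public  = Local::MockRequest->new('/index.html');
trans_handler($_) for $private, $public;

printf "%s: %d log handler(s)\n",
    $_->uri, scalar @{ $_->get_handlers('PerlLogHandler') }
    for $private, $public;
```

Only the request under /private/ ends up with a log handler attached, so the extra work is paid for only where it is wanted.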

The log_error method sends a message to the server error log. The warn method is similar, sending a message only if the Apache log level is warn or higher. Writing to STDERR triggers the equivalent of log_error method calls, thanks to a tied filehandle.

Some of the methods apply to the connection object, which maps into those parts of the Apache API that relate to the current HTTP connection. You can call the connection method against the request object to get the connection object, often referenced in the documentation as $c. In particular, calling remote_ip against $c gives the current remote IP address (a quick operation). Calling remote_host causes Apache to look up the hostname for the given IP address, using reverse-to-forward validation for security purposes, and is thus rather expensive and should be avoided if unnecessary.

One cool connection object method is aborted, which returns true if the HTTP connection has been disconnected, as reported by the operating system. For example, a CPU-intensive calculation of longer than a dozen seconds or so can check this value periodically to see if there’ll be nobody home to hear the result.

Another part of the Apache API relates to the server itself, referenced through the server object, obtained by calling server against the request object. Through server object methods, we can discover the server_admin (email address), the server_hostname (the hostname to which we are bound if multihosted), the port number on that host, and whether or not we are a virtual server (is_virtual). Additionally, a space-separated list of canonical names and aliases is returned from names.

The Apache API also provides a set of library functions to handlers for common operations. These functions can be imported as if they were defined in the Apache::Util package (not an object class).

For example, escape_html replaces the HTML::Entities::encode routine, operating much faster (perhaps by a factor of 100 to 1). Similarly, escape_uri replaces URI::Escape::uri_escape, although the savings aren’t as significant. The inverses, unescape_html and unescape_uri, are also provided.

Other utility routines include parsedate (parses common forms of the dates used in HTTP headers and logfiles), ht_time (formats a time string similar to using strftime), size_string (converts a size into a friendly value of bytes or K or M, and so on), and validate_password (compares an entered password against a one-way hash such as crypt, MD5, or SHA1).

The mod_perl API makes getting to the arguments of a request relatively painless. To get the parameters of a GET or HEAD request, we look at the args method of the request object, which returns the parameters in key/value pairs as a list. Similarly, the POST parameters can be fetched with the content method in a list context. For example,

    my %args = $r->args;
    my $name = $args{name};    # get param "name"

Beware, however, that “select multiple” parameters (which may have more than one value for a given key) are destroyed this way. To fix that, we have to work a bit harder:

    my @args = ($r->args, $r->content);
    my %params;
    # Take the list two items at a time, keeping every value for each key
    while (my ($k, $v) = splice @args, 0, 2) {
        push @{ $params{$k} }, $v;
    }

Now @{$params{name}} is all of the name params.
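To see what the loop does without a running server, here is a small standalone sketch. The sample list stands in for what ($r->args, $r->content) might return; the keys and values are made up for the example.

```perl
use strict;
use warnings;

# Sample data standing in for ($r->args, $r->content): a flat key/value list
# in which "name" appears twice, as a "select multiple" field would.
my @args = (name => 'fred', color => 'red', name => 'barney');

my %params;
while (my ($k, $v) = splice @args, 0, 2) {
    push @{ $params{$k} }, $v;    # every value is kept, in arrival order
}

print "name:  @{ $params{name} }\n";     # name:  fred barney
print "color: @{ $params{color} }\n";    # color: red
```

Assigning the same list straight to a hash would have kept only barney for name, which is the loss the loop avoids.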

In a scalar context, these methods return the original data (the query string or the raw content), permitting easy regeneration of the original request. Note that content must be called only once per request, because after that, the contents are gone. One quick use for this is to simplify the code above, if you don’t mind using a CGI.pm object:

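    use CGI;
    # Join the query string and the POST body with "&" so that the last pair
    # of one does not run into the first pair of the other
    my $q = CGI->new(join "&", grep { defined $_ && length $_ }
                     scalar $r->args, scalar $r->content);
    my @name = $q->param("name");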

Other handler phases include the TransHandler (mapping URLs to filesystem paths), AccessHandler (validating the client host), AuthHandler (validating user credentials), AuthzHandler (verifying that the validated user can perform the requested operation), TypeHandler (determining the MIME type for a given URL), and LogHandler (creating log entries).

For example, a sample TransHandler that “redirects” the request based on the day of week might look like:

    package My::TimeAdjust;
    use strict;
    use Apache::Constants qw(DECLINED);

    sub handler {
        my $r = shift;
        my $uri = $r->uri;
        my $dow = (localtime)[6];    # 0 = Sunday, 6 = Saturday
        $uri =~ s{/DOW/}{/$dow/};    # swap the placeholder for today's number
        $r->uri($uri);
        return DECLINED;             # let the normal translation continue
    }

    1;

We place this code somewhere within the @INC path under My/TimeAdjust.pm, and add the handler to the configuration:

PerlTransHandler My::TimeAdjust

This single configuration directive causes all matching URLs to trigger our TransHandler, replacing any appearance of /DOW/ in the URL with /3/ on Wednesday, so a fetch to /foo/bar/DOW/bletch becomes /foo/bar/3/bletch. Imagine trying to do that with mod_rewrite!
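The substitution at the heart of the handler can be tried on its own. In this sketch, adjust_uri is a hypothetical helper (not part of the handler above) that takes the day number as a parameter so the result does not depend on today's date.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitution the handler performs,
# with the day of week passed in rather than read from localtime.
sub adjust_uri {
    my ($uri, $dow) = @_;
    $uri =~ s{/DOW/}{/$dow/};    # no /g, so only the first /DOW/ is replaced
    return $uri;
}

print adjust_uri('/foo/bar/DOW/bletch', 3), "\n";    # /foo/bar/3/bletch
print adjust_uri('/foo/bar/bletch', 3), "\n";        # /foo/bar/bletch
print adjust_uri('/DOW/a/DOW/b', 3), "\n";           # /3/a/DOW/b
```

Because the pattern requires slashes on both sides, a path that merely contains the letters DOW inside a longer name is left alone.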

For an example AccessHandler, let’s deny access from all odd-numbered hosts:

    package My::Hosts;
    use strict;
    use Apache::Constants qw(DECLINED FORBIDDEN OK);

    sub handler {
        my $r = shift;
        my $host = $r->connection->remote_ip;
        return FORBIDDEN if $host =~ /[13579]$/;    # address ends in an odd digit
        return DECLINED;    # no opinion; other access checks may still deny
    }

    1;

Once again, we add this to the configuration with simply:

PerlAccessHandler My::Hosts

And now our odd hosts are no longer permitted. Note that access checks are processed in order, so we return DECLINED if we have no opinion, permitting other more traditional checks to apply.
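The test itself is just a pattern match on the last character of the address, which can be checked in isolation. The helper name is_odd_host and the sample addresses below are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the dotted-quad address ends in an odd digit,
# which is the condition under which the access handler would refuse the host.
sub is_odd_host {
    my $ip = shift;
    return $ip =~ /[13579]$/ ? 1 : 0;
}

for my $ip (qw(192.168.1.7 10.0.0.12 10.0.0.10 172.16.5.121)) {
    printf "%-14s %s\n", $ip, is_odd_host($ip) ? 'FORBIDDEN' : 'DECLINED';
}
```

Only the final digit matters, so 10.0.0.10 passes even though it contains a 1, while 192.168.1.7 and 172.16.5.121 are refused.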

As a custom LogHandler, we might consider recording the wall clock and CPU time used by our request. For this, we need to start and stop a stopwatch. We’ll start it with:

    use Apache::Constants qw(DECLINED);

    sub TimeIt::Init::handler {
        shift->pnotes('times', [time, times]);    # stash the starting values
        return DECLINED;
    }

This init handler (as early as possible in the cycle) saves the wall-clock time and the four values from times (user and system CPU time, child user and child system CPU time) into a pnote for later retrieval. At the end of the request, we get the values, compute the delta, and log the difference:

    sub TimeIt::Log::handler {
        my $r = shift;
        my @times = @{ $r->pnotes('times') };
        # Subtract each starting value from the matching current value
        @times = map { $_ - shift @times } time, times;
        $r->warn($r->uri, ": real/cpu times: @times");
        return DECLINED;
    }

We’d enable this with:

    PerlInitHandler TimeIt::Init
    PerlLogHandler TimeIt::Log

(See also Apache::TimeIt in the CPAN, which does this in a more manageable fashion.)
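The stopwatch arithmetic in the two handlers can be exercised outside Apache. The following is a minimal standalone sketch; the busy loop is just filler to burn a little CPU, and the variable names are chosen for the example.

```perl
use strict;
use warnings;

# Start the stopwatch: wall-clock seconds plus the four values from times
# (user, system, child user, child system CPU seconds).
my @start = (time, times);

# Filler work so there is something to measure.
my $sum = 0;
$sum += sqrt $_ for 1 .. 200_000;

# Stop the stopwatch: pair each current value with its starting value.
# shift consumes @start from the front as map walks the current values.
my @delta = map { $_ - shift @start } time, times;

printf "real %d, user %.2f, sys %.2f, child user %.2f, child sys %.2f\n", @delta;
```

The map works because both lists are in the same order, so each current value is matched with its own starting value as @start is emptied.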

If you’re writing Apache handlers such as the ones above, you can test them with Apache::FakeRequest, which can fake up the various fields of the request object:

    use Apache::FakeRequest;
    use My::Module;

    my $request = Apache::FakeRequest->new('get_remote_host' => 'foobar.com');
    My::Module::handler($request);

Finally, if Apache and mod_perl are compiled correctly, you can even use Perl code directly in your web pages via Server-Side Includes:

    <!--#include perl="perl code here"-->

I’ve never done this in a real website, but you might find that this helps you in a pinch.

For further information about mod_perl, check out the installed manpages, such as Apache, cgi_to_mod_perl, mod_perl, mod_perl_traps and mod_perl_tuning.

For dead-tree supplements, check out the excellent Writing Apache Modules with Perl and C, by Doug MacEachern and Lincoln Stein, which is the definitive guide to mod_perl by the guys who have been around since the beginning.

Also noteworthy are The mod_perl Developer’s Cookbook, by Geoffrey Young, et al; Apache: The Definitive Guide, by Ben Laurie and Peter Laurie; and HTTP: The Definitive Guide, by David Gourley, et al.

On the web, perl.apache.org serves as the primary source of information on mod_perl, including advocacy, FAQs and other links. The www.modperlbook.com site provides supplemental information from Doug and Lincoln’s book, and www.modperlcookbook.com gives listings and other information from Geoffrey’s book.

Even in three columns, I’ve barely scratched the surface of how useful and cool mod_perl is. With big sites like amazon.com and ticketmaster.com using the technology continually, mod_perl will be around for a long time to come. Until next time, have fun, and enjoy!
