mod_perl, Part Two

As I mentioned last month, having persistent Perl code means that some steps of your application can be reused rather than repeated. One very easy optimization is keeping your database handles open between web hits, rather than reopening them on each new hit. The Apache::DBI module (found in the CPAN) does the work for you by altering the way normal DBI connections are processed.

If your application is like most, you simply add PerlModule Apache::DBI to the configuration file, and it just magically works. The disconnect() method of DBI is altered so that it doesn’t really disconnect, and the connect() method attempts to reuse an already existing handle opened with the same database parameters (including user and password).

The upside is efficiency. The downside is that every mod_perl Apache process eventually gets one or more persistent connections to the database server, which may affect a license count or a process limit.

Another nice module is Apache::Template (also in the CPAN). This module turns your page deliveries into a full, embedded templating system using Template Toolkit. The configuration is similar to Apache::Registry (shown last month):

PerlModule Apache::Template …
<Location /tt2>
  SetHandler perl-script
  PerlHandler Apache::Template
</Location>

Now each file located within the directory mapped from the /tt2 URL is treated as a template in the Template Toolkit language. The template is compiled into Perl code and executed, and the result is delivered to the web client.

With additional directives, the results of those steps can be cached, creating a very powerful web site with minimal overhead and maximum flexibility, comparable to a PHP- or ColdFusion-based web configuration, but with a lot more powerful features. Templates can also respond to CGI-style form parameters, making any page an interactive page!

There are many other mod_perl modules available as well. See the current list for yourself by entering apache:: at the search box of http://search.cpan.org/.

It’s very simple to write your own handler as well, if you can’t find something off the shelf to do what you want. Simply create a module, like My::Module. The module should have a routine named handler(), which is called from the embedded Perl interpreter at the appropriate phase. Then, install your module into mod_perl’s @INC path.

To trigger your module, add the appropriately scoped Perl handler directive to a configuration file. For example, to make your handler the content handler for all URLs beginning with /fred, simply add:

<Location /fred>
  SetHandler perl-script
  PerlHandler My::Module
</Location>

Content handlers return content back to the web browser and get invoked after nearly all the other phases are complete. This step is normally where things like file content delivery or CGI scripts are triggered. However, you can do anything you want.

For example, this simple handler…

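package My::Content;
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;                        # the request object
  $r->send_http_header('text/plain');   # status line and MIME type go out first
  $r->print("Hello, world!\n");
  return OK;
}

1;  # a module file must end with a true value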

… delivers the text/plain-tagged content of “Hello, world!\n” for any URL mapped to this handler. Of course, since we’re running Perl code here, we could do almost anything, including the generation of dynamic content.

Once loaded, a handler remains in memory, speeding up the process significantly. You can hold database connections (see Apache::DBI earlier), access Apache API callbacks (discussed shortly), and get to C code for additional libraries and optimizations.

The return value is important. An OK value indicates that the handler ran successfully, and in a content handler, would also cause an “OK” status to be delivered as part of the HTTP transaction. However, you can return other values, such as NOT_FOUND, which triggers the traditional “404” processing of the web server, including rolling over to an ErrorDocument if needed.

In all phases, you can also return DECLINED, which notifies Apache that this particular handler is not the proper handler at this phase and that Apache should use the next handler as if this handler weren’t here. I use DECLINED in the content phase, for example, if my content handler has been directed to deliver a directory instead of a file. By declining, Apache rolls over to the next handler, which can display a directory index if enabled.
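Here's a minimal sketch of that directory case. The My::FilesOnly name is made up for illustration, and so that you can try it outside Apache, I've spelled out OK and DECLINED as plain constants and added a tiny FakeRequest class as a stand-in for the real request object. Under mod_perl, you would import the constants from Apache::Constants and drop the fake.

```perl
package My::FilesOnly;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(OK DECLINED);
# These numeric stand-ins only let the sketch run outside Apache.
use constant { OK => 0, DECLINED => -1 };

sub handler {
    my $r = shift;

    # A directory is not our job: step aside so the next content
    # handler (such as the directory indexer) can take over.
    return DECLINED if -d $r->filename;

    $r->send_http_header('text/plain');
    $r->print("You asked for ", $r->filename, "\n");
    return OK;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
sub new              { my ($class, %args) = @_; return bless { out => '', %args }, $class }
sub filename         { $_[0]{filename} }
sub send_http_header { $_[0]{type} = $_[1] }
sub print            { my $self = shift; $self->{out} .= join '', @_ }

package main;
my $dir  = My::FilesOnly::handler(FakeRequest->new(filename => '.'));
my $file = My::FilesOnly::handler(FakeRequest->new(filename => '/no/such/file.txt'));
print "directory: $dir, file: $file\n";
```

The current directory yields DECLINED, while anything that is not a directory gets handled and yields OK.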

Declining is also useful in “trans handlers,” which are in the early phase where Apache decides how a URL maps to a filepath. I can install a Perl trans handler that alters the URL or the filepath based on various parameters and then returns DECLINED, allowing the rest of the normal translation to take place with my altered values.
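As a rough sketch, a trans handler along those lines might look like this. The My::Trans name and the /old/ to /new/ rule are invented for the example, and FakeRequest is again only a stand-in so the code runs outside Apache. In a real server, you would install it with a PerlTransHandler My::Trans directive and import DECLINED from Apache::Constants.

```perl
package My::Trans;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(DECLINED);
use constant DECLINED => -1;

sub handler {
    my $r = shift;

    # Hypothetical rule: anything under /old/ now lives under /new/.
    (my $uri = $r->uri) =~ s{^/old/}{/new/};
    $r->uri($uri);

    # Declining lets Apache's usual translation map the altered
    # URI to a filepath, as if the browser had asked for it.
    return DECLINED;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
sub new { my ($class, $uri) = @_; return bless { uri => $uri }, $class }
sub uri { my $self = shift; $self->{uri} = shift if @_; return $self->{uri} }

package main;
my $r = FakeRequest->new('/old/index.html');
my $status = My::Trans::handler($r);
print "status $status, uri now ", $r->uri, "\n";
```

The handler never claims the translation for itself. It only nudges the URI and lets the rest of the server carry on.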

From a handler, you can access various callbacks into the Apache API, including nearly any relevant thing that can be called from any handler written in C. These APIs are presented as objects of various types, known as the request object, server object, connection object and various utility classes.

The request object is used in nearly every handler, representing everything you need to know about what request came in, and giving you a place to provide (or alter) the response. A handler normally gets the request object as the first parameter, and traditionally, this value is placed in a variable called $r:

sub handler { my $r = shift; … }

If you didn’t save the request object, you can still get at the same API by recreating the same object, using a call to a class method:

sub handler {… my $r = Apache->request;}

This is sometimes handy if you’re in a subroutine being called from another part of a handler.

The request object has many methods that can be called on it. For example, as_string() creates a human-readable dump of all the headers and content of the request. I use this occasionally to see if I’m getting the bits that I expect.

Some of the other methods on the request object include: method() (to tell if a request was a GET or POST), header_only() (was this a HEAD request?), uri() (to get or set the requested URI), and filename() (to get or set the corresponding filepath). Those last two methods are very useful in trans handlers, allowing me to write mod_rewrite-like redirections in easy-to-use Perl, rather than obtuse and limited mod_rewrite syntax.

Other request methods less frequently used include: path_info() (to get the part of the URL extended beyond the resource being accessed), args() (to get at the query-string arguments), and content() (to get at the POST content).

The request object also provides the response, generally in the content phase. Calling send_http_header() triggers the beginning of the response, including the appropriate HTTP status code and MIME type. The MIME type and status can be queried and set with the content_type() and status() methods, respectively. Arbitrary headers can be added with header_out(). Cache-control can be altered with the no_cache() method.
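Put together, the response side of a content handler might be sketched like this. The My::Headers name and the X-Powered-By header are only illustrations, and FakeRequest is a stand-in that records what the handler asked for, so that the sketch runs outside Apache. Under mod_perl, OK would come from Apache::Constants.

```perl
package My::Headers;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(OK);
use constant OK => 0;

sub handler {
    my $r = shift;

    $r->content_type('text/html');                 # MIME type
    $r->header_out('X-Powered-By' => 'mod_perl');  # arbitrary extra header
    $r->no_cache(1);                               # ask clients not to cache
    $r->send_http_header;                          # the headers go out here

    return OK if $r->header_only;                  # HEAD request: no body
    $r->print("<p>Hello, world!</p>\n");
    return OK;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
sub new              { my ($class, %args) = @_; return bless { headers => {}, out => '', %args }, $class }
sub content_type     { $_[0]{type} = $_[1] }
sub header_out       { $_[0]{headers}{ $_[1] } = $_[2] }
sub no_cache         { $_[0]{no_cache} = $_[1] }
sub send_http_header { $_[0]{sent} = 1 }
sub header_only      { $_[0]{header_only} }
sub print            { my $self = shift; $self->{out} .= join '', @_ }

package main;
my $r = FakeRequest->new(header_only => 0);
My::Headers::handler($r);
print "type: $r->{type}\n", $r->{out};
```

Note the order: everything that affects the headers has to happen before send_http_header(), because after that call the headers are already on their way.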

Of course, the point of the content handler is delivering content. Content is commonly sent in one of three ways. Arbitrary text can be sent with the print() method, although the content is more commonly delivered by printing to the STDOUT filehandle, which is conveniently tied to do the same thing. (STDERR is opened on the error log. It’s convenient for warning messages and other information.)

For speed and convenience, an open filehandle can also be used to deliver content, using send_fd(). Using this method invokes Apache’s internal delivery mechanism, delivering files as fast as if mod_perl weren’t involved.

The request object also has methods related to authentication, including getting at the “Basic Auth” username (if any), password, and triggering a “Basic Auth” failure if the authentication is not provided.
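In mod_perl, those three pieces are get_basic_auth_pw(), the user() method of the connection object, and note_basic_auth_failure(). Here's a minimal sketch of an authentication handler built from them. The My::Authen name and the hard-wired fred/flintstone pair are purely hypothetical (a real handler would consult a password store), and FakeRequest stands in for the request object so the code runs outside Apache. In a real server, this would be hooked up with PerlAuthenHandler, alongside the usual AuthType, AuthName, and require directives.

```perl
package My::Authen;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(OK AUTH_REQUIRED);
use constant { OK => 0, AUTH_REQUIRED => 401 };

sub handler {
    my $r = shift;

    # Returns a status, plus the password the browser sent (if any).
    my ($status, $password) = $r->get_basic_auth_pw;
    return $status unless $status == OK;

    my $user = $r->connection->user;

    # Hypothetical check, hard-wired for the sake of the example.
    unless (defined $user and defined $password
            and $user eq 'fred' and $password eq 'flintstone') {
        $r->note_basic_auth_failure;    # makes the browser prompt again
        return AUTH_REQUIRED;
    }
    return OK;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
sub new                     { my ($class, %args) = @_; return bless { %args }, $class }
sub get_basic_auth_pw       { return (0, $_[0]{password}) }
sub connection              { $_[0] }    # the fake doubles as its own connection
sub user                    { $_[0]{user} }
sub note_basic_auth_failure { $_[0]{prompted} = 1 }

package main;
for my $pw ('flintstone', 'wrong') {
    my $r = FakeRequest->new(user => 'fred', password => $pw);
    print "$pw: ", My::Authen::handler($r), "\n";
}
```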

A handler can also request an external or internal redirect. For example, a trans handler or a content handler can determine that the browser should be sent to a new location (external redirect), with $r->header_out("Location", "http://new.place"); return REDIRECT;

Or, a handler can restart processing with a new URL (internal redirect): $r->internal_redirect("/new/url"); return OK;

When an internal redirect occurs (such as for ErrorDocument processing), the previous request object is accessible as well. The main(), prev(), next(), and last() methods on the request object allow a handler to walk up and down this chain of internal redirects.

One powerful feature of the request API is that you can ask Apache to act “as if” a particular URI (lookup_uri()) or filename (lookup_file()) was provided, then look at the status of the subrequest (like 404 or 200) to see if that subrequest would have been successful. The subrequest is processed up to (but not including) the content phase, so aliases, redirects, and authorization phases are all processed.
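A small sketch of the “as if” trick with lookup_uri() follows. The My::Probe name, the would_succeed() helper, and the /images/logo.png URI are all invented for the example. FakeRequest is a stand-in whose lookup_uri() answers from a tiny table instead of running real Apache phases, which lets the code run outside the server.

```perl
package My::Probe;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(OK);
use constant OK => 0;

# True if Apache would have been able to serve $uri.
sub would_succeed {
    my ($r, $uri) = @_;
    my $subr = $r->lookup_uri($uri);   # runs the phases short of content
    return $subr->status == 200;
}

sub handler {
    my $r   = shift;
    my $uri = '/images/logo.png';      # hypothetical URI to probe

    $r->send_http_header('text/plain');
    $r->print("$uri ", (would_succeed($r, $uri) ? "is" : "is not"), " available\n");
    return OK;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
my %known = ('/images/logo.png' => 200);    # everything else is a 404
sub new              { my ($class, %args) = @_; return bless { out => '', %args }, $class }
sub lookup_uri       { my ($self, $uri) = @_; return FakeRequest->new(status => $known{$uri} || 404) }
sub status           { $_[0]{status} }
sub send_http_header { $_[0]{type} = $_[1] }
sub print            { my $self = shift; $self->{out} .= join '', @_ }

package main;
my $r = FakeRequest->new;
My::Probe::handler($r);
print $r->{out};
```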

Taking it a step further, you can also run() the subrequest, delivering the content as part of your request. This is similar to a “server-side include,” but from within Perl. (This is actually the API that the SSI mechanism uses.) Unfortunately, there’s no way to capture the output: it’s being sent down the pipe to the web client.

The request object also contains methods related to the web client. You can ask for bytes_sent() to find out the size of the current response (typically used in a log handler), and determine with proxyreq() if this was a proxy request. You can also call hostname() (the host name the client asked for) and get_remote_host() (the web client’s host name, or its IP address if the name has not been looked up).
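For instance, a log handler could combine a few of these into a one-line summary. This is only a sketch: the My::Logger name, the summary() helper, and the line format are my own inventions, and FakeRequest supplies canned values so the code runs outside Apache. In a real server, you would install it with PerlLogHandler My::Logger and import OK from Apache::Constants.

```perl
package My::Logger;
use strict;
use warnings;

# Under mod_perl you would write: use Apache::Constants qw(OK);
use constant OK => 0;

# Build one line describing the finished request.
sub summary {
    my $r = shift;
    return join ' ', $r->get_remote_host, $r->method, $r->uri, $r->bytes_sent;
}

sub handler {
    my $r = shift;
    print STDERR summary($r), "\n";    # STDERR goes to the error log
    return OK;
}

# Hypothetical stand-in for the request object, for use outside Apache.
package FakeRequest;
sub new             { my ($class, %args) = @_; return bless { %args }, $class }
sub get_remote_host { $_[0]{remote_host} }
sub method          { $_[0]{method} }
sub uri             { $_[0]{uri} }
sub bytes_sent      { $_[0]{bytes_sent} }

package main;
my $r = FakeRequest->new(
    remote_host => '10.0.0.7',
    method      => 'GET',
    uri         => '/fred/index.html',
    bytes_sent  => 1234,
);
print My::Logger::summary($r), "\n";
```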

Again, I’ve run out of space, so in next month’s column, I’ll continue describing the request object API and some of the other objects as well. I’ll also show more sample code. Until next time, enjoy!

Randal Schwartz is the chief Perl guru at Stonehenge Consulting. He can be reached at merlyn@stonehenge.com.
