Scripting Your Apache Server with Mod_Perl

According to the folks who survey such things, the Open Source Apache server is the most popular Web server on the Internet. And Perl is the language of choice for many scripts running on all those Apache servers. But if you really want to get the most out of Perl and Apache, you need to embed Perl directly into your server using Apache's mod_perl extension.

Much more than a fancy way to quickly invoke a CGI script,
mod_perl essentially embeds a Perl interpreter with
access to nearly the entire Apache API within your Web server. Since
you’re no longer invoking an external Perl interpreter, mod_perl improves the performance of your Perl
scripts, but it also lets you do much more than you could with regular
CGI. When was the last time you found yourself in a CGI program wishing
you had a convenient way of figuring out what MIME type a document was,
or what filename a given URI translated to? Well, with
mod_perl, you can call on Apache’s built-in routines
to solve problems like these with some authority.
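As a rough sketch of what that looks like, a handler can ask Apache about another URI through a subrequest. The methods lookup_uri, filename, and content_type belong to the mod_perl 1.x request API; the helper name describe_uri and the My::FakeRequest stand-in class (present only so the snippet runs outside a server) are hypothetical, and the values the stand-in returns are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: ask Apache which file and MIME type a URI maps to.
# Inside a real handler, $r is the request object that Apache passes in.
sub describe_uri {
    my ($r, $uri) = @_;
    my $sub = $r->lookup_uri($uri);    # subrequest for the other URI
    return ($sub->filename, $sub->content_type);
}

# Stand-in for the request object so the sketch runs without Apache.
# Its answers are invented for illustration only.
package My::FakeRequest;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub lookup_uri {
    my ($self, $uri) = @_;
    return My::FakeRequest->new(
        filename     => "/fake/docroot$uri",
        content_type => "text/html",
    );
}

sub filename     { return $_[0]{filename} }
sub content_type { return $_[0]{content_type} }

package main;

my ($file, $type) = describe_uri(My::FakeRequest->new, "/index.html");
print "$file is $type\n";
```

Under a real server, the same describe_uri call would return whatever Apache's own translation and type-checking phases decide, which is the point: the handler does not have to guess.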

And there’s more. Perl code can step in at any phase of the server’s
operation: initializing a child, acting immediately after the request headers are read,
translating the URI to a filename, parsing the headers, checking
host-based access, checking user credentials, verifying a user against a
certain resource, determining the MIME type, fixing up the headers prior
to a response, delivering the content, logging the request, cleaning up
afterwards, or shutting down a child. By comparison, CGI is limited to
that one phase in the middle: “delivering the content.”
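To make the list concrete, here is a small table pairing each phase with the mod_perl 1.x configuration directive that installs a Perl handler for it. The pairing is my reading of the mod_perl 1.x documentation rather than something this article spells out (the article itself uses only PerlLogHandler and PerlTransHandler), and the phase labels are paraphrases.

```perl
use strict;
use warnings;

# Phases in the order the text lists them, each paired with the
# mod_perl 1.x directive that hooks a Perl handler into that phase.
my @phases = (
    [ 'child initialization'        => 'PerlChildInitHandler'       ],
    [ 'after reading the headers'   => 'PerlPostReadRequestHandler' ],
    [ 'URI-to-filename translation' => 'PerlTransHandler'           ],
    [ 'header parsing'              => 'PerlHeaderParserHandler'    ],
    [ 'host-based access check'     => 'PerlAccessHandler'          ],
    [ 'user credential check'       => 'PerlAuthenHandler'          ],
    [ 'per-resource authorization'  => 'PerlAuthzHandler'           ],
    [ 'MIME type determination'     => 'PerlTypeHandler'            ],
    [ 'header fixups'               => 'PerlFixupHandler'           ],
    [ 'content delivery'            => 'PerlHandler'                ],
    [ 'logging'                     => 'PerlLogHandler'             ],
    [ 'cleanup'                     => 'PerlCleanupHandler'         ],
    [ 'child shutdown'              => 'PerlChildExitHandler'       ],
);

# Print one "phase  directive" row per line.
printf "%-28s %s\n", @$_ for @phases;
```

CGI corresponds to the single PerlHandler row; every other row is a hook that plain CGI cannot reach.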

Mod_Perl to the Rescue

One of the first problems I solved with mod_perl was a
custom logging operation. I had been
with an ISP that did not permit easy access to Web server logs, so I had
written a server-side include (SSI) program to write my log information
to a file of my choosing. This SSI-logger then returned an empty string,
causing no output to be included in the page. The information was
written in the same directory as the HTML file, in a filename that could
not be served through the Web server. I merely had to include something
like this in each file I wanted logged:

<!--#include virtual="/cgi/ssilogger" -->

This worked fine, and I continued to use the SSI-logger even when I
got my own virtual server. But it wasn’t a perfect solution. I had to
remember this special construct and include it in every Web page that I
wanted logged. And before the Web server could serve the page to the
user, these logs all had to be updated, sometimes slowing the response
time. Finally, I couldn’t get logging information on anything but HTML files.

When I upgraded to include mod_perl in my
server, I saw an opportunity to eliminate the SSI-Logger, and replace it
with a true custom logger. I copied the code for my old SSI-Logger into
the file My/SSILog.pm, and changed surprisingly
little of the code. Then I added the following lines to my top-level
.htaccess file:

PerlRequire /home/merlyn/lib/My/SSILog.pm
PerlLogHandler My::SSILog

From then on, my logging routine was invoked during the logging
phase of each file served from this directory and all its
subdirectories, as shown in Listing One.

Listing One: Log

1 package My::SSILog;
3 ## usage: PerlLogHandler My::SSILog
5 use vars qw($VERSION);
7 use Apache::Constants qw(:common);
9 sub handler {
10   my $r = shift;
11   return OK if $r->content_type =~ /^image/; # don't log images
13   $r->chdir_file($r->filename);
14   {
15     local *LOG;
16     if (open LOG, ">>.ssilog.txt") {
17       flock LOG, 2;
18       seek LOG, 0, 2;
19       print LOG join (" ",
20         map "[$_]",
21         scalar localtime,
22         (map { $_ || "-" }
23           $r->uri,
24           $r->get_remote_host,
25           $r->header_in("referer"),
26           $r->header_in("user-agent"),
27         ),
28       ), "\n";
29       close LOG;
30     }
31   }
32   return OK;
33 }
35 "true";

Line 1 puts the data into a package of its own. Since all Perl
programs share the same global namespace in a given mod_perl
server, it’s very important to have
distinct package names.
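A tiny illustration of why this matters: both modules in this article define a subroutine named handler, and they can coexist in one interpreter only because each lives in its own package. The subroutine bodies below are placeholders, not the real handlers.

```perl
use strict;
use warnings;

# Two modules, each with its own sub named "handler". Because each is
# declared in a separate package, neither definition overwrites the other.
package My::SSILog;
sub handler { return "logging" }

package My::Trans;
sub handler { return "translating" }

package main;

# Each handler is reached through its fully qualified name.
print My::SSILog::handler(), " / ", My::Trans::handler(), "\n";
```

Had both files left out their package line, the second handler loaded would have silently replaced the first in the shared main namespace.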

Line 5 provides a version number variable, which never appears to
get set anywhere else in the program. Oops!

Line 7 pulls in a set of definitions for the most common constants.
In particular, we’re looking for a value for OK, used in lines 11 and 32.

Lines 9 through 33 define the handler, which must be called
handler unless you want to go through some extra
hoops. This subroutine will be called at the end of every transaction of
interest. The Apache request object is passed
in as the first parameter, which I shift off into $r in line
10. This gives me information about what
just happened.

Since I don’t need any information about images in this specialized
log, line 11 quickly rejects any transaction that was an image. Note
that the transfer will still be logged in the standard logs.

Line 13 changes the working directory of the Web server to the
directory in which the served file is located. Since I want to put the
log file in that same directory, this is a very easy maneuver.

Lines 15 and 16 create a local filehandle, and attempt to open that
handle onto a file called .ssilog.txt in the
current directory. If this fails, we silently skip over the remaining
work. Because this open is executed as the Web-server-user, and not as
me, I need to ensure that either the Web-server-user can write in any
directory I want logged (not a good idea) or that I’ve created an empty
file that can be written by that same user (this is what I generally
do). Other directories are ignored.

Lines 17 and 18 ensure that only one process at a time is writing to
the logfile, and that we’re at the end of the logfile on this next write.
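The same lock-then-seek pattern can be written with named constants instead of the literal 2s. This is a standalone sketch, not the listing itself: the function name append_locked and the temporary directory are assumptions made so the snippet runs anywhere, and it uses a lexical filehandle rather than the listing's local *LOG.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);    # named forms of the listing's literal 2s
use File::Temp qw(tempdir);

# Append one line to a log file while holding an exclusive lock.
# Returns true on success, false if the file cannot be opened or locked.
sub append_locked {
    my ($path, $line) = @_;
    open(my $log, '>>', $path) or return 0;
    flock($log, LOCK_EX)       or return 0;   # block until we own the lock
    seek($log, 0, SEEK_END);                  # file may have grown while we waited
    print {$log} $line, "\n";
    return close($log);                       # flushes the buffer, releases the lock
}

# Demonstration in a scratch directory that is removed on exit.
my $dir = tempdir(CLEANUP => 1);
append_locked("$dir/.ssilog.txt", "[first entry]")  or warn "could not log: $!";
append_locked("$dir/.ssilog.txt", "[second entry]") or warn "could not log: $!";
```

The seek after the lock is the part that is easy to forget: another process may have appended between our open and the moment the lock was granted.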

Lines 19 through 28 construct a line for the logfile, given as the
time of day, the requested URL, the remote host, any referrer if
present, and any user agent, if present. Each of these items is enclosed
in square brackets.
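Pulled out of the handler, the formatting step is just a join over two nested maps. The sketch below mirrors that pipeline; the function name format_log_line and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Build one log line: each field wrapped in square brackets, with "-"
# standing in for any field that is missing or empty.
sub format_log_line {
    my ($when, @fields) = @_;
    return join " ", map "[$_]", $when, map { $_ || "-" } @fields;
}

# Example values only; undef plays the part of a request with no referrer.
print format_log_line(
    scalar localtime,
    "/index.html", "client.example.com", undef, "Mozilla/4.0",
), "\n";
```

One wrinkle of the `$_ || "-"` idiom is that it also replaces a field whose value is the string "0", since that is false in Perl. For URIs, host names, and header values that is unlikely to matter.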

Line 29 closes the file, flushing the buffers and releasing the
flock. This is redundant, since our local filehandle
is about to go out of scope in line 31, but I wanted to make sure.

Line 32 returns from this subroutine with an OK value. Line
35 provides a true value to the implicit require operation that brings this module in. And
that’s that.

Of course, even before I had started moving production code to use
mod_perl, I wanted to test a mod_perl server to see if
everything would work
okay. So I set up a separate Web server source tree, and fired it up on
a non-standard port (like 8080). It was cute, but I didn’t have any
substantial content to test it with.

The Shadow Script

At first I thought I’d copy my existing content over from my active
site, but then I realized that this would be silly, since they’re both
really on the same disk. So then I considered simply configuring the new
server to read from the old content tree, but fear of possible
corruptions kept me from doing this. Also, this scheme wouldn’t let me
try new things that overrode old things.

My next idea was to use the nifty mod_rewrite module to
allow a “shadowed” environment. Each incoming URL would be
tested against a small tree associated with the new server. If there was
a match, we’d serve from the new server. Otherwise, the URL would be
repointed at the old tree, and served from there (possibly
getting a 404 error if not found). This set-up was not terribly hard,
but it was a little ugly, as shown in configuration entries in
Listing Two. For details on any of these lines, consult the
mod_rewrite documentation.

Listing Two: Shadow Scripting with Mod_Rewrite

1 ## turn on the engine
2 RewriteEngine on
3 RewriteLogLevel 9
4 RewriteLog logs/rewrite_log
6 # local cgi overrides other
7 RewriteCond %{REQUEST_URI} ^/cgi/
8 RewriteCond /home/merlyn/etc/httpd/htdocs%{REQUEST_FILENAME} -f
9 RewriteRule ^ - [PT]
11 # other cgi
12 RewriteRule ^/cgi/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
13 RewriteRule ^/cgi-bin/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
15 # local htdocs overrides other
16 RewriteCond %{REQUEST_URI} ^/(manual|perl)/
17 RewriteRule ^ - [PT]
19 # other htdocs
20 RewriteRule ^/(.*)$ /WWW/stonehenge/htdocs/$1 [L]

Lines 2 through 4 turn on the rewrite engine for the server, and
establish a (highly verbose) log file.

Lines 6 through 9 handle any local CGI programs that should override
CGI programs from the live existing content. Note that I had to hardwire
the test-server’s document root path into the rewrite rule.

Lines 11 through 13 fall back to the live server’s CGI area if
there’s not a local definition in the test server’s area.

In the same way, lines 15 through 20 cause a local
manual or perl prefixed URL to remain in the local
test server tree. Everything else is
then sent over to the live server’s data.

The Power of Perl

But this solution had a few drawbacks. I couldn’t provide a test
document that overrode the live server’s documents, and I had to
hardwire the names of the directories, making it hard to try out more
than one test server.

So, I decided to use the power of a Perl handler during Apache’s
URI-to-Filename translation phase to do the lookups and adjustments.
Everything that can be done with mod_rewrite
can be done with a proper Perl handler as well. And you can do this
without having to learn Yet Another Language.

Of course, I couldn’t resist adding a few features during the
rewrite of mod_rewrite to mod_perl, as you’ll see in
Listing Three.

Line 4 puts us into the My::Trans package.
Line 6 enables compiler restrictions, to make sure I didn’t
fumble-finger any of the variable names.

Lines 8 through 10 define the path to the live server’s CGI and
document directories. I don’t need to define the test server’s paths in
the same way, because I can simply ask the Apache API where we are.

Lines 12 through 50 define the translation handler, again named
handler. Line 13 grabs the Apache request object into $r.

Lines 15 through 17 log this request to the server error log. We’ll
make this logging conditional on it being the initial request. If
any handler wants to translate a name to a filename, it’ll make a
subrequest, and we’ll get called again, but we don’t want to log those.
We want only requests from users to be logged.

Lines 19 and 20 get the document root and the requested URI. Line 22
puts the URI into $_ for easy matching and substitution.

Lines 24 through 29 detect a CGI script in the test server’s area.
If there is a match, then we’ll set the filename to the local name, and
that will be the document that gets served. At the same time, a log
message is generated. Returning 0 from this handler will terminate the
URI translation phase.

Similarly, lines 30 through 35 handle the rewrites to use any other
CGI program from the live server’s area.

Lines 36 through 41 similarly handle any URLs that begin with manual or perl, forcing
them to be interpreted in the test server’s area.

And lines 42 through 47 deal with all other URLs.

Lines 48 and 49 handle anything that might be left. For example, a
proxy URL would not match anything with a leading slash, so we’ll end up
falling all the way through to here. In this case, I’ll log the
confusion, and return -1. This -1 tells Apache that I’ve not handled this request,
and it should try another handler instead. (The effect is identical to
returning the DECLINED constant.)
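The branch order is easier to check when the decision logic is separated from Apache. The following is a hypothetical pure-Perl restatement, not the handler itself: the function name translate, the $is_executable callback (standing in for the -x test), and the /test/htdocs document root are all assumptions, and the two constants are defined locally with the values that Apache::Constants gives OK and DECLINED under mod_perl 1.x.

```perl
use strict;
use warnings;

# Local copies of the status values; under mod_perl 1.x these would come
# from Apache::Constants instead.
use constant { OK => 0, DECLINED => -1 };

# Returns (status, filename). $other is the live server's top directory.
sub translate {
    my ($uri, $document_root, $other, $is_executable) = @_;
    local $_ = $uri;
    if (m{^/cgi/} and $is_executable->("$document_root$_")) {
        return (OK, "$document_root$_");    # local CGI overrides the live one
    }
    if (s{^/(cgi|cgi-bin)/}{$other/cgi-bin/}) {
        return (OK, $_);                    # fall back to the live server's CGI
    }
    if (m{^/(manual|perl)(/|$)}) {
        return (OK, "$document_root$_");    # always served from the test tree
    }
    if (s{^/}{$other/htdocs/}) {
        return (OK, $_);                    # everything else from the live tree
    }
    return (DECLINED, undef);               # no leading slash, e.g. a proxy URL
}

my ($status, $file) =
    translate("/perl/test.pl", "/test/htdocs", "/WWW/stonehenge", sub { 0 });
print "$status $file\n";
```

Passing the executable check in as a callback is only a testing convenience; it lets each branch be exercised without creating files on disk.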

Now I could add the following lines to my configuration files:

PerlRequire /home/merlyn/lib/My/Trans.pm
PerlTransHandler My::Trans

and get a shadow area. Any files in ./htdocs or ./cgi would override the existing
documents and CGI programs, and I could add Apache::Registry programs into ./perl, as well as serve the provided manual
information directory from ./manual.

If you would like to have more information about mod_perl, check out the comprehensive Web site at
http://www.perl.apache.org. Also, Doug MacEachern (the architect and
chief implementor of the mod_perl project) has
co-authored a book with CGI guru Lincoln Stein, published by O’Reilly
& Associates, called Writing Apache Modules with Perl and C.
Several sample chapters from this well-written book are available.
Check out http://www.modperl.com.

I’m especially encouraged by how easy it is to add functionality to
my Web server with mod_perl. Give it a try. You
just might draw the same conclusion.

Until next time, Enjoy!

Listing Three: Translating Mod_Rewrite to Mod_Perl

1 ## install as
2 ## PerlTransHandler My::Trans
4 package My::Trans;
6 use strict;
8 my $other = "/WWW/stonehenge";
9 my $other_cgi = "$other/cgi-bin";
10 my $other_root = "$other/htdocs";
12 sub handler {
13   my $r = shift;
15   if ($r->is_initial_req) {
16     $r->warn("request: ".$r->the_request); # end of this line was lost in the source; the_request is an assumed completion
17   }
19   my $document_root = $r->document_root;
20   my $uri = $r->uri;
22   local $_ = $uri;
24   ## local /cgi/
25   if (m{^/cgi/} and -x "$document_root$_") {
26     $r->warn("$uri => using local CGI at $document_root$_");
27     $r->filename("$document_root$_");
28     return 0;
29   }
30   ## old /cgi/ or /cgi-bin/
31   if (s{^/(cgi|cgi-bin)/}{$other_cgi/}) {
32     $r->warn("$uri => using remote CGI at $_");
33     $r->filename($_);
34     return 0;
35   }
36   ## local /manual/ or /perl/
37   if (m{^/(manual|perl)(/|$)}) {
38     $r->warn("$uri => using local file at $document_root$_");
39     $r->filename("$document_root$_");
40     return 0;
41   }
42   ## any old prior
43   if (s{^/}{$other_root/}) {
44     $r->warn("$uri => using remote file at $_");
45     $r->filename($_);
46     return 0;
47   }
48   $r->warn("$uri => huh?");
49   return -1;
50 }
52 1;

Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting
and co-author of
Learning Perl and Programming Perl. He
can be reached at merlyn@stonehenge.com.
