
Caching Proxy Servers with Template Toolkit

In the previous three articles, I introduced my templating system of choice, the Template Toolkit (TT). Since those articles were intended as overviews, I didn’t have much space to go into meaty examples. So, in this article, I’ll look at how I’m using TT every day to help me manage the Stonehenge Consulting web site (http://www.stonehenge.com).

In the July 2002 edition of this column (available online at http://www.linux-mag.com/2002-07/perl_01.html), I described in detail how I had placed http://www.stonehenge.com under CVS management and how I had chosen to use TT to manage the slight variations required for the files between development versions and production versions. But I glossed over (for lack of room) how I use TT to manage the variations between the front-end, caching-reverse-proxy server and the back-end, heavy Apache mod_perl server, including how to make it easy to have many virtual servers with similar configurations. Let’s look at that now.

A mod_perl-enabled application server generally caches many Perl subroutines and data structures in memory, trading that memory for the delay of reloading such structures from disk or recomputing them time and again. In my case, a typical mod_perl process is about 20 to 30 megabytes of memory. In the “old days,” I let these fat 20 MB processes take care of every request from a browser, including requests for images or static HTML files that really didn’t need mod_perl involved. Besides just wasting resources, this gets particularly nasty when large images are being downloaded over a slow link: the fat process gets tied up for a number of seconds, not milliseconds.

Modern best practices suggest that requests that must be handled by mod_perl be somehow separated from those that aren’t. One popular strategy is to use a caching-reverse-proxy server. In that model, the incoming requests are handed to a thin proxy server. Some people use Squid for this (for more information about Squid, see August 2003’s “LAMP Post,” available online at http://www.linux-mag.com/2003-08/lamp_01.html), but I like using Apache here to keep everything consistent. The proxy server sits in front of the real mod_perl server and caches any results from the server if possible. Thus, the first hit might be a bit slow and wasteful, but every subsequent hit to the cached item is much faster, because the back-end mod_perl server isn’t even consulted (or is consulted only to verify that the cache is up-to-date). And, if the proxy servers have a low memory footprint (I’ve been keeping mine around 1 to 2 megabytes per process), I can run many of them to reduce delivery latency.
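
As a rough sketch, the front-end server needs only a few mod_proxy cache directives to start caching what the back end hands it. These directives exist in Apache 1.3’s mod_proxy, but the path, size, and intervals below are placeholders, not my production settings:

# illustrative front-end cache settings; path, size, and intervals
# are placeholders, not production values
ProxyRequests Off
CacheRoot /var/cache/www-proxy
# kilobytes of disk to devote to the cache
CacheSize 102400
# prune the cache back down to CacheSize every four hours
CacheGcInterval 4
# consider any cached copy stale after (at most) a day
CacheMaxExpire 24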

Even fully dynamic pages benefit, because the data is squirted quickly from the back-end server to the proxy server, freeing up the back-end process to handle a different request. The proxy server can then dole out the page over the network. This makes a big difference with large dynamic pages being handed out over a slow network link, such as to a dialup user or over a satellite connection.

Additionally, some requests can be diverted by the proxy server to be served directly from local files. If the proxy server and back-end server are on the same machine, this is trivial. I simply set the DocumentRoot for both servers to be the same directory, and let the request be satisfied directly instead of proxied.
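
Since both configuration files are generated from the same templates, the shared root can come from one line that lands in both generated files; the env.WWW_DOCROOT name below is a placeholder made up for this example, not my real setting:

# appears in both the generated proxy and back-end httpd.conf files,
# so "serve it locally" requests read from the same tree as mod_perl
DocumentRoot [* env.WWW_DOCROOT *]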

But getting this proxy and back-end configuration correct, including multiple virtual hosts, can be a real pain, especially when you consider that we have to make this work in a development environment as well. Thankfully, Template Toolkit lets me write the configuration once and reuse it in slightly different variations as needed.

My first convenient tool to manage this process is what I call my “PB&J filter” (the obvious reference to peanut butter and jelly is intentional). I have a simple filter wrapper like so:


[* BLOCK pbj_filter;
   # strip the marker from lines that begin with a #P#/#B#/#J#-style
   # conditional comment naming this kind of server
   FOR line = content.split("\n");
     IF kind == "P";
       line = line.replace('^#\w*P\w*#\s*', '');
     ELSIF kind == "J";
       line = line.replace('^#\w*J\w*#\s*', '');
     ELSE; # kind == "B"
       line = line.replace('^#\w*B\w*#\s*', '');
     END;
     line; "\n";
   END;
END; # BLOCK pbj_filter
*]

The variable kind is set (using code not shown here) to one of three values by examining various environment variables. If kind is P, the server is the front proxy server; if it’s B, it’s the back-end server; and if it’s J, it’s a joined, single-server version (sometimes used for testing, but not run on my production servers). Listing One shows how I use this filter to load the appropriate modules in the appropriate way on each kind of server.
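
The code that sets kind isn’t shown here, but it can be as simple as looking at an environment variable that the build process sets differently for each target. In this little sketch, WEB_KIND is a placeholder name, not the variable I actually use:

[* # placeholder: choose the server flavor from the environment,
   # falling back to the joined single-server configuration
   kind = env.WEB_KIND || "J";
*]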




Listing One: Processing a series of modules with the “PB&J” filter

[* MACRO module(mod_name_c, name_module, mod_name_so) BLOCK;
     IF env.MODULES_INTERNAL.search(mod_name_c);
       "AddModule "; mod_name_c;
     ELSE;
       "LoadModule "; name_module; " "; env.MODULES; "/"; mod_name_so;
     END;
   END;
*]

[* WRAPPER pbj_filter *]
# [* module('mod_vhost_alias.c', 'vhost_alias_module', 'mod_vhost_alias.so') *]
# [* module('mod_env.c', 'env_module', 'mod_env.so') *]
[* module('mod_log_config.c', 'log_config_module', 'mod_log_config.so') *]
# [* module('mod_log_agent.c', 'log_agent_module', 'mod_log_agent.so') *]
# [* module('mod_log_referer.c', 'log_referer_module', 'mod_log_referer.so') *]
# [* module('mod_mime_magic.c', 'mime_magic_module', 'mod_mime_magic.so') *]
[* module('mod_mime.c', 'mime_module', 'mod_mime.so') *]
#BJ# [* module('mod_negotiation.c', 'negotiation_module', 'mod_negotiation.so') *]
[* module('mod_status.c', 'status_module', 'mod_status.so') *]
[* module('mod_info.c', 'info_module', 'mod_info.so') *]
#BJ# [* module('mod_include.c', 'include_module', 'mod_include.so') *]
#BJ# [* module('mod_autoindex.c', 'autoindex_module', 'mod_autoindex.so') *]
#BJ# [* module('mod_dir.c', 'dir_module', 'mod_dir.so') *]
#BJ# [* module('mod_cgi.c', 'cgi_module', 'mod_cgi.so') *]
# [* module('mod_speling.c', 'speling_module', 'mod_speling.so') *]
# [* module('mod_userdir.c', 'userdir_module', 'mod_userdir.so') *]
#BJ# [* module('mod_alias.c', 'alias_module', 'mod_alias.so') *]
[* module('mod_rewrite.c', 'rewrite_module', 'mod_rewrite.so') *]
#BJ# [* module('mod_access.c', 'access_module', 'mod_access.so') *]
#BJ# [* module('mod_auth.c', 'auth_module', 'mod_auth.so') *]
# [* module('mod_auth_anon.c', 'auth_anon_module', 'mod_auth_anon.so') *]
# [* module('mod_auth_dbm.c', 'auth_dbm_module', 'mod_auth_dbm.so') *]
# [* module('mod_auth_db.c', 'auth_db_module', 'mod_auth_db.so') *]
#P# [* module('mod_proxy.c', 'proxy_module', 'libproxy.so') *]
#BJ# [* module('mod_expires.c', 'expires_module', 'mod_expires.so') *]
# [* module('mod_headers.c', 'headers_module', 'mod_headers.so') *]
# [* module('mod_usertrack.c', 'usertrack_module', 'mod_usertrack.so') *]
#PJ# [* module('mod_setenvif.c', 'setenvif_module', 'mod_setenvif.so') *]
#PJ# [* module('mod_ssl.c', 'ssl_module', 'libssl.so') *]
#BJ# [* module('mod_perl.c', 'perl_module', 'libperl.so') *]
[* END *]

In Listing One, I’m defining the modules to be used in my front-end and back-end servers. Any line that begins with a normal comment, such as the one for mod_speling, remains commented in all versions. Any line that doesn’t begin with a comment, such as the one for mod_rewrite, is live in all versions.

But any line that begins with a hash mark, one or more letters, and another hash mark is essentially a conditional comment that will be uncommented in that particular variation of the file.

Thus, mod_perl is enabled in back-end and single-server variations, while mod_ssl is enabled only in proxy and single-server variations.

The resulting last few lines for my back-end server look like…


#P# AddModule mod_proxy.c
# AddModule mod_cern_meta.c
AddModule mod_expires.c
# AddModule mod_headers.c
# AddModule mod_usertrack.c
# AddModule mod_unique_id.c
#PJ# AddModule mod_setenvif.c
#PJ# AddModule mod_ssl.c
AddModule mod_perl.c

… and the configuration for a front-end proxy server looks like this:


AddModule mod_proxy.c
# AddModule mod_cern_meta.c
#BJ# AddModule mod_expires.c
# AddModule mod_headers.c
# AddModule mod_usertrack.c
# AddModule mod_unique_id.c
AddModule mod_setenvif.c
AddModule mod_ssl.c
#BJ# AddModule mod_perl.c

This is a powerful way of expressing exactly how my front- and back-end servers are similar and yet different. TT helps me tremendously here, keeping me from having to maintain two separate files and trying to keep them in sync.

But the real gain comes when I get to the virtual servers. For the proxy server, I use mod_rewrite to decide whether each request should be served locally, proxied to the back end, or simply forbidden. In the proxy server for http://www.stonehenge.com, I end up with Listing Two.




Listing Two: Rewrite rules for the Stonehenge proxy server

RewriteEngine On
RewriteRule ^/icons/ - [last]
RewriteRule ^/tt2/images/ - [last]
RewriteMap escape int:escape
RewriteRule ^/(.*)$ http://127.0.0.1:8081/${escape:$1} [proxy,noescape]
ProxyPassReverse / http://127.0.0.1:8001/

This configuration causes /icons and /tt2/images to be served directly by the proxy server (from a shared DocumentRoot). All other requests are properly repackaged as a proxy request to 127.0.0.1:8081, which is where the “real” http://www.stonehenge.com server (including mod_perl support) lives. The result is then cached if possible (that is, if it carries a Last-Modified or Expires header) and returned to the client. In the back-end server, these lines are absent for this virtual host.
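
To give the proxy something it can actually cache, the back end has to emit those headers; mod_expires (enabled with a #BJ# marker in Listing One) is one way to do it. The types and lifetimes below are only an illustration, not my real policy:

#BJ# ## example cache lifetimes only, for illustration
#BJ# <IfModule mod_expires.c>
#BJ# ExpiresActive On
#BJ# ExpiresByType image/gif "access plus 1 day"
#BJ# ExpiresByType text/css "access plus 1 hour"
#BJ# </IfModule>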

Every virtual host is built with a simple invocation to a make_virtual_server wrapper, like so:


[* WRAPPER make_virtual_server
name = "www.stonehenge.com" develport = 8001
host = "www.stonehenge.com" port = 80
backhost = "127.0.0.1" backport = 8081;
WRAPPER pbj_filter *]
#PJ# ## block bad robots like evil sitescooper
#PJ# RewriteCond %{HTTP_USER_AGENT} ^sitescooper
#PJ# RewriteRule ^ - [forbidden]
#PJ# ## local services:
#PJ# RewriteRule ^/icons/ - [last]
#PJ# RewriteRule ^/tt2/images/ - [last]
#BJ# <IfModule mod_perl.c>
#BJ# <Perl>
#BJ# use lib "[* env.PREFIX *]/perl-lib";
#BJ# do "startup.pl";
#BJ# </Perl>
#BJ# </IfModule>
#BJ# ErrorDocument 404 /404.html
[* END; END *]

The parameters at the top control the common hostname, Listen, and mod_rewrite lines added to the proxy servers. The pbj_filter material supplies the per-virtual-host additions. With this in place, adding the virtual server for http://www.geekcruises.com was as simple as adding a few more lines to the configuration:


[* WRAPPER make_virtual_server
name = "www.geekcruises.com" develport = 8005
alias = "geekcruises.com
geekcruises.stonehenge.com"
host = "geekcruises.stonehenge.com" port = 80
backhost = "127.0.0.1" backport = 8083;
WRAPPER pbj_filter *]
DocumentRoot [* env.GEEKCRUISES_ROOT *]
#BJ# <IfModule mod_perl.c
#BJ# <Perl
#BJ# use lib “[* env.PREFIX *]/perl-lib”;
#BJ# </Perl
#BJ# PerlPostReadRequestHandler Stonehenge::MyPostReadRequest
#BJ# </IfModule
[* END; END *]

Thus, http://www.geekcruises.com can use the same proxy caching servers and mod_perl back-end servers as http://www.stonehenge.com with minimal effort.

While I don’t have room to include the full definition of make_virtual_server here, it was really just a matter of figuring out how to plug in the values given as parameters.

For example, two lines from the mod_rewrite section look like:


RewriteRule ^/(.*)$ http://[* "$backhost:$backport" *]/${escape:$1} [proxy,noescape]
ProxyPassReverse / http://[* name *]:[* port *]/
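
Putting those pieces together, the overall shape of make_virtual_server might look something like the sketch below. This is only a guess at the structure, under my own assumptions about directive choices; the real definition also deals with develport, aliases on the development host, logging, and so on:

[* BLOCK make_virtual_server;
     # simplified sketch: common per-host scaffolding, then the
     # wrapped per-host content supplied by the caller
     "<VirtualHost *:"; port; ">\n";
     "ServerName "; host; "\n";
     IF alias; "ServerAlias "; alias; "\n"; END;
     IF kind != "B";  # only proxy and joined servers rewrite and proxy
       "RewriteEngine On\n";
       "RewriteMap escape int:escape\n";
       'RewriteRule ^/(.*)$ http://'; "$backhost:$backport";
       '/${escape:$1} [proxy,noescape]'; "\n";
       'ProxyPassReverse / http://'; "$name:$port"; "/\n";
     END;
     content;
     "</VirtualHost>\n";
   END;
*]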

And once that information is captured in a template, I can reuse the pattern again and again for each virtual server.

I hope this description has inspired you to consider an alternate means of maintaining those “similar but different” files, using the powerful Template Toolkit as your tool of choice.

Until next time, enjoy!



Randal Schwartz is the chief Perl guru at Stonehenge Consulting and the author of many books on Perl. He can be reached at merlyn@stonehenge.com.
