
Template-driven File Management

I recently decided to put the stonehenge.com Web site under CVS (Concurrent Versions System) management. With the CVS tools, I can "check out" a current version of the Web site sources, modify it as necessary, test it on a development server, and then "check in" the changes for deployment on my live server -- the same way the big boys do it. I can also let other Stonehenge druids edit portions of the site, a task that had been exclusively my job (along with the dozens of other self-appointed roles I fill at Stonehenge).

Because some of the Apache configuration files contain hard-coded pathnames, I couldn’t simply reuse the test server’s version on the live server: the paths would point to the wrong place. I pondered a lot of solutions, and started by spending a few days rewriting all the config files so that they used only names relative to the config directory. Unfortunately, I got stuck on one directive (for mod_proxy’s cache configuration) that does not permit a relative name. So much for that idea.

After giving up on the relative pathnames idea, it occurred to me to run the source files through a substitution process that could plug in the pathnames and perhaps a few changeable configuration values. Since much of my site’s new design will be processed dynamically using Andy Wardley’s Template Toolkit (http://template-toolkit.org), I decided to use it at “build time” as well.

The Template distribution’s ttree utility, at first glance, seemed to do what I wanted: take a tree of files and process them into a target tree, updating only the files that had changed. But I needed similar structures to process files that either weren’t templated, or were derived from multiple source files. Since those requirements are outside ttree’s design, I used the important pieces from ttree’s source code to make my template processing engine (shown in Listing One).




Listing One: run-template


1   #!/usr/bin/perl -w
2   use strict;
3   $|++;
4   use Template;
5   use Getopt::Long;
6
7   GetOptions(
8     'preprocess=s' => \ (my $preprocess),
9     'force!' => \ (my $force = 0),
10  ) or die "see code for usage\n";
11
12  my $t = Template->new
13    ({
14      RELATIVE => 1,
15      PRE_PROCESS => $preprocess,
16      INCLUDE_PATH => ['.'],
17      TAG_STYLE => 'star',
18    });
19
20  my %seen;
21  while (<>) {
22    next if $seen{$_}++;
23    my($outname, $inname) = split;
24    my @instat = stat($inname) or
25      print(" - $inname (can't stat)\n"), next;
26
27    unless ($force) {
28      my @outstat = stat($outname);
29      @outstat and $outstat[9] > $instat[9] and
30        print(" - $inname (not newer)\n"), next;
31    }
32
33    $t->process($inname, {env => \%ENV}, $outname) or
34      print(" ! ", $t->error(), "\n"), next;
35    print(" + $inname => $outname\n");
36
37    chown $instat[4], $instat[5], $outname
38      or warn "Cannot chown @instat[4,5] $outname: $!";
39    chmod $instat[2], $outname
40      or warn "Cannot chmod $instat[2] $outname: $!";
41  }

Calling the Template Engine

I decided to drive my templating engine from a control file, typically read from STDIN, consisting of an output and input filename per line, separated by whitespace. (The first time I have a filename with embedded whitespace, I’ll have to rewrite this bit.) To create the control file, I used a GNU-style Makefile, which we’ll see later. But first, let’s focus on the templating engine.

Lines 1 through 3 start nearly every Perl program I write, enabling compile and runtime warnings, restricting the use of barewords, soft references, and undeclared variables, and ensuring that STDOUT is unbuffered.

Lines 4 and 5 pull in the Template module, found in the CPAN, and Getopt::Long, which ships with Perl.

Lines 7 through 10 process the two command-line options: a flag that provides a pre-processing hook file for Template, and an option to force processing regardless of timestamps. Because my Makefile will want to have authority about processing a particular file, I’ll be using the --force option when calling the template engine from my Makefile. As yet, I haven’t used the preprocessing flag.
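The option handling can be tried on its own. The following is a minimal sketch, assuming the preprocess option takes a filename (an =s specification); the parse_args helper and the header.tt and control.in names are made up for illustration. It uses GetOptionsFromArray so the arguments can be supplied from code rather than from the real command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse an argument list with the same two options as the template engine.
# parse_args is a hypothetical helper, not part of the listing.
sub parse_args {
    my @args = @_;
    my $preprocess;     # stays undef unless --preprocess FILE is given
    my $force = 0;      # "force!" is negatable, so --noforce also works
    GetOptionsFromArray(
        \@args,
        'preprocess=s' => \$preprocess,
        'force!'       => \$force,
    ) or die "see code for usage\n";
    return ($preprocess, $force, @args);   # leftover args are filenames
}

my ($pre, $force, @rest) =
    parse_args('--force', '--preprocess', 'header.tt', 'control.in');
print "preprocess=$pre force=$force rest=@rest\n";
```

Anything left over after option parsing (here, control.in) is what the diamond operator would go on to read in the real script.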

Lines 12 through 18 set up the Template object, including the configuration options needed for my operation. Relative pathnames are needed to permit the Makefile to specify filenames below the current directory (relative to the include path). The preprocess template is given by the value associated with the option, or undef, meaning no preprocess template. And finally, I decided to use the star tag style in the “build phase” to distinguish it from the normal Template style to be executed at page delivery time. This permits template instructions like:


[* IF env.ENABLE_JOKES -*]
[% PROCESS stonehenge/sidebar/jokes %]
[*- END *]

If the environment variable ENABLE_JOKES is set (while building the site), then the directive is included to process the sidebar at page delivery time. (The env hash is a Template variable; we’ll see this in a moment.)

Lines 21 to 41 form the main processing loop. To prevent duplicate consideration of a particular templated file, line 20 defines a %seen hash, containing the lines we’ve processed so far as keys. Sometimes during my testing, I’d update a templated file, but the template processing would fail. The next make run would again add the template to the list of things out of date, and this template engine would end up seeing the item twice.
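Here is a small sketch of that duplicate check in isolation. The unique_pairs helper is hypothetical, as is the mime.types.tmpl entry; the check for malformed lines is an addition of mine that the listing does not make.

```perl
use strict;
use warnings;

# Return [output, input] pairs from control-file lines, skipping exact repeats.
# unique_pairs is a hypothetical helper, not part of the listing.
sub unique_pairs {
    my @lines = @_;
    my (%seen, @pairs);
    for my $line (@lines) {
        next if $seen{$line}++;        # identical line already queued
        my ($outname, $inname) = split ' ', $line;
        next unless defined $inname;   # skip blank or malformed lines (my addition)
        push @pairs, [$outname, $inname];
    }
    return @pairs;
}

my @pairs = unique_pairs(
    "/web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl\n",
    "/web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl\n",   # repeated by a second make run
    "/web/stonehenge/etc/mime.types etc/mime.types.tmpl\n",
);
print "$_->[1] => $_->[0]\n" for @pairs;
```

Only two pairs come out, because the repeated httpd.conf line is seen once.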

Line 23 extracts the output filename and the input filename. Lines 24 and 25 ensure that the input exists, and grab the stat information to use later (for the modification time, permissions, and ownership).

Lines 27 to 31 allow the template engine to be a “mini-make.” Unless the --force option is given on the command line, the input file will not be processed (and thus replace the existing output file) unless it’s newer than the existing output file. You could use the template engine with a static list of source/destination pairs this way, and the engine would perform minimal work to update the files. However, since we’re letting make determine out-of-date files, we’ll be skipping this code.
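The timestamp comparison can be sketched as a standalone function. The needs_update name and the temporary files are assumptions for illustration; the rule itself mirrors the listing, which skips a file only when the output is strictly newer than the input.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True when $outname is missing or is not strictly newer than $inname.
# needs_update is a hypothetical helper, not part of the listing.
sub needs_update {
    my ($outname, $inname) = @_;
    my @instat  = stat($inname) or die "can't stat $inname: $!\n";
    my @outstat = stat($outname);
    return 1 unless @outstat;                   # no output file yet
    return $outstat[9] > $instat[9] ? 0 : 1;    # element 9 is the mtime
}

# Demonstration with throwaway files whose timestamps we set by hand.
my $dir = tempdir(CLEANUP => 1);
my ($in, $out) = ("$dir/in.tmpl", "$dir/out");
for my $f ($in, $out) {
    open my $fh, '>', $f or die "$f: $!";
    close $fh;
}
my $now = time;
utime $now, $now - 60, $out;    # output is a minute older than the input
utime $now, $now,      $in;
print needs_update($out, $in) ? "rebuild\n" : "up to date\n";
```

Note that equal timestamps count as out of date, which errs on the side of doing the work again.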

If we make it to line 33, it’s time to run the template. The call to the process() method of the Template object does the job. The middle parameter defines the predefined variables available to the individual templates. In this case, we’re passing the environment variables as the name env. Individual environment variable names are available as env.PATH or env.SHELL, and so on. This is the primary way the Makefile can parameterize the templates, including overriding the values for a particular build.

If the processing fails, line 34 displays that, along with the Template error message. On success, the processing is noted in line 35.

Lines 37 to 40 copy the ownership and permissions from the source file to the destination file. Failures are noted as an advisory, although execution continues.
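As a rough sketch of the same idea outside the loop, the hypothetical copy_attrs helper below copies ownership and permissions from one file to another. Unlike the listing, it masks the mode down to the permission bits before calling chmod; that is a conservative variation of mine, not something the original code does.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Copy ownership (where permitted) and permission bits from $src to $dst.
# copy_attrs is a hypothetical helper, not part of the listing.
sub copy_attrs {
    my ($src, $dst) = @_;
    my @st = stat($src) or die "can't stat $src: $!\n";
    chown $st[4], $st[5], $dst                 # elements 4 and 5 are uid and gid
        or warn "Cannot chown @st[4,5] $dst: $!\n";
    chmod $st[2] & 07777, $dst                 # element 2 is the mode; drop file-type bits
        or warn "Cannot chmod $dst: $!\n";
    return;
}

# Demonstration with throwaway files.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/src", "$dir/dst");
for my $f ($src, $dst) {
    open my $fh, '>', $f or die "$f: $!";
    close $fh;
}
chmod 0750, $src or die "chmod $src: $!";
copy_attrs($src, $dst);
printf "%04o\n", (stat $dst)[2] & 07777;
```

As in the listing, a failed chown or chmod only produces a warning, so an unprivileged build still completes.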

So that’s the template processor. When executed, it reads lines like the following from standard input. Each line names an output file, followed by the relative path of the local source template to build it from.


/web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl

The httpd.conf.tmpl file contains mostly constant text, except for things that vary based on the installation directory or other local parameters. Those are replaced en route to the httpd.conf file. See Figure One for an example.




Figure One: Sample httpd.conf.tmpl file


ServerName [* env.SERVERNAME *]
Listen [* env.LISTEN_AT *]
DocumentRoot [* env.PREFIX *]/htdocs
PIDFile [* env.PREFIX *]/var/run/httpd.pid
ScoreBoardFile [* env.PREFIX *]/var/run/httpd.scoreboard
LockFile [* env.PREFIX *]/var/run/httpd.lock
<Directory [* env.PREFIX *]/htdocs>
….
</Directory>
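To see the effect of the build-time pass without installing anything, here is a deliberately simplified stand-in. It is not Template Toolkit: the hypothetical expand_star_tags function handles only plain [* env.NAME *] lookups with a regular expression, and replacing an unset name with the empty string is an assumption of this sketch. It does show the key property of the two-phase scheme, which is that star tags are filled in while ordinary [% ... %] directives pass through untouched for page delivery time.

```perl
use strict;
use warnings;

# Replace [* env.NAME *] with $vars->{NAME}; all other text passes through.
# A toy stand-in for the build-time pass, not Template Toolkit itself.
sub expand_star_tags {
    my ($text, $vars) = @_;
    $text =~ s{\[\*\s*env\.(\w+)\s*\*\]}
              { defined $vars->{$1} ? $vars->{$1} : '' }ge;
    return $text;
}

my %vars = (
    PREFIX     => '/web/stonehenge',
    SERVERNAME => 'www.stonehenge.com',
);
my $tmpl = join '',
    "ServerName [* env.SERVERNAME *]\n",
    "DocumentRoot [* env.PREFIX *]/htdocs\n",
    "[% PROCESS stonehenge/sidebar/jokes %]\n";

print expand_star_tags($tmpl, \%vars);
```

The first two lines come out with the server name and prefix filled in, and the third line is printed exactly as written.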

Additionally, repetitive or conditional items can be captured as Template blocks or macros. (I’m just now scratching the surface of this. Perhaps I’ll cover it in greater detail in a future column.)

Building the Template Control File

Of course, this doesn’t work without the Makefile placing the right items into the control file, or making the target directories and copying all the other files over. So, let’s take a look at Listing Two to see how that’s done.




Listing Two: Makefile


1   SHELL = /bin/sh
2   .SUFFIXES:
3
4   ifndef RECURSED
5   MAKECMDGOALS ?= install
6
7   $(MAKECMDGOALS):
8           @$(MAKE) --no-print-directory RECURSED=1 $(MAKECMDGOALS) FINAL
9   else
10  export PREFIX ?= /web/stonehenge
11  export INSTALLPREFIX ?= $(PREFIX)
12  export APACHE_PREFIX ?= /opt/apache/1.3.23
13  export SERVERNAME ?= www.stonehenge.com
14  export LISTEN_AT ?= www.stonehenge.com:80
15
16  I = $(INSTALLPREFIX)
17  TEMPLATER = ./run-template
18
19  get_installs_from_subdir = $(patsubst %,$I/%, $(patsubst %.tmpl,%,$(shell find $1 -type f ! -name '*~' -print)))
20
21  ### subdirectories
22  ## etc
23  install: install-etc
24  install_etc_files := $(call get_installs_from_subdir, etc)
25  install-etc: $(install_etc_files)
26  ## htdocs
27  install: install-htdocs
28  install_htdocs_files := $(call get_installs_from_subdir, htdocs)
29  install-htdocs: $(install_htdocs_files)
30  ## sbin
31  install: install-sbin
32  install_sbin_files := $(call get_installs_from_subdir, sbin)
33  install-sbin: $(install_sbin_files)
34  ## var
35  install: install-var
36  install_var_files := $(call get_installs_from_subdir, var)
37  install-var: $(install_var_files)
38
39  ### pattern rules
40  $I/%: %
41          mkdir -p $(dir $@)
42          cp $< $@
43
44  $I/%: %.tmpl $(TEMPLATER) GNUmakefile
45          @echo want: $< '=>' $@
46          @echo $@ $< >>$(TEMPLATER).in
47
48  FINAL: $(TEMPLATER).out
49
50  $(TEMPLATER).out: $(TEMPLATER).in
51          $(TEMPLATER) --force $<
52          -@cp /dev/null $<
53          -@touch $@
54  $(TEMPLATER).in:; touch $@
55
56  endif # matches ifdef/else at top of file

The trickiest part of the Makefile design was ensuring that the template engine would get run once at the end of the pass. In BSD Make, this can be achieved with a .END target, but GNU Make didn’t have such a feature. With the help of fellow Perl hacker Uri Guttman, I came up with a weird hack that’s rather cool once you get your head around it.

What we did was to split the entire Makefile into two pieces. On a normal invocation, only the first few lines (from line 5 to line 8) are executed. This recursively invokes make, using the same Makefile and attempting to build the same targets, but adds FINAL to the list of targets that were previously specified. We also define a special variable (RECURSED), whose definition is caught by the conditional on line 4, skipping us over lines 5 through 8. We then run the rest of the file.

Lines 10 to 14 define the configuration parameters used by the templates, and by the Makefile itself.

PREFIX is the execution top-level directory. INSTALLPREFIX is usually the same as PREFIX, except when you want to tar up the files for an RPM or other distribution bundler, or want to “stage” the live data. For example, if your live site is running off /web/stonehenge, you can build and install a new Web site from scratch with minimal downtime using:


$ make INSTALLPREFIX=/web/NEW
$ /web/stonehenge/sbin/apachectl stop
$ mv /web/stonehenge /web/stonehenge.OLD
$ mv /web/NEW /web/stonehenge
$ /web/stonehenge/sbin/apachectl start

By making INSTALLPREFIX separate from PREFIX, we can stage the files in that temporary directory before we perform the switch.

APACHE_PREFIX defines the prefix that Apache was built with. Apache’s etc and sbin directories should be immediately below this directory name.

SERVERNAME and LISTEN_AT define the server information. Again, the point of this configuration is to be able to run a development version of the server at a different location, perhaps even on a different box, so these must be configurable.

Lines 16 and 17 define variables that should not be overridden from the command line. In particular, note the use of I, which permits $I to be written in rules instead of the longer $(INSTALLPREFIX).

Line 19 defines a macro that crawls through a given subdirectory (which will be under $PREFIX), looking for any files (that aren’t Emacs editor backups), and rewrites their equivalent paths so they appear to be in the INSTALLPREFIX hierarchy. If a file ends in .tmpl, that suffix is removed. This macro means we don’t have to specify all the files in the various subdirectories; they will be found automatically.
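For readers more at home in Perl than in make’s function syntax, the same mapping can be sketched with File::Find. The installs_from_subdir name, the /web/NEW prefix, and the sample filenames are illustrative assumptions; this mimics what the macro computes and is not part of the build.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Map each regular file under $subdir (relative to the current directory)
# to its install path: skip editor backups, strip a trailing .tmpl.
# installs_from_subdir is a hypothetical helper mirroring the make macro.
sub installs_from_subdir {
    my ($prefix, $subdir) = @_;
    my @targets;
    find({
        no_chdir => 1,                       # $_ holds the full relative path
        wanted   => sub {
            return unless -f $_;             # like find's -type f
            return if /~\z/;                 # like ! -name '*~'
            (my $rel = $_) =~ s/\.tmpl\z//;  # like $(patsubst %.tmpl,%,...)
            push @targets, "$prefix/$rel";   # like $(patsubst %,$I/%,...)
        },
    }, $subdir);
    return sort @targets;
}

# Demonstration in a throwaway source tree.
my $orig = getcwd();
my $src  = tempdir(CLEANUP => 1);
chdir $src or die "chdir $src: $!";
mkdir 'etc' or die "mkdir etc: $!";
for my $f (qw(etc/httpd.conf.tmpl etc/mime.types etc/httpd.conf.tmpl~)) {
    open my $fh, '>', $f or die "$f: $!";
    close $fh;
}
print "$_\n" for installs_from_subdir('/web/NEW', 'etc');
chdir $orig or die "chdir $orig: $!";
```

The backup file is dropped, httpd.conf.tmpl becomes /web/NEW/etc/httpd.conf, and mime.types is mapped unchanged, which is exactly the list of targets the install-etc rule would depend on.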

Lines 21 to 37 define the rules for building each of the subdirectories, including a group install target to build just a portion of the data. There’s a lot of repetition, but I couldn’t find a way to reduce it. Note that the pattern is similar: the top-level install target depends on a particular install-foo target. A similarly-named variable is loaded by calling the macro defined earlier, and then the install-foo target is made to depend on those filenames.

But where do the rules get selected to either copy those files or run them through the templating engine? That magic happens in lines 39 to 46.

If a file wanted under the INSTALLPREFIX directory has a corresponding file relative to the local current directory, then we simply copy it over (if it’s out of date), after first making its parent directory (if needed).

However, if the file wanted in the INSTALLPREFIX directory has a corresponding .tmpl file, we record that (for the template engine to process later) by writing the destination and source into the run-template.in file. Note that all template files are also dependent on the templater itself, and the GNUmakefile. That way, edits to either of these files cause the templates to be re-run.

So the typical install step copies text files directly from the source directories to the destination, and notes the template files that also have to be processed. But where do the templates get processed? Recall that the recursive invocation also wants FINAL to be built, after building the designated targets. Lines 48 to 54 define the rules for that.

First, FINAL depends on run-template.out, so we need to bring that up to date. It’s up to date only when it’s newer than run-template.in. But if it doesn’t exist, or is not newer, we’ll run the commands in lines 51 to 53. The templater processes the control file (written into by line 46), then empties it out (line 52), and the output file is then touched (in line 53) to make it newer than the input. If, for some reason, the input file was never created, an empty one is created in line 54. I’m not sure if I still need this step, but it certainly didn’t hurt to leave it in.

And that’s it. In these two core structures, you’ve got the means to build a hierarchy of files, some of which are run through a templating engine, with a minimal amount of copying as you edit stuff. And that’s the guts of my new Web site building engine. Until next time, enjoy!



Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and can be reached at merlyn@stonehenge.com. Code listings for this column can be found at http://www.stonehenge.com/merlyn/LinuxMag/.
