Tieing Up Loose Ends

Perl has a lot of cool stuff. Certainly, the basic: print "Hello, world!\n"; gets people started without knowing much about the language, but the question "Is there a way to do (X) in Perl?" can usually be answered "Yes!"

For example, the neat way that a DBM can appear to be a hash in Perl rather transparently is done with a mechanism called “tied variables”. But tied variables aren’t limited to DBMs: we can make scalars, arrays, hashes, and filehandles all have similar magic.

What? Filehandles? Yes. Imagine a “magic” filehandle that appears to the rest of the program to be a normal filehandle (albeit already opened). But, every time the program “reads a line” from the filehandle, a subroutine gets invoked; and, for every operation on this so-called filehandle, a different subroutine gets invoked. Well, that’s what a tied filehandle does.
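To make the idea concrete before the real example, here is a minimal sketch of a tied filehandle. The CountdownHandle class and its Next field are invented purely for illustration and are not part of the code discussed in this column; only the method names TIEHANDLE and READLINE come from Perl’s tie interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo class: a "filehandle" that hands back a countdown
# instead of reading a real file.
package CountdownHandle;

# Called by tie; the arguments after the class name come from the tie call.
sub TIEHANDLE {
    my ($class, $start) = @_;
    return bless { Next => $start }, $class;
}

# Called every time the program reads a line from the tied handle.
sub READLINE {
    my $self = shift;
    return undef if $self->{Next} < 1;    # undef signals end-of-file
    return $self->{Next}-- . "\n";
}

package main;

local *COUNT;
tie *COUNT, 'CountdownHandle', 3 or die "Cannot tie";

my @seen;
while (defined(my $line = <COUNT>)) {     # each read invokes READLINE
    push @seen, $line;
}
print @seen;
```

The loop looks like any other read loop; nothing in it reveals that the lines come from a subroutine rather than a file.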

One use of having a magic filehandle is to create a filehandle that automatically expands “include” specifications, where some part of the contents indicate that other files must be consulted as well. For example, Perl’s require operator brings in additional Perl code from other files, and the C preprocessor (CPP) looks for lines like #include "file.h" to bring in more C code.

The advantage of the filehandle having all the smarts is that we can re-use existing code or libraries that expect a filehandle, and yet get the include-file expansion done transparently.

Listing One: ihtest

 1  #!/usr/bin/perl -w
 2  use strict;

 4  use IncludeHandle;

 6  {
 7    local *FRED;
 8    tie *FRED, 'IncludeHandle', "localfile", q/^#include "(.*)"/
 9      or die "Cannot tie: $!";

11    while (<FRED>) {
12      print;
13    }
14  }

16  {
17    local *BARNEY;
18    tie *BARNEY, 'IncludeHandle', "localfile", qr/^#include "(.*)"/
19      or die "Cannot tie: $!";

21    my @a = <BARNEY>;
22    print @a;

24  }

26  {
27    local *DINO;
28    tie *DINO, 'IncludeHandle', "localfile", sub {
29      /^#include \"(.*)\"/ ? $1 : /^#include <(.*)>/ ? $1 : undef
30    }
31      or die "Cannot tie: $!";

33    print <DINO>;
34  }

While getting a filehandle to be tied may seem difficult, the process is actually rather straightforward. Just create a class library (IncludeHandle, for example), and then create handles with tie rather than open. To demonstrate this, I’ve written a program that uses the IncludeHandle class. [See Listing One.]

Lines 1 and 2 turn on warnings, and enable compiler restrictions.

Line 4 pulls in the IncludeHandle module, described later. Because this is an object-oriented class, we won’t be importing any functions.

Lines 6 through 14 demonstrate the first use of a handle tied to IncludeHandle. I’m setting up a “naked block” (a block that is not otherwise part of a larger construct, like an if or a while), so that the local on the soon-to-be tied (or is that fit to be tied?) filehandle found in *FRED will disappear when I’m done.

The local *FRED in line 7 creates a temporary value for all kinds of things that share the name FRED. One of these is our filehandle, and although the others (like $FRED and %FRED) are also localized, that doesn’t make much difference. This temporary value will get undone in line 14.
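A small sketch of that scoping behavior follows. The variable contents are made up for the demonstration; the point is only that local on a glob hides everything named FRED for the duration of the block and puts it all back afterward.

```perl
use strict;
use warnings;

our $FRED = "outer value";
my $inside;

{
    # Temporarily replaces $FRED, @FRED, %FRED, and the FRED filehandle.
    local *FRED;
    $inside = defined $FRED ? "defined" : "undefined";
}

# Once the block exits, the original $FRED is visible again.
print "inside the block: $inside\n";
print "after the block: $FRED\n";
```

This is why the tie on FRED in the listing goes away by itself at the closing brace, with no explicit untie.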

Lines 8 and 9 tie the filehandle FRED (indicated by passing the symbol name *FRED to tie), using the designated parameters. The first parameter must be a class name (a package name with certain subroutines defined within that package). Here, I’ve designated the IncludeHandle class to handle the tie. The parameters of localfile and a quoted string that looks like a C-language preprocessor include-file directive get passed to the TIEHANDLE method, described later. If this succeeds, the tie returns true; otherwise, the die is executed with $! having an appropriate brief error code.

I’ve defined the first parameter after the classname to be treated as a filename to open. You can think of this as if it were:

open(FRED, "localfile") or die "Cannot open: $!";

except that any include files (denoted by lines that match the second additional parameter after the classname) will be expanded in place. Thus, the normal-looking loop in lines 11 through 13 will dump out the contents of this file. If any line of localfile matches the pattern ^#include "(.*)", however, the part returned as $1 in that pattern will be opened as a new file, and its contents will be inserted in place of the line. This is a recursive operation: included files may themselves contain include-file lines. We’ll see how this all works later when I describe the class file.

Lines 16 to 24 show a similar example. Note, however, that the include-file pattern specification is being passed as a compiled regular expression, rather than just a string. That’s helpful if the tie is being executed in a loop, so that the expression doesn’t have to continue to be recompiled on each iteration. Again, I’m just showing off the versatility of this particular tie usage.
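As a quick illustration of a precompiled pattern, independent of IncludeHandle, the sketch below compiles the include pattern once with qr// and then reuses it for every line. The sample lines are invented for the demonstration.

```perl
use strict;
use warnings;

# Compile the pattern once, outside the loop.
my $include = qr/^#include "(.*)"/;

my @names;
for my $line ('#include "a.h"', 'int x;', '#include "b.h"') {
    # Matching against the compiled pattern captures the filename in $1.
    push @names, $1 if $line =~ $include;
}

print "@names\n";
```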

Also note here that lines 21 and 22 invoke the filehandle read-line operator in a list context instead of a scalar context. We’ll see later how this is supported.

Lines 26 to 34 show a more interesting and complicated use of that same second parameter. If the parameter is a “coderef” (a reference to a named or anonymous subroutine), then the subroutine is called for each line read from the file, with $_ set to the line. The subroutine can return undef to indicate that the line is an ordinary text line to be returned as part of the read operation, or can return a string indicating a new filename to open.

Lines 28 through 30 define an anonymous subroutine that looks for two different kinds of include lines, both of which the C preprocessor understands. If we wanted, we could even create a “search path” for names found within angle brackets, just like the C preprocessor. Again, note in line 33 that we’re invoking the read-line operator in a list context, here being passed directly back to print.
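One way such a search path might look is sketched below. The @search_path directories and the $finder name are assumptions made up for this example, and the fallback of returning the bare name when nothing is found is just one possible policy. The coderef follows the contract described above: it reads $_ and returns either a filename or undef.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directories to search for <name> includes.
my @search_path = ('include', '/usr/include');

my $finder = sub {
    # Quoted form: use the name exactly as written.
    return $1 if /^#include "(.*)"/;

    # Angle-bracket form: take the first readable file on the search path.
    if (/^#include <(.*)>/) {
        my $name = $1;
        for my $dir (@search_path) {
            my $path = File::Spec->catfile($dir, $name);
            return $path if -r $path;
        }
        return $name;    # not found on the path; hand back the bare name
    }

    return undef;        # an ordinary text line
};

# Try the coderef on a couple of sample lines.
for my $line (qq{#include "local.h"\n}, "int main;\n") {
    local $_ = $line;
    my $name = $finder->();
    print defined $name ? "include: $name\n" : "text: $line";
}
```

A coderef like this could be passed as the last argument to tie in place of the one in lines 28 through 30.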

To test this, we can create a local file localfile that might look like this:

#include "incfile"
#include <incfile>

and then another file incfile that contains this:


and the output will look like Figure 1.

Note that the line with the angle-bracketed name is processed only on the third include, because the first two used a simple regular expression that did not include the angle-bracketed form.

So, how does all this magic happen? How can the filehandles created with tie automatically include other files while the reading is taking place? To understand that, let’s look at the IncludeHandle class itself. That’ll be in the file named IncludeHandle.pm. [See Listing Two.]

Figure 1

#include <incfile>
#include <incfile>

Note the copyright in lines 1 and 2 — just to make it clear that you can steal this if you want, although really it’s more of a demo than a complete module. For one, it doesn’t have embedded documentation, because you’re reading the documentation right now.

Line 4 switches us into the IncludeHandle package, so that unqualified symbols end up in the right place. Line 5 turns on the standard compiler restrictions.

Lines 7 and 8 pull in two needed modules. First, we’ll need dynamically created filehandles, so the IO::File module takes care of that for us. Further, some error messages must look as if they came from the invoker rather than one of these routines, so I’ll also use the carp function for that. Both of these modules are in the standard Perl distribution, so there’s no need to pull anything from the CPAN.

Lines 10 through 29 define the TIEHANDLE method. This method name is built into the tie interface, so you can’t just make up a name and expect it to work. The parameters are copied from the ones to tie (skipping over the first tie parameter).

Lines 11 through 14 name those parameters. The class name (in this case, IncludeHandle) ends up in $class, while the requested filename and include specification end up in $file and $code, respectively.

Lines 16 and 17 open up the initial file. If this fails, we’re not going to get very far, so a quick undef return is enough to tell the tie operator that things broke, and that’s also going to return an undef to the invoker of tie, triggering a die in the earlier code.

Lines 19 through 23 turn the include specification into a coderef if it isn’t already. First, the pattern is compiled (harmless if it was already a compiled pattern) in line 20. If that fails, we’ll return undef (failing the tie operation), but first we’ll set $! to illegal seek, meaning we can’t seek for include information given that bad regular expression. Twisted, but I had to pick one of the existing errno codes, so my choices were limited. Obviously, the docs for a production module similar to this would describe that error and why you would get it.
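The bare number 29 in line 21 is the value of “illegal seek” (ESPIPE) on common Unix systems. A variation, sketched below, takes the symbolic constant from the standard Errno module instead. The compile_pattern helper is a name invented for this example, and the sketch assumes a platform that defines ESPIPE.

```perl
use strict;
use warnings;
use Errno qw(ESPIPE);    # symbolic name for "illegal seek"

# Hypothetical helper: compile a pattern, or set $! and return undef.
sub compile_pattern {
    my $spec = shift;
    my $pattern = eval { qr/$spec/ };    # dies inside the eval on a bad RE
    if ($@) {
        $! = ESPIPE;                     # same idea as $! = 29, without the magic number
        return undef;
    }
    return $pattern;
}

my $ok  = compile_pattern('^#include "(.*)"');
my $bad = compile_pattern('^#include "(.*"');    # unbalanced parenthesis
print "bad pattern rejected: $!\n" unless defined $bad;
```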

Line 22 compiles an anonymous subroutine that has a closure on the $pattern lexical variable. This subroutine does a simple pattern match on $_, returning either $1 if successful, or undef. Thus, after this step, $code is always a coderef fitting the general specification described earlier.

Finally, and most importantly, lines 25 to 28 return a blessed hashref, becoming the object that sits behind the tie. As further operations are performed on the tied filehandle, they’ll be translated into method calls on this object. Here, I’m saving the opened filehandle, and the coderef. The opened filehandle is dropped into a single-element anonymous array, for reasons that will become apparent later. The object is returned from the tie operator, but is also saved for these automatic operations. You can always get it back by calling the tied operator on the potentially tied item.
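Here is a small sketch of getting that object back with tied. TinyHandle and its Lines field are stand-ins invented for this example so that it runs by itself; the same calls should work on a handle tied to IncludeHandle.

```perl
use strict;
use warnings;

# Hypothetical stand-in class: serves lines from a list given to tie.
package TinyHandle;

sub TIEHANDLE {
    my ($class, @lines) = @_;
    return bless { Lines => [@lines] }, $class;
}

sub READLINE {
    my $self = shift;
    return shift @{ $self->{Lines} };    # undef once the list is empty
}

package main;

local *WILMA;
my $from_tie  = tie *WILMA, 'TinyHandle', "one\n", "two\n";
my $from_tied = tied *WILMA;

# tie and tied hand back the very same object.
print "same object\n" if $from_tie == $from_tied;

my $first   = <WILMA>;                            # consumes "one\n"
my $pending = scalar @{ (tied *WILMA)->{Lines} };
print "lines still pending: $pending\n";
```

Peeking inside the object like this is handy for debugging, although a production class would more likely offer a method for it.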

Listing Two: IncludeHandle.pm

 1  ## copyright (c) 1999 Randal L. Schwartz
 2  ## you may use this software under the same terms as Perl itself

 4  package IncludeHandle;
 5  use strict;

 7  use IO::File;
 8  use Carp qw(carp);

10  sub TIEHANDLE {
11    my $class = shift;

13    my $file = shift;
14    my $code = shift; # might be string pattern, or qr//

16    my $handle = IO::File->new($file, "r")
17      or return undef; # also sets $!

19    unless ((ref $code || "") eq "CODE") {
20      my $pattern = eval { qr/$code/ };
21      $! = 29, return undef if $@; # bad RE
22      $code = sub { $_ =~ $pattern ? $1 : undef };
23    }

25    bless {
26      Handles => [$handle],
27      Code => $code,
28    }, $class;
29  }

31  sub READLINE {
32    my $self = shift;

34    if (wantarray) {
35      my @return;
36      while (defined(my $line = $self->read_a_line)) {
37        push @return, $line;
38      }
39      @return;
40    } else {
41      $self->read_a_line;
42    }
43  }

45  sub read_a_line {
46    my $self = shift;

48    my $handles = $self->{Handles};
49    {
50      return undef unless @$handles;
51      my $handle = $handles->[0];
52      my $result = <$handle>;
53      unless (defined $result) {
54        shift @$handles;
55        redo;
56      }
57      my $filename = do {
58        local $_ = $result;
59        $self->{Code}->();
60      };
61      if (defined $filename) { # saw an include
62        if (my $include_handle = IO::File->new($filename, "r")) {
63          unshift @$handles, $include_handle;
64        } else {
65          carp "Cannot open $filename (skipping): $!";
66        }
67        redo;
68      }
69      $result;
70    }
71  }

73  "0 but true";

Speaking of operations on the tied filehandle, the most interesting one for our experiment is “reading a line”, which translates into a READLINE method call on our hidden object. This method is defined as the subroutine in lines 31 through 43. Note that this operation can happen in either a scalar or a list context: in a scalar context, we’ll fetch one line (performed in line 41 by calling ourselves as an instance method read_a_line); in a list context, however, we’ve got to return all the lines from all the files. The simplest way to do this is to call the read_a_line method repeatedly until it returns undef, while gathering the results into an array, which we’ll then return. This is handled in lines 35 through 39.
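The same context-sensitive shape can be seen in isolation in the sketch below. The @queue data and the next_item and read_items names are invented for the demonstration; wantarray is the real built-in that reports the caller’s context.

```perl
use strict;
use warnings;

my @queue = ("a\n", "b\n", "c\n");

# Returns the next item, or undef when nothing is left.
sub next_item { return shift @queue }

sub read_items {
    if (wantarray) {
        # List context: drain everything that remains.
        my @all;
        while (defined(my $item = next_item())) {
            push @all, $item;
        }
        return @all;
    }
    # Scalar context: hand back just one item.
    return next_item();
}

my $one  = read_items();    # scalar context
my @rest = read_items();    # list context
print "one: $one";
print "rest: @rest";
```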

So, one step further, we’ve got the instance method read_a_line to deal with, defined in lines 45 through 71. And here’s where the include files are expanded. First, the instance variable Handles is stuffed into the local variable $handles in line 48. We’ll use a naked block once again to create a looping control structure that doesn’t involve a goto in lines 49 through 70.

Line 50 ensures that we have some handle to read from. Of course, the first time in, it’ll be the handle initially created in TIEHANDLE. This is actually a stack that can grow and shrink as include files are noted or files reach their end. If we’ve gotten to the end of the last file, it’s time to return undef to designate end-of-file.

Line 51 takes the most interesting filehandle (the one that we’re currently reading from) and reads a line from it in line 52, using an indirect filehandle read-line operation. (I pondered a design for a few minutes that would let this be a tied filehandle, but decided that it would be too mind-boggling for a simple explanation.)

If the line cannot be read, lines 54 and 55 remove the now-useless filehandle and restart the logic back at line 50. That gets us back out of the nested include files, and even lets us quit at the end of the initial file.
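The naked-block-plus-redo idiom is easier to see with the file handling stripped away. In the sketch below, array references stand in for the stack of open files; @stack and next_value are names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the stack of filehandles: each inner array
# plays the role of one file's remaining lines.
my @stack = ([], [1, 2], [], [3]);

sub next_value {
    {
        return undef unless @stack;          # nothing left anywhere
        my $value = shift @{ $stack[0] };
        unless (defined $value) {
            shift @stack;                    # this source is exhausted
            redo;                            # restart the naked block
        }
        return $value;
    }
}

while (defined(my $value = next_value())) {
    print "$value\n";
}
```

Because a naked block counts as a loop that runs once, redo jumps back to its top, which skips over exhausted sources without an explicit while or goto.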

Lines 57 to 60 determine if we’re staring at an include filename. Line 58 sets up the $_ variable, and line 59 invokes the coderef stashed as the Code instance variable. Whatever is returned ends up in $filename.

If $filename is anything but undef, we’ve got a valid filename that must be included to replace this line. Line 62 attempts to open that file, and if successful, places the newly opened filehandle at the head of the queue (line 63). Otherwise (line 65), we’ll squawk at the user, and go get another line. Line 67 dumps us back at the input fetching starting at line 50.

If we make it down to line 69, we’ve seen a good text line from somewhere, and we’re ready to return it.

All files brought in with use or require must end in a true value. To be cute, I put the string 0 but true in line 73 as this particular file’s true value: a self-documenting string.
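For the curious, here is a short sketch of what makes that particular string special. The variable names are arbitrary. Perl treats the string as true in a boolean test, yet as zero in arithmetic, and it exempts this exact string from the usual “isn’t numeric” warning.

```perl
use strict;
use warnings;

my $value = "0 but true";

# True as a boolean, because it is a non-empty string other than "0".
my $truth = $value ? "true" : "false";

# Zero as a number, with no numeric warning for this exact string.
my $sum = $value + 5;

print "boolean: $truth\n";
print "sum: $sum\n";
```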

This is mostly a toy example of a tied filehandle, but you can learn more about them by invoking perldoc perltie. There are also examples in some CPAN modules and in the Perl Cookbook (from O’Reilly and Associates). Until next time, have fun!

Randal L. Schwartz is chief Perl guru at Stonehenge Consulting. He can be reached at merlyn@stonehenge.com.
