The One Thing Perl Got Right

As I type this month's column, we're just pulling away from Ocho Rios, Jamaica, on the latest Geek Cruise (http://www.geekcruises.com) called "Linux Lunacy 2." Earlier today, some of the speakers on this conference/cruise, including Linus Torvalds and Eric Raymond, held a meeting with the Jamaican Linux Users Group. Now, we're out at sea (en route to Holland America's private island, "Half Moon Cay"), so I'm using the satellite link to upload this column (for a mere 30 cents a minute).

Earlier this week, Eric Raymond gave one of his many visionary presentations. This one in particular mentioned Perl in a section titled, “What Perl Got Right.” The message surprised me, because Eric prefers that other popular “P” language for his personal and professional work. The one thing that Eric claims that Perl got right is just one of the many things that I think Perl got right: Perl makes it easy to access low-level operating system functionality.

Let’s take a look at what that means. Perl gives you unlink() and rename() to remove and rename files, respectively. These calls map nearly directly to the underlying “Section 2” Unix system calls, without hiding them behind a confusing abstraction layer. Indeed, the name “unlink” is a direct reflection of that. (Many beginners hunt for a “file delete” operation and never stumble across unlink() because of its peculiar name.)

But the matchup doesn’t stop there. Perl’s file and directory operations include such entries as chdir(), chmod(), chown(), chroot(), fcntl(), ioctl(), link(), mkdir(), readlink(), rmdir(), stat(), symlink(), umask(), and utime(). All of these Perl calls are mapped nearly directly to the corresponding system call. This means that file-manipulation programs don’t have to call out to a shell to perform heavy lifting.
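To make that concrete, here’s a minimal sketch (using a throwaway temp file as a stand-in for a real target) that sets and then inspects a file’s permission bits entirely with Perl’s built-in chmod() and stat(), no shell involved:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The temp file is just a stand-in for a real config file.
my ($fh, $file) = tempfile();
print $fh "secret\n";
close $fh;

# Tighten the mode without shelling out to chmod(1).
chmod 0600, $file or die "chmod: $!";

# stat() returns the same fields as the stat(2) struct;
# field 2 is the mode (file type bits plus permission bits).
my $mode = (stat $file)[2] & 07777;
printf "mode is %04o\n", $mode;   # prints "mode is 0600"
```

Everything here is core Perl; the field indices of stat() match the stat(2) man page directly.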

And if you want process control, Perl gives you alarm(), exec(), fork(), get/setpgrp(), getppid(), get/setpriority(), kill(), pipe(), sleep(), wait(), and waitpid(). With fork() and pipe() you can create any feasible piping configuration, again not limited to a particular process abstraction provided by a more limited scripting language. And you can manage and modify those processes directly as well.
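As a sketch of that flexibility, here is about the smallest fork()-plus-pipe() plumbing you can write: the child sends a line down the pipe, and the parent reads it and reaps the child with waitpid():

```perl
use strict;
use warnings;

# Child writes, parent reads: the raw plumbing that higher-level
# process abstractions are built from.
pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {           # child
    close $reader;
    print $writer "hello from the child\n";
    close $writer;
    exit 0;
}

close $writer;             # parent
my $line = <$reader>;
close $reader;
waitpid($pid, 0);          # reap the child
print $line;               # prints "hello from the child"
```

From these two calls you can wire up any piping topology you like, not just the single linear pipeline a shell gives you.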

Let’s not forget those socket functions, like accept(), bind(), connect(), listen(), and socket(). Although most people usually end up using the higher level modules that wrap around these calls (like LWP or Net::SMTP), the modules, in turn, call these operations to set up interprocess communication.
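For instance, here’s a small sketch exercising all five calls in one script: the parent builds a listening socket on a loopback ephemeral port, a forked child connect()s to it, and the parent accept()s the connection and reads one line:

```perl
use strict;
use warnings;
use Socket;   # core module: constants and sockaddr helpers

# socket()/bind()/listen()/accept() on one side, connect() on the other.
socket(my $server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($server, SOMAXCONN) or die "listen: $!";

# Ask the kernel which ephemeral port it picked for us.
my $port = (sockaddr_in(getsockname $server))[0];

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {   # child: the client side
    socket(my $client, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
        or die "socket: $!";
    connect($client, sockaddr_in($port, INADDR_LOOPBACK))
        or die "connect: $!";
    print $client "ping\n";
    close $client;
    exit 0;
}

accept(my $conn, $server) or die "accept: $!";
my $line = <$conn>;
close $conn;
waitpid($pid, 0);
print $line;       # prints "ping"
```

This is exactly the sequence a module like Net::SMTP performs for you behind its friendlier interface.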

And then there’s the user and group info (getpwuid() and friends) and network info (like gethostbyname()). Even opening a file can be tuned using all of the flags directly available to the open() system call, like O_NONBLOCK, O_CREAT, or O_EXCL.
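As a small illustration of those flags, the sketch below uses sysopen() with O_CREAT|O_EXCL (constants from the core Fcntl module) to create a lockfile atomically: the second attempt on the same name fails rather than silently clobbering the first.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# A throwaway directory stands in for wherever your lockfile lives.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/lockfile";

# O_CREAT|O_EXCL makes create-if-absent a single atomic operation.
sysopen(my $fh, $file, O_WRONLY | O_CREAT | O_EXCL, 0644)
    or die "first sysopen: $!";
close $fh;

if (sysopen(my $again, $file, O_WRONLY | O_CREAT | O_EXCL, 0644)) {
    print "unexpectedly succeeded\n";
} else {
    print "second open failed: file exists\n";   # this branch runs
}
```

That atomicity is precisely what you cannot get from a plain two-argument open(), which has no way to say “fail if it already exists.”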

Hopefully, you can see from these lists that Perl provides a rich set of interfaces to low-level operating system features.

So why is this “What Perl Got Right?” Because while Perl provides a decent high-level language for text wrangling and object-oriented programming, you can still get “down in the dirt” to precisely control, create, modify, manage, and maintain systems and data.

For example, if your application requires a “write to temp file, then close and rename atomically” to keep other applications from seeing a partially written file, you can spell it out in Perl as if you were coding in a systems implementation language like C:

open TMP, ">ourfile.$$" or die "…";
print TMP @our_new_data;
close TMP;
chmod 0444, "ourfile.$$" or die "…";
rename "ourfile.$$", "ourfile" or die "…";

Because Perl’s names are the same as (or similar to) the actual system call names, you can leverage existing examples, documentation, and knowledge. In a scripting language without low-level operations, you’re forced to accept and work in the world presented by the language designer, not the world in which you actually live.

Eric Raymond presented several examples of these artificial worlds: an old LISP system that provided many layers of abstraction (some of them buggy) before you got to actual file input/output system calls; the classic Smalltalk image, which provides a world unto itself, but offers very few hooks out into the real world; and, as a modern example, Java, which seems to be somewhat painful about “real world” connections, preferring instead to have its users implement the ideal world for it rather than it adapting to the world.

I agree with Eric’s take on Perl. I’ve personally written about a thousand system administration utilities over the thirteen years that I’ve been playing with Perl, and many of those involved such mundane tasks as opening a file precisely the way I wanted, moving it around, and watching processes and files to make sure they weren’t getting out of hand. It may not be sexy, but it’s where the work actually is — where the work gets done.

So while I encourage everyone to rush out and play with Squeak Smalltalk (http://www.squeak.org) to learn real object-oriented programming, at the end of the day it’s still gonna be Perl (OO or not) that monitors my Web site and pages me if the system goes down.

One interesting side-effect of Perl having so many low-level functions is that it forced those who ported Perl from Unix to other operating systems to think about how to perform those functions portably. Thus, the “Unix API” found in Perl provides a “virtual” operating system interface for Perl programmers, regardless of the platform.

And since I’m familiar with Unix, I can actually code up portable Perl programs that run on MacOS, Windows, and VMS without having to be savvy about their oddities or APIs, even for apparently low-level operations. I remember squealing with delight when a program I had written for Unix that dealt with forking and sockets ran without any code changes on a Windows box at a customer site. I actually expected it not to work, especially not as-is.

But what if something in Section 2 of my Unix manual isn’t supported directly by Perl? Well, on those platforms that support it, the syscall() interface provides a nifty escape hatch. Given the right parameters, the syscall() function can call nearly any single-value-return system call.

For example, suppose the rename() function wasn’t provided directly by Perl. We could simply look it up in /usr/include/sys/syscall.h, apply the proper parameters as indicated by the rename(2) page, and we’re up and running anyway.

The code might look something like this:

sub my_rename {
    my $from = shift;
    my $to   = shift;
    $! = 0;
    syscall(128, $from, $to);
    return ! $!;
}

my_rename("fred", "barney")
    or die "Cannot rename: $!";

The magic number “128” came from hunting around in my /usr/include directory until I could find the system call number of rename(). That’s the highly non-portable part of this operation, so your mileage and number may vary.

Once you have the right number, you can issue a syscall(). The value of $! is set to 0 before the call and checked for a non-zero value afterward. If the call returned anything of interest, we could also capture syscall()’s return value directly. If the call fails, the usual die with $! in the message string gives us a reasonable error.

So, if syscall() works, you can wrap anything in Unix manual Section 2 that isn’t already provided, all without leaving Perl.

But what if syscall() didn’t work? Well, even all the way back in Perl version 4, you had a documented way of “extending” a Perl interpreter using the C-level Perl interfaces. And it all got nicely easier with the release of Perl version 5, using the XS interface. With XS, you can write dynamically loaded object code for your low-level interface (or statically linked on some of the more limited systems), and then use it at will.

But the XS interface was still a stumbling block for many people. Many consider it arcane, requiring too many steps to be useful. So, thankfully, last year, Brian Ingerson (“ingy”) came along and wrote the beginnings of the Inline architecture.

In particular, Inline::C allows you to define arbitrary subroutines in C, and they simply appear as callable Perl subroutines. Behind the scenes, an MD5 hash of the C code is computed and used to maintain a cache of compiled loadable object files. At this point, renaming a file is as simple as copying the syntax nearly directly from the rename(2) man page:

use Inline C => <<'END';
#include <stdio.h>

int my_rename(char *from, char *to) {
    /* rename(2) returns 0 on success, -1 on failure */
    return rename(from, to) >= 0;
}
END

my_rename("fred", "barney")
    or die "Cannot rename fred to barney: $!";

Here I’m providing the definition for my_rename as a C function. The arguments are specified exactly as they would be in a C program, and the rename() system call gets called in the middle, massaging the return value slightly.

The Inline facility creates the proper glue to hook the snippet into the Perl-to-C code, and arranges for the C compiler to process that code. The results are cached. The first time this program runs, it takes about a second or so, but every subsequent invocation after that is lightning fast.

So, as you can see, Perl can easily get “down to C level” (just like this cruise ship I’m on). Eric Raymond says this is the one thing that Perl got right. I tend to think it’s a bit more than that.

Until next time, enjoy!

Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and can be reached at merlyn@stonehenge.com.
