The Apache Portable Runtime

If you’ve ever had to write a portable application in C, you’ve likely run into the same problem faced by countless other programmers: no matter how much you try to stick to a well-defined application programming interface (API), the program just doesn’t work the same on every platform. While POSIX does a passable job of providing a portable API for most Unix and Unix-like platforms, on other operating systems the POSIX layer either doesn’t exist or is too buggy to be usable. Moreover, POSIX isn’t always the best choice. Non-Unix platforms, such as Microsoft Windows, have their own APIs that are better maintained and perform better on that platform. So, to make something portable, you could write, rewrite, and tweak your code several times, if only to get the code to compile on several platforms. Or, you can use the Apache Portable Runtime (http://apr.apache.org/), the same library that makes the ubiquitous Apache HTTP server portable. If it’s good enough for Apache, well, enough said.

Apache 1.3 was ported to a variety of platforms, including many that weren’t POSIX based, such as Windows, OS/2, and BeOS. On those platforms, Apache 1.3 often relied on #ifdef blocks to achieve portability, effectively forking the source into mainline code and platform-specific code, making the code harder to read, debug, and maintain.

When development started on Apache 2.0, the developers knew that they needed a better solution. Initially, two existing solutions were considered. One was the Adaptive Communication Environment (ACE), and the other was the Netscape Portability Runtime (NSPR). However, both were rejected.

ACE wasn’t an option because Apache requires that all code be written strictly in C, and ACE is a combination of C and C++. And while NSPR looked like a good fit, its license was incompatible with Apache’s. (The licensing issues were eventually resolved, but by that time, APR was already in development.)

Nonetheless, writing APR from scratch has worked well for the Apache community — and others. (See the sidebar, “Who’s Using the Apache Runtime?” for more details.) It’s a portability layer specifically written with servers in mind.

Let’s see how to write applications with APR. And to appreciate APR’s power, let’s start with code that looks portable, but isn’t. As you’ll see, there are devils in the details.

Almost Portable

Some code is inherently portable, because it uses very well documented APIs that are implemented everywhere. For example, the code…

char *var = getenv("SHELL");

… compiles and runs on all platforms, but there are subtleties that may make it behave differently. For instance, is the SHELL variable name case sensitive? On Unix it is, but on Windows, it isn’t. Also, on Windows, applications can be compiled in either UNICODE or ANSI modes. If your application is compiled for UNICODE, then the environment table is UNICODE — but this code always tries to read the environment variables as ANSI strings. These details can be absolutely infuriating for any developer.

In APR, this same concept can be written as:

char *var;
apr_status_t rv;

rv = apr_env_get(&var, "SHELL", p);

In this case, the code isn’t much more complex than the original, and it resolves the issues of the original code. Because APR always uses native functions under the covers, APR is able to determine if it should be reading as UNICODE or ANSI and react accordingly.

Listing One shows a very simple program that demonstrates getting a single environment variable.

LISTING ONE: Reading an environment variable with the Apache Portable Runtime

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include "apr_general.h"
 4 #include "apr_pools.h"
 5 #include "apr_env.h"
 6
 7 int main(int argc, char *argv[]) {
 8     apr_pool_t *p;
 9     char *env_var;
10
11     apr_initialize();
12     atexit(apr_terminate);   /* pass the function itself; don't call it */
13     if (argc < 2) {
14         fprintf(stderr, "usage: %s NAME\n", argv[0]);
15         return 1;
16     }
17     apr_pool_create(&p, NULL);
18     if (apr_env_get(&env_var, argv[1], p) == APR_SUCCESS) {
19         printf("%s\n", env_var);
20     }
21     apr_pool_destroy(p);
22     return 0;
23 }

The first thing the program does is initialize APR (line 11). Every APR-based application should do this as soon as the program starts. (While APR may work without initialization on some platforms, on others it won’t. It’s best to always add the call.)

Next, in another mandatory step, the program calls atexit() on line 12 so that apr_terminate() runs when the program exits. Note that atexit() takes the function itself as its argument, so apr_terminate is written without parentheses. This ensures that any mutexes and other limited operating system resources are released.

The rest of the code does the work. After lines 13-16 check that a variable name was supplied, line 17 creates the program’s first pool (more on pools momentarily), and lines 18-20 get the requested environment variable and print it to the console if it is set. Line 21 destroys the pool, and the program exits.

If the code is in file abc.c, compile it on Linux using:

$ gcc abc.c `./apr/apr-1-config --includes` \
`./apr/apr-1-config --libs` \
apr/.libs/libapr-1.a -o echoenv

apr-1-config is a configuration script that provides the proper arguments for compiling APR programs: in this case, the include directories and the libraries that must be linked to satisfy APR’s requirements. You can run the application with ./echoenv SHELL.

Not Portable

Code that looks portable but isn’t is frustrating, to be sure. But a far more complex problem is code that’s obviously non-portable. It’s hard enough for experienced multi-platform developers to remember which function is for which platform — what do you do when porting a complex application to a platform you have no experience with?

For example, loading shared libraries is different on many platforms. What function do you call to load a library on Windows, Linux, HP-UX, and Mac OS X? Many people know how to do it properly on one or two of those, but very few people know all four.

Even if you do know all four (on Windows, you use LoadLibraryEx(); Linux uses dlopen(); HP-UX uses shl_load(); and Mac OS X uses NSLinkModule()), each of those functions has different arguments and different error codes.

In sharp contrast, loading a library in APR is as simple as:

apr_dso_handle_t *h = NULL;
apr_status_t status;

status = apr_dso_load(&h, "testdso.so", p);

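The type apr_dso_handle_t is a handle to the shared library. The final argument, p, is an APR memory pool (an apr_pool_t *, like the one created in Listing One) from which the handle is allocated; pools are covered in the next section. With a handle, you can load a specific symbol from the library using apr_dso_sym().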

For a complete list of the types of actions that APR makes portable, see the API documentation on the APR web site.

Managing Memory

In all of the examples above, APR allocated the memory for the APR variables, because for most APR types, only APR can allocate the memory. Because of how most APR types are defined, only APR has the correct size of the APR type, and therefore only APR can allocate the memory correctly.

So how do you allocate memory for your variables? Use pools.

Typical C memory management requires that the code that requests memory also free that memory. If you forget to free any of your allocated memory, your application leaks, which, for a long-running application like a server, can get bad enough to bring the computer to a crawl. Pools address this problem by having all memory allocation happen from shared pools of pre-allocated memory. This allows all of the allocated memory to be freed at one time, without needing to worry about memory leaks.
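To make the idea concrete, here is a deliberately tiny pool written in plain C. It is a toy for illustration only, not how APR implements pools: every name in it (toy_pool and its functions) is hypothetical, it uses a single fixed-size block, and it assumes 8-byte alignment is sufficient.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy pool: one pre-allocated block that is handed out piece by piece. */
typedef struct {
    unsigned char *base;  /* start of the pre-allocated block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} toy_pool;

/* Allocate the backing block once. Returns 1 on success, 0 on failure. */
static int toy_pool_init(toy_pool *p, size_t size)
{
    p->base = malloc(size);
    p->size = (p->base != NULL) ? size : 0;
    p->used = 0;
    return p->base != NULL;
}

/* Carve n bytes out of the block. There is no per-allocation free. */
static void *toy_pool_alloc(toy_pool *p, size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;  /* round up to 8 bytes */
    void *mem;

    if (rounded < n || rounded > p->size - p->used) {
        return NULL;  /* overflow, or the block is exhausted */
    }
    mem = p->base + p->used;
    p->used += rounded;
    return mem;
}

/* Release every allocation at once by resetting the cursor. */
static void toy_pool_clear(toy_pool *p)
{
    p->used = 0;
}

/* Give the backing block back to the system. */
static void toy_pool_destroy(toy_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
}
```

Callers never free individual allocations. They call toy_pool_clear() when a unit of work is finished, and the same block is reused for the next one.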

You can also create a hierarchy of pools, so that each pool has a parent. If you destroy a pool with a parent pool, then the memory from the sub-pool is returned to the parent instead of being freed. If you clear or destroy a parent pool, the child pools are automatically cleared or destroyed, respectively. Finally, you can register functions to run when a pool is cleared. This is useful if you have a resource that must be closed properly, because you can drop the handle to the resource, and when the pool is cleared, the resource will be closed. APR itself uses this feature to ensure that mutexes are released before a program ends.

Pools also enhance performance. One of the worst things you can do in C is allocate and free memory repeatedly. Often, the most expensive part of your application is malloc() and free(), which you don’t control. By moving to pools, you remove a lot of the overhead of malloc() and free(), because calls to those functions are centralized. In fact, in a well-architected, pool-based application, there are never any calls to free() until the program is about to end. A program written to use pools comes to a steady state, where it neither allocates nor frees memory while performing its work.

To be fair, working with pools is complex, and they don’t map well to all applications. Any application that is small and doesn’t do the same operation multiple times isn’t a good match for pools.

Pools are APR’s biggest advantage and biggest weakness. For people who like pools and have been using them in their applications, APR is the perfect portability library. However, for people who want to write very object-oriented code, pools can often get in the way. Also, it is very difficult to combine pool-based and non-pool-based code. The APR developers realize that pools aren’t for everybody, and are working on finding ways to abstract memory allocation so that the current pools implementation can use a non-pools based allocator.

See the sidebar “Swimming in Pools” to help determine if your application is well-suited for pools.

Implementing cat using APR

cat is one of the simplest commands found on Unix machines: it reads one or more files or standard input, and prints everything read to standard output. However, paired with a variety of other Unix commands, cat can become a very powerful tool. So, it’s surprising that there isn’t a reasonable facsimile of cat for Windows (except for Cygwin, but it removes you from the Windows environment instead of implementing the tools in a native environment).

Let’s implement a simple, portable version of cat using APR. Listing Two shows the most important function (a complete version of cat is left as an exercise).

LISTING TWO: The guts of a portable version of cat

 1 void printOutput(apr_file_t *in, apr_file_t *out, int numberNonBlank,
 2                  int numberAll, int showEnd, int showTab, int showNonprint,
 3                  int squeezeBlank)
 4 {
 5     char str[HUGE_STRING_LEN];
 6     int linenum = 0;
 7     int lastBlank = FALSE;
 8
 9     while (apr_file_gets(str, HUGE_STRING_LEN, in) != APR_EOF) {
10         apr_size_t bytes;
11         int emptyLine = FALSE;
12
13         emptyLine = !strcmp(str, APR_EOL_STR);
14         if (apr_file_eof(in)) {
15             break;
16         }
17
18         if (squeezeBlank && emptyLine) {
19             if (lastBlank) {
20                 continue;
21             }
22             else {
23                 lastBlank = TRUE;
24             }
25         }
26         if (!emptyLine) {
27             lastBlank = FALSE;
28         }
29
30         if (numberAll || (!emptyLine && numberNonBlank)) {
31             linenum++;
32             apr_file_printf(out, "%d: ", linenum);
33         }
34
35         if (showTab) {
36             replaceTab(str);
37         }
38         if (showNonprint) {
39             replaceNonprint(str);
40         }
41
42         bytes = strlen(str);
43         if (bytes >= strlen(APR_EOL_STR) && !strcmp(str + bytes - strlen(APR_EOL_STR), APR_EOL_STR)) {
44             str[bytes - strlen(APR_EOL_STR)] = '\0';
45         }
46         apr_file_printf(out, "%s%s" APR_EOL_STR, str, (showEnd) ? "$" : "");
47     }
48 }

The function printOutput() loops through a file reading one line at a time, making some changes to the string that was just read, and printing the result. No sub-pools are created, because nothing in the loop actually allocates new memory.

Looking at Listing Two, it should be obvious that using APR doesn’t change the code that you’d normally write to implement cat, with one exception on line 43. Notice the APR_EOL_STR macro. It expands to the correct end-of-line character sequence for the current platform. On Windows, this is CR/LF, while on Unix, it’s LF. This can be a very important difference when porting applications that deal with text files between platforms, and again APR provides the tools for handling this problem.
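The end-of-line handling can be isolated into a small helper. The version below is a sketch in plain C that takes the end-of-line sequence as a parameter, so it can be exercised with either convention; strip_eol is a hypothetical name. In an APR program you would pass APR_EOL_STR, which apr.h defines.

```c
#include <string.h>

/*
 * Remove one trailing `eol` sequence from `line`, if present.
 * Returns 1 if the line ending was stripped, 0 otherwise.
 */
static int strip_eol(char *line, const char *eol)
{
    size_t len = strlen(line);
    size_t eol_len = strlen(eol);

    /* A line shorter than the sequence cannot end with it. */
    if (eol_len == 0 || len < eol_len) {
        return 0;
    }
    if (strcmp(line + len - eol_len, eol) != 0) {
        return 0;
    }
    line[len - eol_len] = '\0';
    return 1;
}
```

Called as strip_eol(buf, "\r\n"), the helper leaves a line ending in a bare LF untouched, which is exactly the kind of mismatch that appears when text files move between platforms.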

Listing Three shows the function that determines how to handle printing non-printable characters.

LISTING THREE: Character checking in APR

 1 void replaceNonprint(char *str)
 2 {
 3     int len = strlen(str);
 4     char old[HUGE_STRING_LEN];
 5     int i;
 6     int offset = 0;   /* write position; str must have room to grow */
 7
 8     memcpy(old, str, len);
 9     for (i = 0; i < len; i++) {
10         unsigned char c = (unsigned char)old[i];
11
12         /* A non-printable byte with the high bit set becomes "M-" plus its low bits. */
13         if (!apr_isascii(c) && !apr_isprint(c)) {
14             str[offset++] = 'M';
15             str[offset++] = '-';
16             c = toascii(c);
17         }
18         if (apr_iscntrl(c) && old[i] != '\t' && old[i] != '\n' && old[i] != '\r') {
19             str[offset++] = '^';
20             str[offset++] = (c == '\177') ? '?' : (c | 0100);
21         }
22         else {
23             str[offset++] = c;   /* printable, tab, or line ending: copy as is */
24         }
25     }
26     str[offset] = '\0';
27 }

replaceNonprint() is only called if the user wants to see non-printable characters. In this case, lines 13 and 18 are the most interesting, because they show how to determine if the current character is printable or not, and if it is an ASCII character or a control character. These checks are usually implemented as macros on most platforms, but on some of the more esoteric platforms, they don’t exist at all, so APR had to re-implement them.