Migrating to PHP

Welcome to the first of many columns in which we’ll explore the various technologies in the LAMP family. For those who aren’t familiar with the acronym, LAMP stands for Linux, Apache, MySQL, and Perl/Python/PHP, some of the Open Source world’s “best of breed” tools.

In the coming months, we’ll consider topics as diverse as XML to HTML conversion, Apache performance tuning, user authentication, and using Zope (which is a Python-based Web application server) to build collaborative Web sites.

For now, let’s jump right in with a look at how you can use some of these technologies to make your life (and the job of maintaining your Web site) easier.

Are you still producing Web sites by editing plain old HTML files with a text editor? Have you ever had to write a script just so that you could update all the copyright statements on your site when the new year arrives? Wish you could easily handle form submissions without searching for some Perl CGI script? Or, maybe you’re just looking for a better way to build and maintain your Web site.

If you answered “yes” to any of those questions, then it’s time to get with the times and start using PHP, the world’s most popular server-side embedded scripting language. It’s probably already built into your copy of Apache.

Not familiar with PHP? No problem. It’s a fast and easy-to-use programming language. Check out our Scripting the Web with PHP article in the July issue (http://www.linux-mag.com/2001-07/php_01.html) of Linux Magazine for an overview of where PHP came from and what the code looks like. It may just get you excited about all the new things you’d be able to do if you were using the language. Imagine being able to actually program your Web site instead of simply cutting and pasting HTML tags.

Making the Move

Now that you’re convinced that PHP is worth trying out, we need to discuss exactly how to make the switch. There is at least one big problem you could run into — broken links.

If you’re considering converting all, or even part, of a static Web site (composed of regular old HTML files) to use PHP, you run the risk of breaking hundreds or thousands of links that reference pages on your site today, including user bookmarks.

Why might this happen? Well, in a typical setup, Apache hands a file to the PHP interpreter only if its name ends in .php or possibly .php3 (depending on your configuration). So, if you were to simply rename all your .html files to names ending with .php, you’d break every link on your own site that still uses the old names. Worse yet, you’d also break links from other sites that point to yours, as well as those that users have stored in their bookmarks or favorites.

As far back as 1998, people were warning against Link-rot (http://www.useit.com/alertbox/980614.html). Just put yourself in your users’ shoes. How do you react when you encounter a site that has a bunch of broken links on it? Do you stick around or surf elsewhere? Luckily, you can arrange things so that no links are broken while still taking advantage of PHP on your site. In fact, there are several different approaches you can take.

All or Nothing

Having decided to convert to PHP, you next need to decide what exactly to convert. Should all of your .html files be handled by PHP? There’s no technical reason not to do so. PHP will scan any page for its scripting tags and execute all of those that it finds. Since there are no such tags in your pages currently, PHP will send them through Apache unaltered.

To set it up, all you need to do is open Apache’s httpd.conf file with a text editor. Then find the line that looks like the following:

AddType application/x-httpd-php .php

and change it to:

AddType application/x-httpd-php .php .html

Then restart Apache:

# apachectl graceful

Now if you request a .html file from your Web server, the PHP interpreter will scan for PHP tags before Apache serves it to your browser. It’s that easy.

While you’re at it, you might want to consider changing:

DirectoryIndex index.html

to:

DirectoryIndex index.php index.html

so that PHP-enabled pages can be used as the default file when a request arrives for a directory without specifying a filename. Apache will look for both index.php and index.html files in the directory (in that order) before giving up.

Of course, scanning every page for PHP tags each time it is requested doesn’t buy you anything if none of your pages have PHP tags in them. It requires more CPU time than having Apache send them just as they are. If your server is very busy, has a slow CPU, or contains very large HTML files, you might consider letting PHP process only those pages you want it to.

To do so, you’ll need to either rename the files you’d like PHP to handle (give them a .php extension) and risk breaking links and bookmarks or use another trick to accomplish what you want. Given that we’d rather avoid broken links, let’s look at some of the other options available.
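
One such trick, offered as a rough sketch rather than a recommendation, is to name the individual pages PHP should handle. The fragment below assumes mod_php is loaded and provides the application/x-httpd-php handler; the page names news.html and contact.html are hypothetical.

```apache
# httpd.conf or .htaccess; a sketch, assuming mod_php provides this handler.
# news.html and contact.html are made-up page names.
<FilesMatch "^(news|contact)\.html$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

Every other .html file is still served as-is, so only the pages you list pay the cost of being scanned. The catch is that the list has to be kept up to date as you convert more pages.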

.htaccess Files

One of those options is to enable or disable PHP on a per-directory basis using .htaccess files. A .htaccess file allows you to override or extend Apache’s configuration on a per-directory (or directory tree) basis. You’ll find that most of Apache’s run-time configuration directives can appear in the standard httpd.conf file or in a .htaccess file.

So if you have pages in /www/pages/small that you’d like to PHP-enable, simply create /www/pages/small/.htaccess and put this one line in it:

AddType application/x-httpd-php .php .html

Using this method, you will incur a small amount of overhead. This is because Apache has to re-read the .htaccess file each time a file in that directory (or one below it) is requested. The alternative would be to use a <Directory> block in your httpd.conf file like the following:

<Directory /www/pages/small>
AddType application/x-httpd-php .php .html
</Directory>

Doing so means that Apache won’t have to re-read that .htaccess file for each request, but it also means you have to remember that the <Directory> block is in Apache’s config file. When you upgrade Apache or move the files to a new location, you’ll need to update it. By using the .htaccess file, the directive stays with the content it affects and you do not need to restart Apache each time you make a change.
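
One caveat applies to the .htaccess route: Apache honors directives there only if the main configuration permits it. AddType falls under the FileInfo override class, so if the line in your .htaccess file seems to have no effect, a block along these lines in httpd.conf may be what is missing. This is a sketch that reuses the /www/pages/small path from the example above.

```apache
# httpd.conf: allow .htaccess files under this directory to use
# FileInfo-class directives such as AddType.
<Directory /www/pages/small>
    AllowOverride FileInfo
</Directory>
```

This change does require restarting Apache, but only once. After that, edits to the .htaccess file itself take effect without a restart.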


A far more interesting way to attack the problem is using Apache’s URL rewriting capabilities. mod_rewrite is a powerful tool that allows you to arbitrarily rewrite requested URLs based on regular expressions.

In other words, you can write rules which transform the URLs that users request to a form that matches whatever you’d like. Best of all, the user never has to know.

Note that mod_rewrite may or may not already be part of the version of Apache on your system. If it is not, you’ll need to install it. Since mod_rewrite is a standard Apache module, installation is rather simple.
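
A quick way to check, sketched here on the assumption that the module is loaded dynamically, is to look for a LoadModule line in httpd.conf. The path shown is the conventional Apache 2.x location and may well differ on your system; Apache 1.3 builds typically keep modules under libexec/ and may also need a matching AddModule line.

```apache
# httpd.conf: load mod_rewrite as a shared module.
# The path is an assumption; adjust it to where your modules live.
LoadModule rewrite_module modules/mod_rewrite.so
```

If the module was compiled into the server instead, running httpd -l should list mod_rewrite.c, and no LoadModule line is needed.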

Implementation isn’t all that different from the way you apply a regular expression in a substitution in Perl. To change “my dog” or “my cat” to “the old dog” or “the old cat” in Perl, you’d write:

$text =~ s/my (dog|cat)/the old $1/g;

It’s just a matter of specifying what to look for and how to transform it (or what to replace it with). As you’ll see, the RewriteRule directive in mod_rewrite works in a very similar manner.

The rules and conditions for mod_rewrite can appear in your httpd.conf file or in a .htaccess file. Our needs are simple, so we’ll only need to enable the RewriteEngine and use a single RewriteRule. For more complex problems, it’s possible to construct very complex sets of rewriting rules and conditions.

Enabling the RewriteEngine is as simple as adding:

RewriteEngine on

to either httpd.conf or .htaccess.

The RewriteRule directive is what does all of the actual work, so it’s a bit more complex. The general form is like the following:

RewriteRule <pattern> <action> [flags]

where <pattern> is the regular expression which should match the URLs we want to rewrite, and <action> describes the action (or transformation) to apply to the URL. The optional flags tell the rewrite engine how to apply the action. We won’t be needing any flags.

For our purposes, a .htaccess file containing these two lines:

RewriteEngine on
RewriteRule ^(.*)\.html$ $1.php

handles the task of silently redirecting requests for files ending in .html to the same file with a .php extension. So a user asking for page.html will receive page.php rather than a “not found” message.
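
If only some of your pages have been converted, that rule would also rewrite requests for .html files that still exist. A hedged variation, assuming the rules live in a .htaccess file, adds a RewriteCond so that the rewrite applies only when no file with the requested .html name is present:

```apache
RewriteEngine on
# Skip the rewrite when the requested .html file really exists on disk.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.html$ $1.php
```

With this in place you can convert pages one at a time: rename page.html to page.php, and old links to page.html keep working while untouched pages are served as before.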

As you can imagine, there’s a lot you can do with mod_rewrite. It can match against environment variables and HTTP headers (such as User-Agent, in case you want to rewrite based on the type of browser that made the request), and it even lets you chain several rewriting steps together.
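
As an illustration of a header-based rule, the sketch below serves a text-only home page to Lynx users. The file name index.text.html is hypothetical, and the pattern assumes the browser identifies itself with a User-Agent string beginning with Lynx.

```apache
RewriteEngine on
# Apply the next rule only when the User-Agent header starts with "Lynx".
RewriteCond %{HTTP_USER_AGENT} ^Lynx
# [L] stops further rules from being applied in this pass.
RewriteRule ^index\.html$ index.text.html [L]
```

A RewriteCond applies only to the RewriteRule that immediately follows it, so other rules in the same file are unaffected.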

For more information on mod_rewrite, see the mod_rewrite documentation (http://httpd.apache.org/docs/mod/mod_rewrite.html) and the URL Rewriting Guide (http://httpd.apache.org/docs/misc/rewriteguide.html) by Ralf Engelschall, the definitive reference for mod_rewrite.

Next month, we’ll look at password protecting Web pages using PHP and various authentication schemes.

404: This Page Has Moved

We’ve all had the experience of clicking on a link only to be greeted by a “friendly 404” page, which says something like, “This Page Has Moved,” or “Please Update Your Bookmarks.” This usually means that the Web masters were too lazy to make sure that their links would work after the latest redesign of their site. Rather than putting the effort into planning for this problem, they’re putting the burden of “fixing” it on anyone who may have a link to one of their pages.

While it’s not ideal, at least they bothered to prepare a page that lets us know whose fault it is. You can do the same thing if you’d like. Just use Apache’s ErrorDocument directive in your config file to send users to a single page whenever they hit a link that no longer works:

ErrorDocument 404 /page_has_moved.html

Or, if you’d like to generate it dynamically:

ErrorDocument 404 /page_has_moved.php

This obviously isn’t very helpful to your users, but at least you’re telling them that it’s your fault and not theirs.

There are, of course, better ways.

Jeremy Zawodny works at Yahoo! Finance and is writing Advanced MySQL for O’Reilly. You can reach him at jeremy@zawodny.com.
