PHP is an excellent language for building Web applications. PHP’s syntax is likely to be familiar to anyone who’s programmed in C/C++ or Perl, and PHP integrates with literally hundreds of third-party libraries, providing access to everything from IMAP and MySQL to GD for image manipulation and SNMP for monitoring network devices.
One of the more mysterious topics in building Web applications is performance tuning. Tracking down bottlenecks in any moderately complex Web application is difficult. Even after you’ve identified and fixed all of the obvious problems (such as overworked servers and suboptimal database queries, to name just two) your application still may not be as fast as you’d like it to be. Well, slow code might not be your fault. PHP itself may be the culprit!
Luckily, a variety of add-on modules for PHP aim to boost the performance of your Web applications. This month and next, we’ll look at non-invasive technologies (meaning that no code changes are required) that can improve the performance of PHP-based Web applications. Non-invasive solutions are especially important when the code you’re running isn’t yours. Rather than diving into unfamiliar code (say, that of an off-the-shelf PHP application such as PostNuke or Drupal) or sending vague bug reports to the project’s maintainers, you can drop in an add-on module and get the boost you need.
Behind the Curtain
Let’s start with a very high-level look at what happens when Apache handles requests for a PHP-based application.
1. Apache receives a request for /foo.php, locates foo.php on disk, and passes control to the PHP engine.
2. PHP reads foo.php and parses the contents of the file, looking for PHP code to be compiled. PHP automatically processes the directives include() and require().
3. The file contents are compiled (or transformed) to an in-memory representation of the code.
4. PHP executes the compiled code.
5. The resources used by PHP (such as database connections, file descriptors, and memory) are freed.
6. Apache logs the request.
When the next request for /foo.php comes in, the process is repeated. (Yes, this is an over-simplified view of what’s happening.)
During each step, there’s probably some work that can be reduced or eliminated. Most of the effort that goes into building and tuning PHP applications is directed at Step 4. If you use good algorithms, cache frequently used data, and adhere to good software engineering practices, you can reduce processing time to a minimum.
However, when your Web site receives a lot of traffic and the machine(s) hosting it begin to get busy, Steps 2 and 3 add to the load. Let’s look at how that work can be reduced and sometimes eliminated.
Eliminate and Remove Redundant Redundancy
Steps 2 and 3 are often rather involved. Pages in complex applications are often composed of many files. There may be an include file for database connectivity, another include file for input validation, and so on.
While a given page may use only a fraction of the code found in each include file, PHP must include and compile all of it. Worse, the code compiled for this request is likely to be identical to the code compiled for the previous request for the same page.
Why? Because PHP code really doesn’t change that often. While PHP is used to build dynamic Web sites, the code itself is rarely dynamic. The real live content is either sitting in an external database or is generated in response to user input, as in an on-line greeting card application.
But PHP doesn’t know that. It happily reads and recompiles each file — every time. It doesn’t care how many weeks have gone by since the code last changed. That means PHP’s doing unnecessary work that takes time. If the files are accessed frequently, Linux will cache the data so that you don’t need to wait for the disk, but PHP will still parse the code, resolve the includes, parse those files, and so on.
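To get a rough feel for that repeated work, here is a minimal sketch that uses PHP's tokenizer extension as a stand-in for the engine's own parser. The include file and its contents are made up for illustration, and token_get_all() covers only the lexing part of the job, so treat the timing it prints as indicative at best.

```php
<?php
// Hypothetical include file, written to a temp location purely for illustration.
$path = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($path, "<?php\nfunction greet(\$name) {\n    return 'Hello, ' . \$name;\n}\n");

// Simulate a run of requests: each one re-reads and re-tokenizes the same source.
$requests = 1000;
$results = [];
$start = microtime(true);
for ($i = 0; $i < $requests; $i++) {
    // Keep only the two most recent results so they can be compared afterwards.
    $results[$i % 2] = token_get_all(file_get_contents($path));
}
$elapsed = microtime(true) - $start;

// The source never changed, so the last two passes should match exactly.
$identical = ($results[0] === $results[1]);

printf(
    "%d passes in %.4f s, identical output: %s\n",
    $requests,
    $elapsed,
    $identical ? 'yes' : 'no'
);
unlink($path);
```

Every pass through the loop does the same work and arrives at the same answer, which is exactly the waste a cache is meant to remove.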
The obvious solution is to cache the compiled code and replace it only when the actual source code on disk changes. By checking the modification time of each file and comparing it with the timestamp of the compiled version, PHP could save a lot of effort.
It’s the old space-for-speed tradeoff. You’ll need sufficient memory on hand to hold the cached code, but the performance benefits are almost always worth it.
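The idea can be sketched in a few lines of userland PHP. This is a toy, not how any real accelerator works internally: the CompileCache class and its members are hypothetical names, the token array from token_get_all() merely stands in for the engine's compiled form, and the cache lives in an ordinary array rather than in shared memory.

```php
<?php
// Toy compile cache. All names here are hypothetical; the token array
// stands in for the compiled code a real accelerator would store.
class CompileCache
{
    private $entries = [];   // path => ['mtime' => int, 'code' => array]
    public $compiles = 0;    // number of times we had to "compile"

    public function load($path)
    {
        clearstatcache(true, $path);   // make sure filemtime() is fresh
        $mtime = filemtime($path);

        if (isset($this->entries[$path]) && $this->entries[$path]['mtime'] === $mtime) {
            return $this->entries[$path]['code'];   // cache hit: skip the parse
        }

        // Cache miss, or the source changed on disk: "compile" and remember it.
        $this->compiles++;
        $code = token_get_all(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'code' => $code];
        return $code;
    }
}

$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, "<?php echo 'version 1';\n");

$cache = new CompileCache();
$cache->load($path);
$cache->load($path);                  // served from the cache
$afterTwoLoads = $cache->compiles;

// Change the source and push its modification time forward.
file_put_contents($path, "<?php echo 'version 2';\n");
touch($path, time() + 10);
$cache->load($path);                  // stale entry is replaced
$afterEdit = $cache->compiles;

echo "compiles after two loads: $afterTwoLoads, after edit: $afterEdit\n";
unlink($path);
```

The counter should read 1 after the first two loads and 2 after the edit. The price is the memory held by the cached entries, which is the tradeoff described above.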
Work Smarter, Not Harder
Taking things a step further, there’s likely a lot of room for improvement in Step 4 as well. In fact, many man-years of work have been invested in the more traditional optimizing compilers like gcc. When optimizing, sometimes in several passes, compilers examine code and make changes that speed up the code, but don’t affect its outcome.
Here are a few simple checks that an optimizing compiler might perform on PHP code:
- Remove unused code. In complex code that’s evolved over time, it’s likely that there are bits of code lying around unused. The code could be as simple as a variable whose value is never used or maybe entire subroutines that are simply not called anymore. Removing unused code would speed compile time.
- Change constant variables to constants. If a variable’s value never changes, it may be more efficient to substitute a constant in its place.
- Evaluate constant expressions. If you often use expressions like $seconds_per_day = 60 * 60 * 24, PHP could compute the result (86,400) before executing the code and substitute that constant value for the expression.
- Remove unnecessary subroutine calls. Perhaps you often call a debugging routine during development, but when it’s time to deploy the code, you simply update the subroutine so that it returns right away:
function debug_stuff() {
    return;
    // old code below
}
A good optimizing compiler should notice that and never make calls to debug_stuff() since the result is known in advance.
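To make the checks above concrete, here is a hand-worked before-and-after. Nothing here is the output of a real optimizer: the function names seconds_in_days_original() and seconds_in_days_optimized() are invented for illustration, and the "optimized" version is simply what a person would write after applying each check by hand.

```php
<?php
// No-op debugging routine: it returns right away, so calls to it do nothing.
function debug_stuff()
{
    return;
}

// The code as a developer might have left it (hypothetical example).
function seconds_in_days_original($days)
{
    $seconds_per_day = 60 * 60 * 24;   // constant expression
    $unused = strlen('never read');     // value is never used
    debug_stuff();                      // call whose result is known in advance
    return $days * $seconds_per_day;
}

// What the checks could reduce it to: the constant is folded to 86400,
// the unused variable is gone, and the no-op call is dropped.
function seconds_in_days_optimized($days)
{
    return $days * 86400;
}

printf(
    "original: %d, optimized: %d\n",
    seconds_in_days_original(7),
    seconds_in_days_optimized(7)
);
```

Both functions return the same value for any input. The second one simply gets there with less work, which is the whole point of an optimizing pass.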
The list goes on. There are many, many general patterns that optimizers can look for in the hopes of speeding up code. Moreover, there are many special cases particular to PHP and the way PHP is most often used that could likely be optimized by a smart, PHP-aware optimizer.
Just In Time
In fact, PHP optimization could follow Java’s lead and look at JIT (Just In Time) compilation techniques. Some Java Virtual Machines actually transform Java bytecode into native machine code as it’s encountered.
This too is trading memory for speed, but once the code’s been compiled into native machine instructions, it executes very quickly.
A final twist on the JIT technique is showcased in Sun’s HotSpot Java technology (http://java.sun.com/products/whotspot). Rather than compiling all of the code it encounters, HotSpot lets the bytecode execute for a while as it watches which sections of the code are used most often. With that knowledge in hand, HotSpot can make much more intelligent decisions about what to compile.
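The "watch first, optimize later" idea can be loosely mimicked in plain PHP. The sketch below is only an analogy and says nothing about how HotSpot is actually built: the HotSpotRunner class, its threshold, and the two closures are all hypothetical, and the "compiled" version is just a faster hand-written routine rather than native machine code.

```php
<?php
// Toy analogy only: count calls and, once a routine proves "hot",
// switch to a faster implementation. All names are hypothetical.
class HotSpotRunner
{
    private $slow;
    private $fast;
    private $threshold;
    private $calls = 0;

    public function __construct(callable $slow, callable $fast, $threshold)
    {
        $this->slow = $slow;
        $this->fast = $fast;
        $this->threshold = $threshold;
    }

    public function isHot()
    {
        return $this->calls >= $this->threshold;
    }

    public function call($arg)
    {
        // Use the slow path until the call counter crosses the threshold.
        $impl = $this->isHot() ? $this->fast : $this->slow;
        $this->calls++;
        return $impl($arg);
    }
}

// "Interpreted" version: sums 1..n with a loop.
$slow = function ($n) {
    $sum = 0;
    for ($i = 1; $i <= $n; $i++) {
        $sum += $i;
    }
    return $sum;
};

// "Compiled" version: the same result from a closed-form formula.
$fast = function ($n) {
    return intdiv($n * ($n + 1), 2);
};

$runner = new HotSpotRunner($slow, $fast, 3);
$results = [];
for ($i = 0; $i < 5; $i++) {
    $results[] = $runner->call(100);
}

echo implode(', ', $results), "\n";
```

The first three calls take the slow path and the rest take the fast one, yet the caller sees the same answer every time. Code that never gets hot never pays the cost of being optimized.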
So far, no JIT-like solutions exist for PHP, but it’s logical to assume that someone will build one sooner or later. PHP’s not a passing fad. It’ll likely be here for years to come, and the applications built with it seem to get larger and more complex all the time.
Looking ahead to next month, here are the three PHP add-ons we’ll discuss.
- APC, the Alternative PHP Cache (http://apc.communityconnect.com/about.html), is an open source add-on that focuses on caching compiled PHP code in shared memory. It doesn’t auto-detect when source files are changed, but it does provide a mechanism for flushing the cache.
- The Zend Optimizer (http://www.zend.com/store/products/zend-optimizer.php), built around the Zend Engine (the core of PHP), goes a step further and implements several compile-time optimizations as well as caching. The Zend Optimizer is a commercial product with no source code available. (The Zend folks offer a suite of PHP-related products.)
- The ionCube PHP Accelerator (http://www.php-accelerator.co.uk) is similar to APC, but it’s not open source. It is free, but you can’t access the code.
Each optimizer takes a different approach to solving PHP performance problems, and you can see that they all have different licenses.
Check back here next month to see how each one works.
Jeremy Zawodny uses Open Source tools at Yahoo! by day and is writing a MySQL book for O’Reilly & Associates by night. Reach him at Jeremy@Zawodny.com.