Popularity often comes at a high price, especially on-line where news, fads, links, and word-of-mouth literally spread at the speed of light. The creators of the popular “Hot or Not” site (http://www.hotornot.com) learned this lesson the hard way. Overnight, their traffic went through the roof. In response, they had to spend a fair amount of time and effort figuring out how to manage (and pay for) the traffic their site generated.
As the creators of “Hot or Not” found, you never know when your site might get posted on Slashdot or hyped in one of those “You’ve got to see this site!” email messages. If you have a bandwidth cap or pay increasingly high rates for connectivity, a traffic surge can burn lots of real cash in a real hurry.
The traditional solution to handling traffic surges is to throttle traffic on the network — often at the router. While effective, reconfiguring a router typically requires you to get in touch with your ISP or hosting provider and convince them to implement the necessary throttling. All the while, your bill grows larger and larger. Even worse, if you hit a bandwidth cap, your entire site can effectively be taken off-line.
Rather than depending on someone else to solve your traffic problems, you can take matters into your own hands and install the Apache module mod_throttle. mod_throttle allows you to build custom bandwidth and connection rate policies for individual files, directories, and entire servers.
To understand why mod_throttle is so powerful, consider the traditional solution of throttling at the router (or firewall). This simple-minded approach configures a router to limit the rate at which packets go to and from your server. It’s a unilateral constraint: all traffic is affected. Throttle at the router and all of your services are punished for the bad behavior of just one. In a virtual hosting environment, that means all customers are punished for the popularity of a single, unrelated site.
If the content you’d like to throttle is in one particular directory, there’s no need to punish the users of the rest of your site. With mod_throttle, you can make that decision. Even the most sophisticated router-based throttling isn’t granular enough to handle this. Using mod_throttle, you can control the average maximum transfer rate for a given client.
As you can see, throttling isn’t as simple as it first might seem. Luckily, mod_throttle provides policies to combat most of the throttling problems you’re likely to encounter. Let’s take a quick look at them.
- The None policy doesn’t provide any throttling. It’s used to test throttling rules before deploying them.
- Using the Concurrent policy you can control the number of concurrent requests that can be made. This policy can keep clients from monopolizing your server with too many simultaneous requests.
- The Document policy limits the number of “document” requests per time period. This policy applies to “pages” rather than “hits” since a single page may cause many requests to retrieve referenced images, CSS, and other media files.
- The Idle policy forces a minimum idle time (or delay) between requests. This can be used to counteract web spiders that try to suck down pages from your site as fast as possible.
- The Original policy is a volume-based policy that is inherited from an earlier version of mod_throttle. It’s a bit complicated, so check the mod_throttle docs if you’re curious.
- The Random policy randomly accepts a percentage of incoming requests. If the percentage is 100, all requests are accepted. If it is 0, none are. By using a value between 0 and 100, you can effectively refuse a percentage of your requests.
- The Speed policy imposes a limit on how fast data is sent per period.
- The Volume policy imposes a limit on how much data is sent per period.
- The Request policy limits the number of requests per time period. It is also effective in slowing web spiders.
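To make the Random policy concrete, here is a minimal Python sketch of the accept-or-refuse decision it describes. It only illustrates the idea and is not mod_throttle's own code; the function name accept_request and its rng parameter are invented for this example.

```python
import random


def accept_request(percent, rng=random.random):
    """Decide whether to serve a request under a Random-style policy.

    percent is the share of requests to accept, from 0 to 100.
    rng is any function returning a float in [0.0, 1.0); it is a
    hypothetical hook added here so the behavior can be tested.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # rng() * 100 falls in [0, 100), so 100 always accepts and 0 never does.
    return rng() * 100 < percent
```

Calling accept_request(75) repeatedly would serve roughly three requests out of four and refuse the rest.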
Each of the policies is controlled by parameters, including a limit and a period. If you want to throttle something to 10 KB/s, then the limit is 10 KB and the period is 1 second.
The Original, Speed, and Volume policies all work on the amount of data sent. In other words, they count bytes. Because you can configure the period for those policies, you can be tolerant of short bursts of traffic while still capping potential abusers. Concurrent, Document, Idle, and Request deal with requests, regardless of size.
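The limit-and-period arithmetic is easy to check in a few lines of Python. The sketch below is an illustration only, not part of mod_throttle: the helper names are invented, and it assumes 1 KB means 1,024 bytes. It shows how two policies with the same average rate can treat a short burst very differently.

```python
KB = 1024  # assumption: 1 KB is 1,024 bytes


def average_rate(limit_bytes, period_seconds):
    """Average bytes per second that a limit/period pair allows."""
    return limit_bytes / period_seconds


def within_limit(bytes_in_period, limit_bytes):
    """True while the bytes counted in the current period are at or under the limit."""
    return bytes_in_period <= limit_bytes


tight = (10 * KB, 1)    # 10 KB every second
loose = (600 * KB, 60)  # 600 KB every minute, the same 10 KB/s on average

burst = 50 * KB  # a client pulls 50 KB within a single second

print(average_rate(*tight), average_rate(*loose))
print(within_limit(burst, tight[0]))  # burst exceeds the one-second limit
print(within_limit(burst, loose[0]))  # burst fits inside the one-minute limit
```

The longer period forgives the burst as long as the client stays under 600 KB for the whole minute, which is the tolerance described above.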
Adding the Throttle
Installing mod_throttle is simple. The easiest method is to build it as a Dynamic Shared Object (DSO) and then add it to your Apache configuration. First, download and unpack the code:
Next, build and install it:
$ make; sudo make install
Add a LoadModule line to your httpd.conf:
LoadModule throttle_module /usr/lib/apache/1.3/mod_throttle.so
And then add an initial throttle configuration:
ThrottleClientIP 100 None
Next, test and restart Apache:
$ sudo apachectl configtest
$ sudo apachectl restart
Figure One: The mod_throttle status page
Finally, visit the throttle-status page to make sure the module is working. Use a URL similar to http://www.example.com/throttle-status. The page should look roughly like Figure One.
With mod_throttle installed and working, let’s look at a couple of real configurations to get you started.
First, let’s consider serving a directory of large and popular content, such as a collection of parodies of Apple’s “switch” commercials (http://www.apple.com/switch).
ThrottlePolicy Speed 100K 1s
The ThrottlePolicy directive, as you might guess, specifies the policy you’d like to implement for this location. The specifications always follow the form:
ThrottlePolicy Policy Limit Period
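The three-part form can be read mechanically. The Python sketch below splits such a line into its parts. It is a hypothetical helper, not something mod_throttle ships, and it handles only the spellings that appear in this article: it assumes a trailing K multiplies the limit by 1,024 and that a period with a trailing s, or with no suffix, is in seconds.

```python
def parse_throttle_policy(line):
    """Split a 'ThrottlePolicy Policy Limit Period' line into its parts.

    Returns (policy, limit, period_seconds). Only the forms used in
    this article are understood; anything else raises ValueError.
    """
    directive, policy, limit, period = line.split()
    if directive != "ThrottlePolicy":
        raise ValueError("not a ThrottlePolicy line: " + line)
    # Assumption: a trailing K multiplies the limit by 1024.
    if limit.upper().endswith("K"):
        limit_value = int(limit[:-1]) * 1024
    else:
        limit_value = int(limit)
    # Assumption: a trailing s, or no suffix at all, means seconds.
    period_seconds = int(period.rstrip("s"))
    return policy, limit_value, period_seconds


print(parse_throttle_policy("ThrottlePolicy Speed 100K 1s"))
```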
If you put a collection of big movies on the server and then try to fetch a movie from another machine, you might find that the file is transmitted at a very high speed — much higher than 100 KB/s. What’s happening?
Unlike the granular rate-limiting that is possible at the router or kernel level, mod_throttle works with HTTP requests (rather than network packets). All of the bytes are counted, but they can’t be counted until after a request is serviced. mod_throttle keeps track of how many bytes were sent and how long it took. A single large download can therefore go out at full speed; the limit takes effect only on the requests that follow.
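A toy model makes the after-the-fact accounting easier to see. The Python class below is a simplified sketch and not mod_throttle's actual algorithm: the class name, its methods, and the fixed-period reset are all assumptions made for illustration. The point is that the decision to serve a request can only use bytes from requests that have already finished.

```python
class PostRequestByteThrottle:
    """Toy byte throttle that learns a response's size only after it is sent."""

    def __init__(self, limit_bytes, period_seconds):
        self.limit_bytes = limit_bytes
        self.period_seconds = period_seconds
        self.bytes_sent = 0
        self.period_start = 0.0

    def should_serve(self, now):
        # Start a fresh period once the current one has elapsed.
        if now - self.period_start >= self.period_seconds:
            self.period_start = now
            self.bytes_sent = 0
        # Only bytes from already-completed requests are known here.
        return self.bytes_sent < self.limit_bytes

    def record(self, nbytes):
        # Called after a response has been fully sent.
        self.bytes_sent += nbytes


throttle = PostRequestByteThrottle(limit_bytes=100 * 1024, period_seconds=1)
first = throttle.should_serve(now=0.0)   # nothing counted yet, so it is served
throttle.record(50 * 1024 * 1024)        # the 50 MB movie went out at full speed
second = throttle.should_serve(now=0.5)  # now the counter is far over the limit
print(first, second)
```

In this model the first movie is never slowed down; the policy shows up only when the next request arrives.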
Next, let’s consider the problem of a Web robot that hits your site too quickly. The following entry limits all clients (identified by unique IP address) to five requests per second. That rule is applied server-wide unless the entry is enclosed in a VirtualHost, Location, or Directory block.
ThrottlePolicy Request 5 1
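The effect of a five-requests-per-second rule on one client can be sketched with a simple per-IP counter. This Python example is a simplified stand-in, not how mod_throttle is implemented: the class name, the fixed-window counting, and the sample addresses (taken from the documentation-only 192.0.2.0/24 range) are all assumptions.

```python
class RequestRateLimiter:
    """Allow at most `limit` requests per client IP in each fixed period."""

    def __init__(self, limit, period_seconds):
        self.limit = limit
        self.period_seconds = period_seconds
        # Per client IP: (start time of current window, requests counted in it).
        self.windows = {}

    def allow(self, client_ip, now):
        start, count = self.windows.get(client_ip, (now, 0))
        # Open a new window once the old one has run its full period.
        if now - start >= self.period_seconds:
            start, count = now, 0
        if count >= self.limit:
            self.windows[client_ip] = (start, count)
            return False
        self.windows[client_ip] = (start, count + 1)
        return True


limiter = RequestRateLimiter(limit=5, period_seconds=1)
# A robot fires seven requests, one every tenth of a second.
results = [limiter.allow("192.0.2.1", 0.1 * i) for i in range(7)]
print(results)
```

The first five requests pass and the last two are refused, while a different client IP keeps its own separate count.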
Look back at Figure One. As you add new policies to your server, the mod_throttle statistics page reflects each one, along with various statistics, including number of hits refused and bytes sent. By clicking on a policy name, you can discover which clients are currently in violation of the policy. You can also use the Web interface to reset those clients.
That’s all there is to getting started. mod_throttle can keep your bandwidth costs under control, and can also help prevent a Denial of Service attack from crippling your web site.
Jeremy Zawodny uses Open Source tools at Yahoo! by day and is writing a MySQL book for O’Reilly & Associates by night. Reach him at: Jeremy@Zawodny.com.