Understanding the Query Cache

When MySQL 4.0 was released, it included a host of new features. We’ve already discussed MySQL 4.0 several times in Linux Magazine, but the query cache only received a brief mention in the September 2002 “LAMP Post” column (available online at http://www.linux-mag.com/2002-09/lamp_01.html). And since the query cache is disabled by default, there’s a good chance you’ve not stumbled across it yet.

Remember This

The idea behind the query cache is quite simple. Many applications, especially web-based applications, issue repetitive database queries. These queries needlessly waste resources on the database server and slow down the application. And it just feels wrong.

Most developers who walk into a situation like this quickly realize what’s needed — a server-side cache. Typically you’d build a caching layer on the web server using flat files, Berkeley DB, shared memory, or the like.

Building your own caching system isn’t terribly difficult, but it does take time and involves writing extra code. You’ll benefit from code that’s written specifically for your application, but in doing so you’ll also have to make many decisions about cache policy and other details. How do items expire from the cache? Is it a least-recently-used (LRU) cache? Do you assign a time-to-live (TTL) to each cached item? Can the application override or bypass the cache?

The bottom line is that you’re going to spend some effort to get this all figured out, coded, tested, and deployed. All that just to make your application remember things instead of constantly re-querying for them.

Or you could just use MySQL’s built-in query cache.

Benefits

Let’s look at some reasons why the query cache is a good idea.

To begin, the query cache in MySQL is already built. The amount of code you’ll need in your application (if any) is minimal. And since the cache operates on the database server, you’ll benefit from it no matter what language you’re using: PHP, Java, Perl… MySQL doesn’t care.

Because MySQL controls the cache, it doesn’t have to make any guesses about how long to cache data. In theory, it could expire cached items only when someone else uses a query to change one of those items.

In practice, it’s not quite that granular, but it could be. In many applications, you’d be hard pressed to do this on the web server side.

MySQL’s query cache, like the rest of MySQL, is easy to configure. As you’ll soon see, it’s just a matter of setting a few configuration parameters and letting MySQL do the rest.

Finally, as you’d expect, MySQL’s query cache is fast. Very fast.

How it Works

The query cache in MySQL is really two things put together. First, it’s an area of memory used for caching the results of queries. And it’s also the logic inside MySQL that finds and stores cached results and invalidates (or prunes) old data.

When the query cache is enabled, MySQL first checks the cache for the results of a SELECT query before parsing the query, building an execution plan, or doing much else for that matter. It simply hashes the raw SQL and uses the resulting value to locate the cached results (if any).

In other words, the query cache is like a big hash table. The queries themselves are hashed to provide the necessary lookup key. The “values” are the results of each cached query.
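
Because the lookup key is a hash of the raw SQL text, two queries have to match byte for byte (including letter case and whitespace) to land on the same cache entry. These two statements, for example, hash to different keys and are cached separately even though they return identical results:


SELECT * FROM customer WHERE id = 1
select * from customer where id = 1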

If MySQL locates a cached result, it simply returns it to the client and does no additional work. Cached results are even stored in the same format used to send them over the network, so a cache hit requires very little CPU time.

If MySQL doesn’t locate a result, it parses, plans, and executes the query just as it normally would. Then, when it’s time to return the results to the client, it stashes an extra copy in the query cache for later use.

So far, so good. What about purging old data?

Non-SELECT queries also affect the query cache. An UPDATE, INSERT, REPLACE, or DELETE query can change data and possibly invalidate some of the results stored in the cache. So, MySQL examines the list of tables that each query modifies and makes sure to purge any cached results that came from the affected tables.

In other words, if an UPDATE query runs against the customer table, MySQL purges all of the cached results that were previously derived from the customer table. This is the “not quite that granular” behavior alluded to earlier.
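
To make that concrete, here’s a hypothetical sequence (the column names are invented for illustration). Notice that the UPDATE changes a column the cached query never even reads, yet the cached result is still thrown out, because invalidation happens at the table level:


SELECT name FROM customer WHERE state = 'CA';         -- executed normally, result cached
SELECT name FROM customer WHERE state = 'CA';         -- identical query, answered from the cache
UPDATE customer SET credit_limit = 500 WHERE id = 7;  -- purges every cached result built from customer
SELECT name FROM customer WHERE state = 'CA';         -- cache miss, runs against the table again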

On a very large table, it’s probably wasteful to throw out all cached results. It’d make more sense to purge only those cached results that would actually be different after the UPDATE. But doing that would add quite a bit of complexity to the code and would certainly slow it down. Caches aren’t supposed to be perfect. They’re supposed to give you the biggest bang for your buck without adding too much complexity or overhead. MySQL’s query cache strikes a reasonable balance between performance and overhead.

Configuration

In 99% of use cases, there are only two configuration parameters to worry about: query_cache_type and query_cache_size. Both are typically set in the [mysqld] section of your /etc/my.cnf file.
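
As a minimal sketch (the values here are only examples), the relevant lines of /etc/my.cnf might look like this; the meaning of the type value is explained just below:


[mysqld]
# cache results of all SELECT queries by default
query_cache_type = 1
# set aside 16 MB of memory for cached results
query_cache_size = 16M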

There are three possible values for query_cache_type: 0 disables the query cache; 1 enables the query cache; and 2 sets the query cache in “on demand” mode.

Setting the type to 1 means that every SELECT query is considered for inclusion in the cache. This works well in many cases, but there are applications that find only a small subset of their queries are really cachable. In those cases, it’s best to bypass the cache. You can do that with the SQL_NO_CACHE hint in a query:


SELECT SQL_NO_CACHE * FROM my_table WHERE …

Or, you can set the type to 2. Doing so means that MySQL only considers queries for the cache if you explicitly mark them. You do that with the SQL_CACHE hint:


SELECT SQL_CACHE * FROM my_table WHERE …

The bottom line is that you control the default behavior: off, on, or “only when requested.”

The other configuration parameter, query_cache_size, controls how much memory MySQL sets aside for the cache. The value you specify is the upper limit. Start with a relatively small value (8 or 16 MB) and increase from there if needed.

Statistics and Optimization

Once the cache is running, it’d be nice to know how well it’s doing. If you happen to use mytop (http://jeremy.zawodny.com/mysql/mytop), you’ll find that it provides a query cache hit rate in the summary display. Otherwise, you can examine MySQL’s query cache related statistics directly using the SHOW STATUS command:


mysql> show status like 'Qca%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_queries_in_cache | 8        |
| Qcache_inserts          | 545875   |
| Qcache_hits             | 83951    |
| Qcache_lowmem_prunes    | 0        |
| Qcache_not_cached       | 2343256  |
| Qcache_free_memory      | 33508248 |
| Qcache_free_blocks      | 1        |
| Qcache_total_blocks     | 18       |
+-------------------------+----------+
8 rows in set (0.00 sec)

As you can see, MySQL (as of version 4.0.17) tracks eight statistics that give you an idea what’s going on. If you’d like to calculate a cache hit rate (or percentage) on your own, you’ll also need to know how many SELECT queries the server has handled:


mysql> show status like 'Com_sel%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Com_select    | 2889628 |
+---------------+---------+
1 row in set (0.01 sec)

Here, MySQL was able to serve 83,951 of 2,889,628 queries from its cache. That’s roughly 1 of every 34. And we also know that only 545,875 of the 2.9 million queries ever had results inserted into the query cache. Seeing that large a difference, chances are that this server is running with query_cache_type set to 2.
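
For reference, the arithmetic behind that one-in-34 figure is just the ratio of the two counters shown above:


hit rate = Qcache_hits / Com_select
         = 83,951 / 2,889,628
         ≈ 2.9%, or roughly 1 query in 34

To check the guess about query_cache_type, have a look at the server variables: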


mysql> show variables like 'query_cache%';
+-------------------+----------+
| Variable_name     | Value    |
+-------------------+----------+
| query_cache_limit | 1048576  |
| query_cache_size  | 33554432 |
| query_cache_type  | DEMAND   |
+-------------------+----------+
3 rows in set (0.00 sec)

And indeed it is.

The query_cache_limit you see is a per-result size limit: MySQL won’t cache any result set larger than this value. The setting shown (1 MB) is the default and is suitable for many applications. However, if you have a lot of queries that return large result sets, feel free to increase it.
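
If you do raise it, the setting lives in the same [mysqld] section as the others; for example, to allow result sets up to 4 MB (an arbitrary figure, purely for illustration):


query_cache_limit = 4M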

As for the other statistics that MySQL tracks, most of them are named rather intuitively and require little explanation.

But you do need to know that the query cache is divided up into “blocks” that MySQL may choose to split into smaller blocks when space gets tight. The free blocks and total blocks give you an idea of how often it has to do that.
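
A large Qcache_free_blocks value relative to Qcache_total_blocks suggests the cache memory has become fragmented. If that happens, one option (short of restarting the server) is to defragment it by hand with FLUSH QUERY CACHE, which tidies up the blocks without discarding any cached results:


FLUSH QUERY CACHE;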



Jeremy Zawodny plays with MySQL by day and spends his spare time flying gliders in California. He is the author of High Performance MySQL, published by O’Reilly and Associates. Reach him at Jeremy@Zawodny.com.
