MySQL Performance from the Start

What are a few of the things you really need to keep in mind when starting that next big MySQL project?

Back in June of this year, I gave a presentation at the 2009 Velocity conference titled MySQL Performance From Day #1. You can see the presentation slides on-line. But this week I want to drill into a few of the topics I hit on during that brief talk because they really seemed to resonate with some folks and generated some good post-talk discussion.

Over the last nine or ten years of working with MySQL, I’ve seen some of the same problems play out over and over. And I’ve spent a fair amount of time trying to help fix those problems (or tell people how to keep things running). After a while, I started to forget how important it is to get just a few things right, at least in your mind, before you start running into problems. It’s not that people don’t want to Do The Right Thing, simply the case that you often don’t know where the barriers are going to be until you run into them. Sometimes a little visibility helps a lot.

Much of this advice is targeted at web applications that aim to serve a large number of users. That is what the Velocity conference is about (in large part). I believe every application can benefit from some of this thinking but you may see a lot less bang for your buck when used for small-scale internal applications.

What’s MySQL There For?

All too frequently, developers don’t stop to think about why MySQL is in their application stack in the first place. Putting aside the question of whether or not MySQL is the “right” choice, it’s very helpful to have a clear idea up front about what does and does not belong in the database. Not doing this leads to an annoying type of scope creep. What starts off as a simple application with a few major features and a handful of tables to support those features morphs over time to add messaging features, social networking, photo sharing, and all-night gaming.

Where does all that data go?

Well, of course, it belongs in the database. That’s what the database is there for–to store data! Right?

Uh. Sort of.

Relational databases like MySQL excel at storing and retrieving structured data via ad-hoc queries. But that doesn’t mean it’s the best system to store all your data. It’s not uncommon for people to use BLOB fields in MySQL to store images, Word documents, Excel spreadsheets, JPEG images, and so on. And if you’ve never tried to manage and scale a growing repository of BLOBs in MySQL, you’re not going to like what you’re building yourself.

There’s a great argument for keeping the meta-data about those objects (owner, modification time, id, path, size, etc.) in MySQL. That’s what a relational database is designed to handle. There are excellent choices for distributed file systems or similar systems that are designed to store binary objects in an efficient and easily addressable way.

Similarly, logging is often tossed into MySQL. This often leads to a couple problems. First off, most logging ends up being information that you rarely need to reference. So you’re writing lots of data that eats up disk space and, more importantly, space in InnoDB’s buffer pool. Worse, log tables often grow without bound. It’s easy to think that you’ll write that log purging script once the site it “big enough” or when all the higher priority work is done. In reality, that day never comes and, in the midst of troubleshooting something unrelated, you discover that the log table has grown to an alarming size.

Consider using an existing centralized logging system. Something as simple as syslog or syslog-ng logging to a network logging host may do the job. It’s easy to configure syslog-like systems such that logs are archived every day or week, compressed, and kept until they reach a given age.

Next: About that Caching Layer

Comments on "MySQL Performance from the Start"

Leave a Reply