Mattkit: A Great MySQL Toolbox

Here's a look at some of the most useful MySQL tools you've probably never seen (or heard of).

Back in the year 2000, I wrote a tool for MySQL that I called mytop. It was a clone of the classic Unix top utility, born of out a frustration with not having a good tool to visualize what MySQL was doing at any given moment. Rather than running SHOW PROCESSLIST and SHOW STATUS over and over, I decided to just let the computer do that for me. It evolved quickly while I was actively doing MySQL work at Yahoo! and I even received a few good patches from other folks to integrate.

Eventually I got drawn into other things and mytop languished around the time that MySQL 4.1 stabilized. (A surprising number of people still seem to use mytop despite it’s many shortcomings.) Back then it seemed that MySQL shipped with a pretty minimalistic set of command-line utilities, and that is still true today. Thankfully mytop was just the beginning.

In 2006, Baron Schwartz of Percona announced innotop. As its name suggests, innotop is a sort of sequel to mytop–the kind of sequel that far exceeds what the first was able to do, and the kind that digs much deeper into the statistics provided by the InnoDB storage engine.

In his own words:

Perhaps its most powerful and ambitious feature is the way it presents InnoDB internals. MySQL is sorely lacking in instrumentation and analysis compared to other major relational database systems (for example, Microsoft SQL Server), and it is just no fun searching through the output of the InnoDB monitors to glean bits of information from it. In my opinion, this feature alone is a major step forward to looking at what MySQL is doing internally. The information has always been there, but until now it’s been hard for DBAs to use.

Nowadays innotop is part of Maatkit, a collection of similarly useful tools that fill in gaps left by MySQL/InnoDB and are indispensable for working with MySQL in a dynamic and high-volume environment. Maatkit is sponsored by Percona, who releases it under the GPL and offers paid support and sponsorship of new features. Maatkit is hosted on Google Code and free support is available in the Maatkit Discuss Google Group.

While you can read all the documentation for Maatkit’s tools on the Maatkit documentation page, I’d like to highlight a few of my favorite utilities here.

mk-heartbeat

Monitoring the replication “delay” in MySQL isn’t easy using MySQL’s built-in metrics. When you run SHOW SLAVE STATUS on a slave server, you’ll see a variable called Seconds_Behind_Master that sounds like exactly what you want. But it’s not. That value is comparing the slave’s current time with the time on the master when it executed the currently replicating SQL statement. If the slave is low-volume (or not properly slaving) you can end up with misleading information or even a false sense of security.

The best way to monitor MySQL replication is to use a two-part heartbeat system. Part one is a process that writes the current system time to a table. On the slave(s), a second process extracts that most recent heartbeat and compares it to the local system time. Comparing the two times gives you a good idea how far behind the slave is at any given instant, plus or minus the frequency of heartbeat “pulses” written on the master. That is, if the heartbeat on the master is written every 5 seconds, you’ll have slave status down to a granularity of 5 seconds but not better. But really, there’s little reason not to have 1 second heartbeat pulse.

The only catch is that you need to make sure the systems have clocks that are being kept in sync via ntp or a similar service.

mk-parallel-dump and mk-parallel-restore

Using mysqldump to dump a large data set on modern hardware is a bit of a joke–especially if you’re using it as a backup plan. It can take an eternity to extract and/or restore data since you can’t eaisly use more than one CPU code to do the work. Maatkit’s mk-parallel-dump provides a solution to dumping data more quickly. While it has a number of command-line options, the --threads flag (which defaults to 2) gives you the control to balance CPU and I/O for your dumps. Since dumping tends to be I/O bound anyway, going much higher than half the number of cores in your server probably won’t yield much more performance unless you have a lot of I/O capacity (SSDs, perhaps?).

Similarly, mk-parallel-restore can be used to reload mutiple tables at the same time from a dump. Again, the --threads option is your friend–especially since restores tend to be dominated by CPU time. Thankfully, it defaults (on Linux systems) to the number of cores on the system.

Both tools have a number of other interesting features that are worth reading about.

mk-slave-delay

A long time ago, I remember talking with some of the MySQL developers about an option that would purposely delay applying updates to a slave by a specified amount of time. That never really went anywhere and our use case was fairly weak to start with (enforcing a contractual delay in data propagation to live servers). But the mk-slave-delay utility provides this feature with some real rationale too. Having a time delayed slave means that you can actually recover from some previously catastrophic mistakes.

People often joke that RAID is not a substitute for proper file system backups, because rm -rf / runs just about as fast on a RAID volume as it does on a single disk (sometimes faster!). The same is true a DROP TABLE in a MySQL replication environment. A poorly written WHERE clause in a DELETE or UPDATE statement could be just as bad. But if you have a slave that’s known to be roughly 30 minutes behind, that provides you with 30 minutes to catch your mistake, stop replication on that slave, and get the data back.

mk-slave-prefetch

On the opposite end of the spectrum is a tool for speeding up replication. Often times slaves aren’t able to apply updates as fast as they otherwise would because the records are not cached, either in InnoDB’s buffer pool or at the file system buffer cache or even on RAID controller. mk-slave-prefetch uses a clever trick to mitigate this problem. It reads ahead in the slave’s relay log, trying to stay ahead of the SQL thread, and runs queries that attempt to pre-cache the rows that will be affected by the queries that the SQL thread has not yet seen.

It operates by parsing the log entries and transforming the write queries into non-destructive SELECT queries that use the same criteria to select records. In effect this gets MySQL to read the records and cache them without doing any CPU intensive work. The effectiveness of this technique varies quite a bit from server to server, but it’s definitely worth trying if your slaves fall behind from time to time as a result of I/O bottlenecks.

That’s Not All

I’ve only touched on a few of the two dozen tools that comprise Maatkit today. While they’re all definitely worth checking out, I also specifically suggest looking at mk-query-profiler and mk-visual-explain. You’ll soon start to wonder how you got along with them.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62