Drizzle: Rethinking the MySQL Database Kernel

Drizzle is a re-thought and re-worked version of the MySQL kernel designed specifically for high-performance, high-concurrency environments. In this exclusive article, MySQL guru Jeremy Zawodny takes an inside look at the goals and state of Drizzle development.

Solid State Disks (SSDs)

The storage revolution is here, and it is solid-state. Solid State Disks (SSDs), which look like hard disks on the outside but are packed with non-volatile flash memory on the inside, have made quite a splash. They often show up in notebook computers with the promise of lower power consumption, no moving parts, and faster boot and application startup times. And while they've delivered on most of those promises, the killer SSD application is the database server.

SSDs cannot yet match the raw capacity of their “spinning rust” counterparts (1.5 TB and 2 TB hard drives are available today), but they make up for it with virtually no seek time. The difference between random and sequential I/O is essentially erased. Moving a traditionally I/O-bound database application (one that needed many disks to ease the pain of seek time, not for aggregate space) from hard disks to a pair of high-end SSDs in a RAID-1 configuration often yields truly stunning results: what used to take hours can finish in minutes or even seconds.

This quantum leap in performance is here to stay and only stands to get better. Despite some recent FUD, SSDs are ready for prime time today. The fact is that every major storage vendor is currently re-tooling and revamping its product line to integrate SSDs into its offerings. No doubt they'll be boasting some amazing best-case performance numbers soon.

But like the single-core to multi-core transition, going from hard disks to SSDs really requires re-thinking and redesigning things to get the most benefit.

Fast Networks and Memcached

The classic LAMP architecture has quietly been evolving to incorporate new pieces, largely thanks to gigabit Ethernet (which provides disk-like transfer speeds), 64-bit CPUs, and dropping RAM prices. Many high-volume web sites have added a “caching tier” to their architecture. This caching tier is often based on memcached and sits between the application layer and the database.
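A caching tier like this is commonly used in a cache-aside fashion: the application asks the cache first, queries the database only on a miss, and deletes the cached copy when the underlying data changes. The sketch below assumes that pattern and uses a plain Python dict as a stand-in for memcached; `CacheTier`, `cached_read`, and `update_and_invalidate` are hypothetical names, not part of any memcached client library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheTier:
    """In-process stand-in for a memcached-style cache (hypothetical; not a memcached client)."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, object]] = {}  # key -> (expiry time, value)

    def get(self, key: str) -> Optional[object]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def cached_read(cache: CacheTier, key: str, load: Callable[[], object]) -> object:
    """Cache-aside read: serve from the cache and query the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()  # only cache misses ever reach the database
        cache.set(key, value)
    return value


def update_and_invalidate(cache: CacheTier, key: str, save: Callable[[], None]) -> None:
    """Write path: update the database first, then drop the now-stale cache entry."""
    save()
    cache.delete(key)


if __name__ == "__main__":
    cache = CacheTier(ttl_seconds=30)
    load_user = lambda: {"id": 42, "name": "alice"}  # hypothetical database query
    cached_read(cache, "user:42", load_user)  # miss: goes to the "database"
    cached_read(cache, "user:42", load_user)  # hit: served from the cache
    update_and_invalidate(cache, "user:42", lambda: None)  # next read misses again
```

Because repeated reads of hot keys never reach the database, the database mostly sees writes and cache misses, which is part of why features like a built-in query cache matter less in this architecture.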

Refactoring, Removing, and Plugins

As you can see, many fundamental characteristics of server hardware have changed since MySQL began its hockey-stick growth nearly a decade ago. Combine that with the organic growth of MySQL's code base over the years and all the features that have been crammed into it, and you end up with a code base that, to put it politely, needs work.

So Aker and his team have set out to modernize the code: removing old abstractions that are no longer relevant (mysys), replacing custom code with modern C++ data structures and algorithms, and fixing various inconsistencies. Along the way, the team has been ruthlessly removing needless locks (mutexes) that reduce concurrency and overall throughput.

The SHOW PROCESSLIST command in MySQL is a perfect example. Every time you run it on a server, it grabs a global lock that is not released until it has finished reading all the information about the state of threads in the server, including the text of currently executing queries. Now that’s something that happens very quickly on most servers, but what about a 64- or 128-core server (available today) running thousands of queries per second? Even a stall of a few milliseconds can really back things up. Worse, you’re more likely to run SHOW PROCESSLIST when trying to debug why a server has slowed down. In doing so, you’d further degrade the situation.

Hence, the Drizzle implementation of SHOW PROCESSLIST no longer performs any locking. It’s effectively performing a dirty read of the thread status. And that’s just fine for 99.999% of uses.
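The trade-off can be illustrated with a toy Python model of a process list. This is not Drizzle's C++ code; `ProcessList`, `publish`, and `show` are hypothetical names. In locking mode, one global lock is held for the entire scan, so any worker trying to update its status has to wait. In dirty-read mode, the reader simply copies whatever each slot holds at that instant. The model relies on CPython making a single list-item assignment atomic; a C++ server would typically need atomics or per-slot synchronization to get the same guarantee.

```python
import threading
from typing import List, Optional, Tuple

# (thread_id, state, query_text): a hypothetical, simplified status record
Status = Tuple[int, str, str]


class ProcessList:
    """Toy per-connection status table; `locking` selects the read strategy."""

    def __init__(self, max_connections: int, locking: bool) -> None:
        self._slots: List[Optional[Status]] = [None] * max_connections
        self._locking = locking
        self._global_lock = threading.Lock()

    def publish(self, slot: int, status: Optional[Status]) -> None:
        """Called by a worker thread to report what it is doing (None = idle)."""
        if self._locking:
            # Blocks while a SHOW PROCESSLIST-style scan holds the global lock.
            with self._global_lock:
                self._slots[slot] = status
        else:
            # Replace the whole tuple in one assignment: a concurrent reader sees
            # either the old record or the new one, never a mix (in CPython).
            self._slots[slot] = status

    def show(self) -> List[Status]:
        """Return the entries for all non-idle connections."""
        if self._locking:
            # Consistent snapshot, but every worker calling publish() stalls
            # until this scan finishes.
            with self._global_lock:
                return [s for s in self._slots if s is not None]
        # Dirty read: no lock, so entries may be a moment out of date,
        # but the scan never makes a worker wait.
        return [s for s in list(self._slots) if s is not None]


if __name__ == "__main__":
    plist = ProcessList(max_connections=4, locking=False)
    plist.publish(0, (101, "Query", "SELECT 1"))
    plist.publish(2, (103, "Sleep", ""))
    print(plist.show())
```

A slightly stale view is acceptable here because the output is only a diagnostic snapshot: by the time anyone reads it, the server state has already moved on anyway.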

As the team combs through the code, it occasionally stumbles upon a feature that adds complexity to the system (or reduces concurrency) and asks whether it's worth keeping. Often the answer is no. The result is that many features have been dropped entirely and others are being moved to plugins.

What’s Gone

Here’s a short list of some major features that have been removed from Drizzle.

With the popularity of memcached and other application level caching, the need for a query cache has decreased over time. It was of limited utility in high-volume applications because every update to a table would invalidate all related entries in the cache. Moreover, locking in the query cache was not nearly granular enough. Hence, it’s been removed from Drizzle.
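The invalidation problem is easy to see in a toy model. The sketch below is a hypothetical, heavily simplified query cache, not MySQL's implementation: results are keyed by exact SQL text, each entry remembers which tables it read, and any write to a table evicts every entry that touched that table, even if the write changed rows those queries never returned. `ToyQueryCache` and its methods are illustrative names only.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Optional, Set

Rows = List[tuple]


class ToyQueryCache:
    """Table-invalidated result cache (hypothetical simplification, not MySQL's code)."""

    def __init__(self) -> None:
        self._results: Dict[str, Rows] = {}  # SQL text -> cached rows
        self._tables_of: Dict[str, FrozenSet[str]] = {}  # SQL text -> tables it read
        self._by_table: DefaultDict[str, Set[str]] = defaultdict(set)  # table -> SQL texts

    def lookup(self, sql: str) -> Optional[Rows]:
        # Only an identical query string can produce a hit.
        return self._results.get(sql)

    def store(self, sql: str, tables: FrozenSet[str], rows: Rows) -> None:
        self._results[sql] = rows
        self._tables_of[sql] = tables
        for table in tables:
            self._by_table[table].add(sql)

    def invalidate_table(self, table: str) -> int:
        """Called on any write to `table`; returns how many entries were evicted."""
        victims = self._by_table.pop(table, set())
        for sql in victims:
            self._results.pop(sql, None)
            # Unlink the evicted query from the other tables it also read.
            for other in self._tables_of.pop(sql, frozenset()):
                if other != table and other in self._by_table:
                    self._by_table[other].discard(sql)
        return len(victims)


if __name__ == "__main__":
    cache = ToyQueryCache()
    cache.store("SELECT name FROM users WHERE id = 1", frozenset({"users"}), [("alice",)])
    cache.store("SELECT name FROM users WHERE id = 2", frozenset({"users"}), [("bob",)])
    cache.store("SELECT COUNT(*) FROM orders", frozenset({"orders"}), [(10,)])
    # Even an UPDATE that only touches users.id = 3 evicts both cached users queries.
    print(cache.invalidate_table("users"))
    print(cache.lookup("SELECT COUNT(*) FROM orders"))
```

On a frequently updated table, entries in a model like this are evicted almost as fast as they are stored, so the cache mostly adds bookkeeping (and, in a real server, lock contention) without saving much work.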

Triggers have been removed with an eye toward re-adding them once a proper plugin interface can be defined.

The MyISAM storage engine that served as the bedrock of MySQL for many years does not exist in Drizzle. Its table-level locking gave it very poor concurrency, and maintaining support for it “contaminated” other parts of the server code. InnoDB is the default storage engine in Drizzle.

Both the Event Scheduler and Stored Procedures have been elided while developers debate the merits of having them in the kernel and/or developing a reasonable API to support different schedulers and stored procedures in multiple languages.

Views are gone but are expected to reappear with a better implementation at some point.

Prepared Statements are gone as well. The MySQL implementation had numerous problems to begin with, and Drizzle's new protocol allows for a better implementation in the future.

Replication has been removed in favor of a more flexible, modular logging system upon which its replacement can be built.

There are numerous other changes in Drizzle as well. Character sets are gone. Everything in Drizzle is UTF-8.  And the variety of column types has been greatly reduced as well. A list of the differences between Drizzle and MySQL is available.

Comments on "Drizzle: Rethinking the MySQL Database Kernel"

daveroberts

With respect, Moore’s Law hasn’t been turned sideways. It’s just the same as it always was. Moore’s law never said anything about CPU speed. Moore’s Law simply states that the number of transistors on a chip doubles approximately every 2 years. In the past, because more transistors were being added, the process technology also had to shrink the feature size, which also meant that the transistors were faster and clock speeds could also be ramped up, but that was a byproduct of Moore’s Law, not the law itself. There is always a tension between clock speed and power consumption, however–more speed means more dissipated power. The problem now is that we have hit a wall with being able to cool the chips themselves. We could run them faster, but they’d burn up with typical air-cooled fans. You can, in fact, still overclock your standard Pentium Core 2 or whatever if you use active cooling.

So, no change to Moore’s Law. It “changed” or “stopped working” only for people who didn’t understand it in the first place. That said, your overall conclusion that things are going more parallel is correct.

kmarsh

+1 on the Moore’s Law commentary, Dave.

At the moment, no triggers, Event Scheduler, Stored Procedures, Views, Replication or prepared statements. Right now it can only compete with SQLite. I’ll check back in a year.

jzawodn

Guilty as charged! I’ve been as brainwashed about Moore’s Law as so many other people. :-(

jzawodn

@kmarsh

And how well does SQLite scale on 64 core boxes with 1024 concurrent users?
