
RethinkDB: Rethinking the Database using Modern Assumptions

As technology evolves, it's often worth re-thinking how we do things. A small group of engineers is doing just that for MySQL.

Over the past few months I’ve written about a collection of trends that are all pointing toward some new technology that didn’t quite exist yet. Together they form a perfect storm: cheap, commodity 64-bit processors, abundant (64-128GB per server) and relatively inexpensive RAM, and faster storage. The MySQL community’s reaction to all of this has been a series of patches and hacks that aim to squeeze more performance out of InnoDB and, to a lesser degree, MySQL itself.

All of this work is still ongoing and some of it is fairly invasive, getting at some of the core assumptions present in both MySQL and the InnoDB storage engine. When such radical changes are necessary, you start to wonder if maybe it isn’t time for a new design of some sort. That is exactly where the founders of RethinkDB are coming from too.

I recently had the chance to sit down with them over lunch in Mountain View, California and get their perspective on how these changes are influencing their design, understand their goals a bit more, and get a sense of how far along they are in development.

But before I really dive into things, there are a few points to keep in mind:

  • the team has only been working on this for a few months
  • the company is very small right now (intentionally)
  • their plans will change as the potential market reacts to their progress, so don’t take any of this as gospel

Normally I wouldn’t devote an entire article to writing about code that’s not yet available, but it fits in very well with a lot of the themes I’ve tried to hammer on and it’s addressing a need for which I believe there is some real pent-up demand.

It’s also worth pointing out that even in its early state of development, it’s working well enough to power the RethinkDB web site itself. That dogfood mentality no doubt will serve them well as the code moves closer to being feature complete.

The Big Idea

If you were interested in getting great performance for a typical high-volume web application using MySQL (think OLTP) with an 80% read and 20% write workload, and you could start with a clean slate, what would you do? If you were the RethinkDB team, you’d make some assumptions about that use case and how the engineers building the web application would like to solve it.

First you’d expect the majority of data, or at least the working set, to fit in system RAM. What’s left over would ideally live on storage-class memory in the form of an SSD, Fusion-io “drive”, or similar technology. This choice has a pretty dramatic impact on the complexity of the code and the data structures you’d choose for various parts of the implementation.

One example is how data is flushed to disk. InnoDB attempts to find adjacent dirty pages in the buffer pool and write them to disk in order, thus preventing unnecessary disk seeks. But with most storage-class memory, that’s a non-issue. In fact, you are pretty quickly able to convince yourself that the log-oriented approach used by PBXT (see previous coverage) isn’t such a bad idea.
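To make the contrast concrete, here is a minimal Python sketch. It is not InnoDB's or PBXT's actual code; every name in it (seek_distance, flush_sorted, AppendOnlyLog) is hypothetical and exists only for illustration. It assumes that the cost of a flush on a spinning disk is roughly the total distance the head travels between page numbers, and it shows why sorting dirty pages helps there, while a log-oriented writer sidesteps the question by always appending at the tail.

```python
import random


def seek_distance(page_order):
    """Total head movement (in pages) when writing pages in the given order."""
    return sum(abs(b - a) for a, b in zip(page_order, page_order[1:]))


def flush_in_arrival_order(dirty_pages):
    """Write dirty pages in whatever order they happened to be dirtied."""
    return list(dirty_pages)


def flush_sorted(dirty_pages):
    """Write dirty pages in ascending page order so neighbours go out together."""
    return sorted(dirty_pages)


class AppendOnlyLog:
    """Toy log-oriented store: every write is appended at the tail.

    An in-memory index maps each page number to the offset of its
    most recent record, so reads never have to scan the log.
    """

    def __init__(self):
        self.records = []   # the "log": a list of (page_no, data) tuples
        self.index = {}     # page_no -> offset of the latest record

    def write(self, page_no, data):
        self.index[page_no] = len(self.records)
        self.records.append((page_no, data))

    def read(self, page_no):
        return self.records[self.index[page_no]][1]


if __name__ == "__main__":
    rng = random.Random(42)
    dirty = rng.sample(range(10_000), 50)   # 50 dirty pages scattered over the file
    print("arrival order:", seek_distance(flush_in_arrival_order(dirty)))
    print("sorted order: ", seek_distance(flush_sorted(dirty)))
```

On a device with no seek penalty, the two flush orders cost about the same, so the bookkeeping needed to find adjacent pages buys little. The append-only variant never overwrites in place, which is the property that makes the log-oriented design attractive on that kind of storage; a real implementation would also need to reclaim space from superseded records, which this sketch omits.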

Once you’ve come to that point, you realize (as Paul did) that some of the data consistency requirements are simplified as well. That means less code and complexity (and less to go wrong).

What about data sets that are far larger than the available RAM? Sharding. Expect that users who really care about performance are going to figure out how to split their data among servers so they can continue to scale without buying big iron.
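As a rough illustration of what that splitting can look like in application code, here is a minimal hash-based sharding sketch in Python. It is an assumption about how a user might shard, not anything RethinkDB provides; the host names and the shard_for and server_for helpers are hypothetical.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from configuration.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string is randomized per process and would route the same key to
    different shards after a restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def server_for(user_id: int) -> str:
    """Pick the database host that owns all rows for this user."""
    return SHARDS[shard_for(str(user_id), len(SHARDS))]


if __name__ == "__main__":
    for uid in (1, 2, 3, 1001):
        print(uid, "->", server_for(uid))
```

The design choice worth noting is the simple modulo: it is easy to reason about, but changing the number of shards remaps most keys, so teams that expect to add servers often use a lookup table or consistent hashing instead.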

Lies, Damned Lies, and Statistics

One of the more interesting things that came up in our conversation really highlighted how some of the problems with MySQL and “seekless” storage systems live well above the storage engine layer. The RethinkDB team found that they had to lie to the query optimizer in order to get it to do what they wanted.

You see, normally MySQL parses a query, figures out which indexes may be useful, and queries the underlying storage engine to get information about those indexes in order to produce an execution plan. But the optimizer doesn’t realize that you may have a RAID-0 array of very fast SSDs that makes previously poor query plans perform well. And, worse yet, there’s no way for the DBA to provide it with hints in that direction.

So they just lied to it, providing somewhat bogus statistics that trick MySQL into choosing an execution plan that would have otherwise been considered far from optimal.
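The effect can be sketched with a toy cost model in Python. This is not MySQL's real optimizer or the RethinkDB team's code: the constants, the choose_plan function, and the scaling trick in ssd_adjusted_estimate are all assumptions made up for illustration. The sketch assumes an optimizer that charges a fixed, disk-era penalty for every random read and an engine that can only influence it through the row estimate it reports.

```python
# Hypothetical costs baked into the toy optimizer (arbitrary units).
SEQ_PAGE_COST = 1.0      # reading one page during a sequential table scan
RANDOM_PAGE_COST = 40.0  # reading one row via an index, priced as a disk seek


def choose_plan(table_pages, index_rows_estimate):
    """Pick the cheaper plan given what the storage engine reports."""
    scan_cost = table_pages * SEQ_PAGE_COST
    index_cost = index_rows_estimate * RANDOM_PAGE_COST
    return "index" if index_cost < scan_cost else "scan"


def honest_estimate(matching_rows):
    """Report the true number of rows the index lookup will touch."""
    return matching_rows


def ssd_adjusted_estimate(matching_rows, real_random_cost=1.5):
    """Report a deliberately deflated row count.

    Scaling by real_random_cost / RANDOM_PAGE_COST cancels the seek
    penalty the optimizer will apply, so its final cost matches what a
    random read actually costs on seekless storage.
    """
    return matching_rows * real_random_cost / RANDOM_PAGE_COST


if __name__ == "__main__":
    table_pages, matching_rows = 10_000, 1_000
    print("honest:  ", choose_plan(table_pages, honest_estimate(matching_rows)))
    print("adjusted:", choose_plan(table_pages, ssd_adjusted_estimate(matching_rows)))
```

With the honest estimate, the toy optimizer prices 1,000 index reads at 40,000 units and prefers the 10,000-unit table scan. With the deflated estimate it prices the same lookups at 1,500 units and picks the index, which is the better plan when random reads are nearly as cheap as sequential ones.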

Speaking of statistics, the RethinkDB team recently posted some benchmark charts that show how their code performs against both MyISAM and InnoDB. While it’s just the first of what is likely to be a never-ending set of benchmarks, the performance they’ve shown certainly justifies the approach they’re taking. It will be interesting to see the performance in future tests with high write loads and larger data sets that require much more disk use.

Business and Licensing

You can download RethinkDB binaries using the links on the RethinkDB Wiki, but the source code is not publicly available today. Being a young company, they’re still working through the issues of how best to build a business around a MySQL storage engine. Should the code be available to everyone under the GPL (or a compatible license)? Should it be given to customers only? Should they charge customers licensing fees, or just for support and future development requests?

None of these are small issues and numerous companies have been down this very road before and made differing choices. It is my hope that they can strike a balance between being able to make a thriving business while also building an ecosystem of developers and users around their code (and the intrinsic credibility that brings).

Time will tell, but at the moment it looks like their re-thinking of a MySQL storage engine could turn out to be a real player in the future.

Comments on "RethinkDB: Rethinking the Database using Modern Assumptions"

muffycompo

Hmm i think those engineers are up to something creative. Its a project work looking into. Nice Article!

shred805

looks interesting

