Non-SQL oriented distributed databases are all the rage in some circles. They're designed to scale from day 1 and offer reliability in the face of failures.
There’s an interesting shift happening in the world of Web-scale data stores. A whole new breed of scalable data stores is gaining popularity very quickly. The traditional LAMP stack is starting to look like a thing of the past. For a few years now, memcached has often appeared right next to MySQL, and now the whole “data tier” is being shaken up.
While some might see it as a move away from MySQL and PostgreSQL, the traditional open source relational data stores, it’s actually a higher-level change. Much of this is change is the result of a few revelations:
- a relational database isn’t always the model or system for every piece of data
- relational databases are tricky to scale (especially if you start with a single monolithic configuration–they aren’t distributed by design)
- normalization often hurts performance
- in many applications, primary key look-ups are all you need
The new data stores vary quite a bit in their specific features, but in general they draw from a similar set of high-level characteristics. Not all of them meet all of these, of course, but just looking at the list gives you a sense of what they’re trying to accomplish.
- de-normalized, often schema-free, document storage
- key/value based, supporting lookups by key
- horizontal scaling
- built in replication
- HTTP/REST or easy to program APIs
- support for MapReduce style programming
- Eventually Consistent
And I could probably list another half a dozen qualities that many of them share too. But to me, the first two are the biggest departure form the traditional RDBMS. Of course, you can stick with MySQL and go non-relational. That’s basically what FriendFeed did, using MySQL as the back-end to distributed key/value store.
The movement to these distributed schema-free data stores has begun to use the name NoSQL. Rather than spending a lot of time on the philosophy behind NoSQL storage systems, I’d rather highlight a few that have caught my eye for one reason or another and spend a bit of time talking about makes each stand out.
I’m not going to say too much about Redis, since I recently covered it in Redis: Lightweight key/value Store That Goes the Extra Mile. I’ll just recap by saying that it’s a lightweight in-memory key/value store that handles strings, sets, and lists and has an excellent core of features for manipulating those stored data types. Redis also has built-in replication support and the ability to periodically persist the data on disk so that can survive a reboot without major data loss. Redis is still one of the newer kids on the block but version 1.0 was released a few days after I last wrote about it–Murphy’s Law, huh?
Redis is interesting to me because of the simplicity of its API and the fact that it takes the traditional key/value store up a notch by adding those specific data structures and provides fast atomic operations on them.
Coming straight from Japan, Tokyo Cabinet is a fast and mature disk-based key/value store that feels a lot like BerkeleyDB re-written for the Web era. It is usually paired with Tokyo Tyrant, the network server that turns Tokyo Cabinet into a network service that speaks HTTP and the memcached protocol, as well as its own binary protocol.
Like the other modern DBM implementations, Tokyo Cabinet offers B-Tree, Hash, and fixed-size array-like record storage options. It stands out for me because, when used with Tokyo Tyrant, you have a modern network protocol and interface on top a stable low-level database infrastructure that lets you choose the right data structure to use. It is also relatively mature technology that’s in use as part of some very high-volume Web sites.
To quote the Apache CouchDB home page:
CouchDB provides a RESTful JSON API than can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDBâ€™s built in Web administration console speaks directly to the database using HTTP requests issued from your browser.
CouchDB is written in Erlang, a robust functional programming language ideal for building concurrent distributed systems. Erlang allows for a flexible design that is easily scalable and readily extensible.
In other words, CouchDB is very buzzword compliant!
Seriously, though, CouchDB was one of the first document oriented databases really designed for the Web and to scale with the Web. The fact that it’s now under the unbrella of the Apache Software Foundation speaks to the quality of the code and the relative maturity of the project.
CouchDB also implements a very useful versioning scheme that makes it possible to build collaborative systems without having to re-invent yet another wheel.
The newest data store on my radar, Riak bills itself as a “document oriented web database” that “combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.” It uses the eventual consistency model in an effort to maximize availability in times of network or server outages. In fact, one of the most interesting features of Riak is that you can easily control the parameters that define how available your system is in the face of problems.
Those parameters come from Eric Brewer’s CAP theorem (a good read) which says that the three ingredients we should care about are varying degrees of Consistency, Availability, and Partition Tolerance. Riak, unlike other systems, doesn’t force you into a specific set of CAP values. Instead, it allows you to decide on a per-request basis how stringent you’d like to be.
This control comes thanks to three variables:
W. In a distributed system, N is the number of replicas in the system. So if you write a new key/value pair with N set to 4, then 4 nodes will be expected receive a copy of it (eventually). This value is set on a per-bucket basis.
The R and W values are set on a per-request basis and control the number of nodes that the client must receive a response from to complete a successful read or write operation, R fo read and W for write. This provides very granular control over how clients can react to failed nodes in a cluster.
For good high-level overview, see this presentation from the recent NoSQL Conference in New York City.
There are a surprising number of NoSQL systems available today. The ones I’ve noted so far have caught my eye in one way or another and each uses a different approach to providing an alternative to centralized relational database systems. There are a few others that I’m hoping to experiment with as time allows.
- MongoDB is a VC-backed distributed schema-free database with a very impressive feature set (including additional indexes), commercial support, and mosty hands-off operation.
- Project Voldemort is a fairly mature system that’s in heavy use at LinkedIn and comes with automatic replication and partitioning.
- MemcacheDB combines the proven BerkeleyDB storage system with a network server that speaks the memcached network protocol so you can create memcahed-like nodes that hold more data than would fit in a traditional RAM-based memcached nodes and be assured that the data is not lost after a reboot. In some respects this is similar to the Tokyo Cabinet and Tokyo Tyrant combination.
Have you dipped your toe in the NoSQL waters yet? How did it go?