dcsimg

NoSQL: Distributed and Scalable Non-Relational Database Systems

Non-SQL oriented distributed databases are all the rage in some circles. They're designed to scale from day 1 and offer reliability in the face of failures.

There’s an interesting shift happening in the world of Web-scale data stores. A whole new breed of scalable data stores is gaining popularity very quickly. The traditional LAMP stack is starting to look like a thing of the past. For a few years now, memcached has often appeared right next to MySQL, and now the whole “data tier” is being shaken up.

While some might see it as a move away from MySQL and PostgreSQL, the traditional open source relational data stores, it’s actually a higher-level change. Much of this is change is the result of a few revelations:

  1. a relational database isn’t always the model or system for every piece of data
  2. relational databases are tricky to scale (especially if you start with a single monolithic configuration–they aren’t distributed by design)
  3. normalization often hurts performance
  4. in many applications, primary key look-ups are all you need

The new data stores vary quite a bit in their specific features, but in general they draw from a similar set of high-level characteristics. Not all of them meet all of these, of course, but just looking at the list gives you a sense of what they’re trying to accomplish.

  1. de-normalized, often schema-free, document storage
  2. key/value based, supporting lookups by key
  3. horizontal scaling
  4. built in replication
  5. HTTP/REST or easy to program APIs
  6. support for MapReduce style programming
  7. Eventually Consistent

And I could probably list another half a dozen qualities that many of them share too. But to me, the first two are the biggest departure form the traditional RDBMS. Of course, you can stick with MySQL and go non-relational. That’s basically what FriendFeed did, using MySQL as the back-end to distributed key/value store.

The movement to these distributed schema-free data stores has begun to use the name NoSQL. Rather than spending a lot of time on the philosophy behind NoSQL storage systems, I’d rather highlight a few that have caught my eye for one reason or another and spend a bit of time talking about makes each stand out.

Redis

I’m not going to say too much about Redis, since I recently covered it in Redis: Lightweight key/value Store That Goes the Extra Mile. I’ll just recap by saying that it’s a lightweight in-memory key/value store that handles strings, sets, and lists and has an excellent core of features for manipulating those stored data types. Redis also has built-in replication support and the ability to periodically persist the data on disk so that can survive a reboot without major data loss. Redis is still one of the newer kids on the block but version 1.0 was released a few days after I last wrote about it–Murphy’s Law, huh?

Redis is interesting to me because of the simplicity of its API and the fact that it takes the traditional key/value store up a notch by adding those specific data structures and provides fast atomic operations on them.

Tokyo Cabinet

Coming straight from Japan, Tokyo Cabinet is a fast and mature disk-based key/value store that feels a lot like BerkeleyDB re-written for the Web era. It is usually paired with Tokyo Tyrant, the network server that turns Tokyo Cabinet into a network service that speaks HTTP and the memcached protocol, as well as its own binary protocol.

Like the other modern DBM implementations, Tokyo Cabinet offers B-Tree, Hash, and fixed-size array-like record storage options. It stands out for me because, when used with Tokyo Tyrant, you have a modern network protocol and interface on top a stable low-level database infrastructure that lets you choose the right data structure to use. It is also relatively mature technology that’s in use as part of some very high-volume Web sites.

Apache CouchDB

To quote the Apache CouchDB home page:

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

CouchDB provides a RESTful JSON API than can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB’s built in Web administration console speaks directly to the database using HTTP requests issued from your browser.

CouchDB is written in Erlang, a robust functional programming language ideal for building concurrent distributed systems. Erlang allows for a flexible design that is easily scalable and readily extensible.

In other words, CouchDB is very buzzword compliant!

Seriously, though, CouchDB was one of the first document oriented databases really designed for the Web and to scale with the Web. The fact that it’s now under the unbrella of the Apache Software Foundation speaks to the quality of the code and the relative maturity of the project.

CouchDB appeals to me because it has always seemed to be the most futuristic of the non-relational data stores I’ve looked at. It’s written in a language designed for large, concurrent, and reliable network systems. But it also gives you the ability to tap into Map/Reduce style processing using server-side JavaScript. It sounds a little crazy (or did when it first began), but it actually works quite well. The API is simple and well thought out, which provides a very low barrier to entry too. It’s also easy to have a CouchDB instance on your desktop or laptop that you can develop with and then sync to a CouchDB server in “the cloud” or in your company’s data center.

CouchDB also implements a very useful versioning scheme that makes it possible to build collaborative systems without having to re-invent yet another wheel.

Riak

The newest data store on my radar, Riak bills itself as a “document oriented web database” that “combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.” It uses the eventual consistency model in an effort to maximize availability in times of network or server outages. In fact, one of the most interesting features of Riak is that you can easily control the parameters that define how available your system is in the face of problems.

Those parameters come from Eric Brewer’s CAP theorem (a good read) which says that the three ingredients we should care about are varying degrees of Consistency, Availability, and Partition Tolerance. Riak, unlike other systems, doesn’t force you into a specific set of CAP values. Instead, it allows you to decide on a per-request basis how stringent you’d like to be.

This control comes thanks to three variables: N, R, and W. In a distributed system, N is the number of replicas in the system. So if you write a new key/value pair with N set to 4, then 4 nodes will be expected receive a copy of it (eventually). This value is set on a per-bucket basis.

The R and W values are set on a per-request basis and control the number of nodes that the client must receive a response from to complete a successful read or write operation, R fo read and W for write. This provides very granular control over how clients can react to failed nodes in a cluster.

For good high-level overview, see this presentation from the recent NoSQL Conference in New York City.

Others

There are a surprising number of NoSQL systems available today. The ones I’ve noted so far have caught my eye in one way or another and each uses a different approach to providing an alternative to centralized relational database systems. There are a few others that I’m hoping to experiment with as time allows.

  • MongoDB is a VC-backed distributed schema-free database with a very impressive feature set (including additional indexes), commercial support, and mosty hands-off operation.
  • Project Voldemort is a fairly mature system that’s in heavy use at LinkedIn and comes with automatic replication and partitioning.
  • MemcacheDB combines the proven BerkeleyDB storage system with a network server that speaks the memcached network protocol so you can create memcahed-like nodes that hold more data than would fit in a traditional RAM-based memcached nodes and be assured that the data is not lost after a reboot. In some respects this is similar to the Tokyo Cabinet and Tokyo Tyrant combination.

Have you dipped your toe in the NoSQL waters yet? How did it go?

Comments on "NoSQL: Distributed and Scalable Non-Relational Database Systems"

Very few web sites that occur to become detailed beneath, from our point of view are undoubtedly properly worth checking out.

The facts mentioned inside the report are a few of the most beneficial readily available.

Check below, are some entirely unrelated websites to ours, even so, they are most trustworthy sources that we use.

Here are several of the web pages we advocate for our visitors.

Every the moment inside a while we choose blogs that we study. Listed below are the newest websites that we select.

The info mentioned within the post are a few of the ideal obtainable.

Please check out the internet sites we stick to, including this a single, as it represents our picks from the web.

Here are some links to sites that we link to due to the fact we think they may be really worth visiting.

Usually posts some very interesting stuff like this. If you are new to this site.

We prefer to honor lots of other world wide web internet sites on the web, even though they aren?t linked to us, by linking to them. Below are some webpages worth checking out.

Always a massive fan of linking to bloggers that I enjoy but do not get a great deal of link really like from.

Wonderful story, reckoned we could combine a number of unrelated information, nonetheless genuinely really worth taking a appear, whoa did a single find out about Mid East has got more problerms as well.

Every the moment in a even though we opt for blogs that we read. Listed beneath would be the newest web sites that we decide on.

Here is a good Weblog You might Obtain Fascinating that we encourage you to visit.

Always a major fan of linking to bloggers that I really like but do not get a great deal of link enjoy from.

That is the finish of this report. Right here you will uncover some websites that we consider you?ll enjoy, just click the hyperlinks.

One of our guests recently encouraged the following website.

Please pay a visit to the web sites we adhere to, including this one, as it represents our picks through the web.

Here are several of the web-sites we advise for our visitors.

This is a topic close to my heart cheers, where are your contact details though?

We prefer to honor quite a few other world-wide-web web pages around the web, even when they aren?t linked to us, by linking to them. Underneath are some webpages really worth checking out.

Just beneath, are quite a few totally not related web pages to ours, nevertheless, they’re certainly really worth going over.

The info talked about in the post are several of the best available.

Always a big fan of linking to bloggers that I really like but really don’t get quite a bit of link adore from.

Although internet sites we backlink to beneath are considerably not associated to ours, we feel they’re basically really worth a go by means of, so have a look.

Leave a Reply