Introducing MongoDB

Relational databases are not necessarily the best choice for modern Web applications. A new generation of alternatives is emerging. MongoDB is one of the best emergent solutions. And it's open source.

Web applications and traditional relational databases are nearing an end to their tumultuous relationship. For over a decade now, most Web applications have been built on top of relational databases, with various layers of indirection to simplify coding and boost the productivity of developers. For every Web programming language, there are any number of object-relational mapping (ORM) choices, each with pros and cons, yet none good enough that a developer can forget about SQL or ignore protecting the database. Moreover, as Web applications grow more complicated and sites need to be created faster, adapt instantly, and scale massively, these old solutions are no longer satisfying the demands of the Web.

There are a number of different projects working on new database technologies, all of which forgo the stalwart relational model. Relational databases are difficult to scale, largely because distributed joins are difficult to perform efficiently. Further, mapping from the many popular dynamically-typed languages to SQL is complicated, inefficient, and time-consuming. Although the trend is often called the “NoSQL” movement, the need for new technologies stems from the relational model rather than from SQL itself.

Beyond the relational model, there are a number of data model choices: key-value stores, tabular databases, graph databases, and document databases. Key-value stores are simple and easy to scale. If all you need is get and put access, a key-value store works great. However, most applications need more features, including secondary indexes, dynamic queries and sorting. Tabular or column databases can also scale, but don’t offer any improvement when mapping to programming languages. Document databases can be scaled and do in fact map well to programming languages.

MongoDB (from “humongous”) is a document database designed to be easy to work with, fast, and very scalable. It was also designed to be ideal for website infrastructure. It is perfect for user profiles, sessions, product information, and all forms of Web content (blogs, wikis, comments, messages, and more). It’s not great for transactions or perfect durability, as required by something like a banking system. Good fits for MongoDB include applications with complex objects or real-time reporting requirements, and agile projects where the underlying database schema changes often. MongoDB does not suit software with complex (multiple object) transactions.

Inside MongoDB

MongoDB stores BSON, essentially a JSON document in an efficient binary representation with more data types. BSON documents readily persist many data structures, including maps, structs, associative arrays, and objects in any dynamic language. Using MongoDB and BSON, you can just store your class data directly in the database, removing a whole slew of problems. MongoDB is also schema-free. You can add fields whenever you need to without performing an expensive change on your database. Adding fields is also quick, which is ideal for agile environments. You need not revise schemas from staging to production or have scripts ready to roll changes back.

MongoDB uses a custom network protocol for client/server chatter. While this approach is far faster than REST, it demands a native driver for each programming language. Currently, production drivers exist for Python, Ruby, PHP, C++, Java, and Perl, and drivers for Erlang, Factor, and C# are in the works.

MongoDB has an auto-sharding implementation in alpha. Just specify a shard key for a collection and scale horizontally as much as you need to.

MongoDB is an interesting combination of modern Web usage semantics and proven database techniques. In some ways MongoDB is closer to MySQL than to other so-called “NoSQL” databases: It has a query optimizer, ad-hoc queries, and a custom network layer. It also lets you organize documents into collections, similar to SQL tables, for speed, efficiency, and organization.

To get great performance and horizontal scalability, however, MongoDB gives something up: transactions. MongoDB does not support transactions that span multiple collections. You can do atomic operations on a single object, but you can’t modify objects from two collections atomically.

Internally, MongoDB uses memory-mapped files to store data on disk. This keeps the main database code clean and simple, but causes some problems on 32-bit platforms. If you run MongoDB on a 32-bit system, you can only store about 2 GB of data. 64-bit systems are effectively unbounded.

Versions of MongoDB exist for Linux, Mac OS X, Windows, Solaris, and FreeBSD. See http://www.mongodb.org/display/DOCS/Downloads for all platform downloads.

Running MongoDB

Let's download and run MongoDB, and march through some simple examples. To install MongoDB, connect to its project host and download the latest binaries for your system. Unpack the tarball, create a directory for the databases, and launch the daemon and the shell.

$ curl -O http://downloads.mongodb.org/linux/mongodb-linux-x86_64-1.0.0.tgz
$ tar zxvf mongodb-linux-x86_64-1.0.0.tgz
$ cd mongodb-linux-x86_64-1.0.0
$ mkdir /data/
$ mkdir /data/db
$ ./bin/mongod &
$ ./bin/mongo

The last command, ./bin/mongo, starts the MongoDB shell. It’s based on JavaScript, so you can use any JavaScript that you want.

// let's insert some data
> db.foo.save( { "name" : "Bob" } )
> db.foo.save( { "name" : "Joe" } )

// now let's retrieve it
> db.foo.find()
{"_id" :  ObjectId( "4a6e88194fb6b7661627ad47")  , "name" : "Bob"}
{"_id" :  ObjectId( "4a6e881d4fb6b7661627ad48")  , "name" : "Joe"}

// you can get a list of all the collections
> show collections
> db.foo.save( { "name" : "Lisa" } )

// this is a dynamic query matching all documents where "name" is "Joe"
> db.foo.find( { "name" : "Joe" } )
{"_id" :  ObjectId( "4a6e881d4fb6b7661627ad48")  , "name" : "Joe"}

// you can use .explain() to see how a query will be executed.
// In this case, since there are no indexes, a table scan is necessary
> db.foo.find( { "name" : "Joe" } ).explain()
{"cursor" : "BasicCursor" , "startKey" : {} , "endKey" : {} ,
"nscanned" : 3 , "n" : 1 , "millis" : 0 ,
"oldPlan" : {"cursor" : "BasicCursor" , "startKey" : {} , "endKey" : {}} ,
"allPlans" : [{"cursor" : "BasicCursor" , "startKey" : {} , "endKey" : {}}]}

// let's create an index on name
> db.foo.ensureIndex( { "name" : 1 } )

// now when we do the explain, we'll see that we only looked at 1 object
> db.foo.find( { "name" : "Joe" } ).explain()
{"cursor" : "BtreeCursor name_1" , "startKey" : {"name" : "Joe"} , "endKey" : {"name" : "Joe"} , "nscanned" : 1 , "n" : 1 , "millis" : 0 , "allPlans" : [{"cursor" : "BtreeCursor name_1" , "startKey" : {"name" : "Joe"} , "endKey" : {"name" : "Joe"}}]}

// let's delete something
> db.foo.remove( { "name" : "Joe" } )

For more developer documentation, see http://www.mongodb.org/display/DOCS/Developer+Zone.

Thinking in Documents

Designing a schema for a document database is very different from designing one for a relational database. In general, the schema is very similar to an object you might have in your code, rather than a mapping of that object to a table.

A big question that comes up is when to embed a document in a parent document and when to link to it and have it stored in a separate collection. For starters, embedding is great when you have a one-to-many relationship.
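
As a rough illustration of the two choices, the sketch below models a blog post and its comments as plain JavaScript objects, first embedded and then linked. The field names (comments, postId) and the helper commentsFor are hypothetical and chosen only for this example.

```javascript
// Hypothetical blog post with its comments embedded (one-to-many).
// Reading the post returns its comments along with it.
const embeddedPost = {
  _id: 1,
  title: "Introducing MongoDB",
  comments: [
    { author: "Bob", text: "Nice overview." },
    { author: "Joe", text: "What about transactions?" }
  ]
};

// The same data with comments linked instead: each comment would live in
// its own collection and point back to the post through postId.
const linkedPost = { _id: 1, title: "Introducing MongoDB" };
const linkedComments = [
  { _id: 10, postId: 1, author: "Bob", text: "Nice overview." },
  { _id: 11, postId: 1, author: "Joe", text: "What about transactions?" }
];

// With links, the application performs the second lookup itself.
function commentsFor(post, comments) {
  return comments.filter(c => c.postId === post._id);
}

console.log(embeddedPost.comments.length);
console.log(commentsFor(linkedPost, linkedComments).length);
```

Embedding keeps everything a page needs in one document, while linking lets the comments be queried and updated on their own.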

In MongoDB, as in a traditional RDBMS, you must choose how many indexes to have and which fields to index. MongoDB has a novel query optimizer that protects against worst-case performance. Instead of relying on statistics, which can change over time, it occasionally tries different plans in parallel, stops as soon as one of them finishes, and remembers the best index to use.
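
The idea of racing plans can be sketched in a few lines of plain JavaScript. This is a toy model under stated assumptions, not MongoDB's implementation: each plan is a generator that yields once per document examined, and the names scanPlan and racePlans are invented for the example.

```javascript
// Toy model: a plan yields once per document it examines and returns
// its name and matches when it has finished.
function* scanPlan(name, docs, predicate) {
  const matches = [];
  for (const doc of docs) {
    if (predicate(doc)) matches.push(doc);
    yield; // one unit of work
  }
  return { name, matches };
}

// Advance every plan one step at a time; the first plan to finish wins.
function racePlans(plans) {
  if (plans.length === 0) return null;
  for (;;) {
    for (const plan of plans) {
      const step = plan.next();
      if (step.done) return step.value;
    }
  }
}

const people = [{ name: "Bob" }, { name: "Joe" }, { name: "Lisa" }];
const isJoe = d => d.name === "Joe";

// A full scan examines every document. An index on name is modeled here
// by handing the plan only the documents that already match.
const winner = racePlans([
  scanPlan("full scan", people, isJoe),
  scanPlan("index on name", people.filter(isJoe), isJoe)
]);

console.log(winner.name); // index on name
```

A real optimizer would then cache the winning plan and re-run the race only occasionally, as the paragraph above describes.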

Compound indexes are also similar to those in a relational database, and have similar properties. For example, if you have two fields you often query together, say last and first name, you may want an index on { last : 1 , first : 1 }. Because the last name comes first in the index specification, you can also query quickly on only the last name. Queries on first name only would not be optimized via an index unless you specifically created an index on that field. This mirrors traditional relational systems.
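
The prefix rule can be modeled as a small function. This is a deliberately simplified sketch: it only checks which leading fields of an index a query constrains, and ignores sorting, ranges, and everything else a real optimizer weighs. The function name usablePrefixLength is made up for the example.

```javascript
// Simplified model: an index helps a query only as far as the query
// constrains a leading prefix of the index's fields.
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    n++;
  }
  return n;
}

const index = ["last", "first"]; // models the index { last : 1 , first : 1 }

console.log(usablePrefixLength(index, ["last", "first"])); // 2
console.log(usablePrefixLength(index, ["last"]));          // 1
console.log(usablePrefixLength(index, ["first"]));         // 0
```

A result of 0 means the index gives no help, which is why a query on first name alone needs its own index.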


MongoDB replication should feel very familiar, but it is a lot simpler to set up than in traditional database software. To start a server as a master, run mongod --master. This causes the server to keep a transaction log. To start a slave, run mongod --slave --source followed by the address of the master. The slave syncs all of the data and then reads the transaction log. You can have as many slaves reading from a master as you want.

One caveat: if a slave gets too far out of sync, it has to re-clone the data. How long a slave can be down before getting out of sync depends on the transaction log size. By default, the transaction log is the maximum of 1 GB or 5 percent of free disk space. This usually stores many hours of operations, and potentially even more depending on your use case. You can also customize the size with --oplogSize.
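
The sizing rule and its consequence can be worked through with a little arithmetic. The sketch below encodes the default described above; the write rate is a made-up figure, and the assumption that log growth equals the raw write rate is a simplification for illustration.

```javascript
const GB = 1024 * 1024 * 1024;

// Default log size as described above: the larger of 1 GB
// and 5 percent (one twentieth) of free disk space.
function defaultLogBytes(freeDiskBytes) {
  return Math.max(1 * GB, freeDiskBytes / 20);
}

// Rough estimate of how long a slave can stay offline before the log
// wraps around, assuming the log grows at a steady write rate.
function hoursOfHeadroom(logBytes, writeBytesPerSecond) {
  return logBytes / writeBytesPerSecond / 3600;
}

console.log(defaultLogBytes(10 * GB) / GB);  // 1 (the 1 GB floor applies)
console.log(defaultLogBytes(100 * GB) / GB); // 5

// Hypothetical workload writing 100 KB of operations per second.
console.log(hoursOfHeadroom(defaultLogBytes(100 * GB), 100 * 1024));
```

With those assumed numbers a 5 GB log covers roughly half a day of downtime, which is consistent with the "many hours" mentioned above.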

One of the frustrating management tasks with traditional replication is handling failover in client code and then getting things back in sync. To make this easier, MongoDB offers a feature called replica pairs. This is basically a pair of servers that share the master and slave roles between them. The pair negotiates who is what at startup and then guarantees that only one is master at any given time. A forthcoming feature allows a pair to have multiple slaves.


MongoDB version 1.0 has an early implementation of sharding. If you decide to shard a collection, all you have to do is specify a shard key. Sharding is order preserving, so records with close shard keys will likely be on the same shard. For example, an e-commerce application might choose to shard based on user ID. Queries and sorts within a shard are very fast.
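
To see what order preservation buys you, consider a toy shard map in which each shard owns a contiguous range of user IDs. The shard names and range boundaries below are invented for the example and say nothing about how MongoDB actually splits data.

```javascript
// Hypothetical shard map: each shard owns user IDs in [min, max).
const shardRanges = [
  { shard: "shard0", min: 0,    max: 1000 },
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity }
];

// The shard that holds a single user ID.
function shardFor(userId) {
  const range = shardRanges.find(r => userId >= r.min && userId < r.max);
  return range ? range.shard : null;
}

// The shards a query for user IDs in [lo, hi] has to visit.
function shardsForRange(lo, hi) {
  return shardRanges
    .filter(r => lo < r.max && hi >= r.min)
    .map(r => r.shard);
}

console.log(shardFor(41), shardFor(42)); // shard0 shard0
console.log(shardsForRange(10, 20));     // [ 'shard0' ]
console.log(shardsForRange(900, 1100));  // [ 'shard0', 'shard1' ]
```

Neighboring keys land on the same shard, so a narrow range query usually touches one shard rather than all of them.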

Queries and sorts across shards also work. You can sort by a non-shard key, and the system will do a merge sort of the results. This works well for two to twenty nodes or so; a thousand nodes might slow down too much.
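
The merge step can be sketched as follows, assuming each shard returns its own results already sorted on the requested field. The function mergeSorted and the sample documents are illustrative only.

```javascript
// Merge per-shard result lists that are each already sorted by `key`.
// On every pass, take the smallest head element among the lists.
function mergeSorted(shardResults, key) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // list exhausted
      if (best === -1 ||
          shardResults[i][positions[i]][key] <
            shardResults[best][positions[best]][key]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(shardResults[best][positions[best]++]);
  }
}

const fromShard0 = [{ name: "Bob", age: 25 }, { name: "Lisa", age: 41 }];
const fromShard1 = [{ name: "Joe", age: 30 }];

const sorted = mergeSorted([fromShard0, fromShard1], "age");
console.log(sorted.map(d => d.name)); // [ 'Bob', 'Joe', 'Lisa' ]
```

Each pass compares one head element per shard, so the cost of the merge grows with the number of shards, which fits the remark above about very large clusters.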

Sharding consists of multiple regular database instances (mongod) and any number of sharding processes (mongos). mongos is basically a database router. Every request goes through mongos; it decides how many and which mongods should receive the query. mongos collates the results and sends them back to the client. You can have one mongos for the whole system no matter how many mongods you have, or you can have one local mongos for every client if you want to minimize network latency.

Go Mongo!

MongoDB offers many advantages over more traditional databases. It is easy to install, simple to manage, and fast. It efficiently stores binary files and complex data models. It is also resilient: because it’s schema-free, changes can be made instantaneously and one record need not be identical to another. MongoDB is being used in production now and is ready for you to explore.

Comments on "Introducing MongoDB"


“Relational databases are not necessarily the best choice for… A new generation of alternatives is emerging.”

The last time I recall hearing how RDB’s were not the best choice for something, it was when OODB’s were the “new generation”.

So thanks for the article; and it brings a couple questions to mind:

1.) How is Mongo different from the last generation of OODB’s?

2.) Storing documents in a DB… Is Mongo doing something better and different than Microsoft did with storing documents in an Exchange database? (When Exchange first came out it was sold as the place to put company documents, so everyone could collaborate on them. But administratively it wasn’t an improvement to have to deal with the resulting DB files, versus an old-fashioned file system.)


That was just like Lotus Notes, but in open source. In fact, if all these big corporations released what they have buried as open source, we wouldn’t have to re-invent the wheel over and over again every 10 years.



MongoDB isn’t really an OODB – it makes no attempt to store actual class instances w/ methods, etc, but stores “documents”, which are JSON-like structures. What’s really nice about this is that it is a very natural mapping to datatypes that are already present in most languages, like Python dictionaries and Ruby hashes (and JS objects).

We use the word “document” to refer to the structures stored in the database, but it isn’t really meant to mean a document like a file on the filesystem (although MongoDB can handle storing large files – check out GridFS). When we say “document” we just mean this JSON-like structure, which can be more complex than a key, value pair or even a row in a traditional RDBMS (i.e. documents can embed other documents, etc.)


“While often called the NoSQL movement, the need for new technologies is caused by the relational model, rather than SQL.”

Ugh. I guess it doesn’t matter how illustrious the coder is, he/she can still be full of misconceptions and inaccuracies on the basics.

The Relational Model is a sound mathematical concept bastardized to different amounts by most current SQL DBMSes. Whenever I use some NoSQL database, I find them being creative in the physical implementation (and there is also lots of creativity in SQL DBMSes at the physical level) but they are always re-inventing the logical wheel.

Secondly, even if we grant that SQL DBMSes may “cause the need for new technologies”, they are also made better by different technologies, like SSDs.
