Hacking with CouchDB

Working with CouchDB is very straightforward. There's virtually no setup involved and no complicated libraries to hassle with.

Last week we started looking into CouchDB, a document-oriented database with many advanced features and a snowballing user base. We looked at installation on Ubuntu (trivial), high-level features, and the built-in Web interface called Futon. This week we’ll look at getting some data into CouchDB, and eventually we’ll play with indexing, views, and querying.

Upgrade Time

Thanks to installing the Ubuntu Netbook Remix 9.10 on my Samsung NC10 Netbook, we can look at the latest and greatest CouchDB–version 0.10. What a difference a week makes! (Oh, and UNR runs surprisingly well on this little machine!)

The main difference from last week’s output is this:

$ curl http://localhost:5984/

All I needed to do was sudo apt-get install couchdb.

Test Script

My primary development language is Perl, so I’ll show examples using Perl. We’ll make use of the Net::CouchDb distribution from CPAN.

$ sudo cpan Net::CouchDb

That will also install a JSON module, since the client and server speak to each other using JSON documents over HTTP.

With that done, let’s whip up a simple script that connects to the local CouchDB server, creates a database, and stores a trivial document.

#!/usr/bin/perl -w

use strict;
use Net::CouchDb;
use Net::CouchDb::Document;

my $couch_db = 'test_foo';

# connect to the local server and create the database
my $cdb = Net::CouchDb->new(host => 'localhost', port => 5984) or die "$!";
$cdb->create_db($couch_db) or die "db already exists?: $!";
my $test_db = $cdb->db($couch_db);

# the document body is just a hash reference
my $record = {
    foo     => 'bar',
    message => 'hello, world!',
};

# wrap it in a document object with id 1001 and store it
my $doc = Net::CouchDb::Document->new(1001, $record);
$test_db->put($doc) or die "$!";
print "OK\n";


Basically, we create a CouchDB connection object ($cdb) and then call create_db() to create a new database. In this example, the script will simply die if the database already exists, but in reality you’d do something a bit more sophisticated. Once connected, we construct a hash with the keys foo and message and keep a reference to it in $record. We then pass that reference, along with a document ID (1001), to the Net::CouchDb::Document constructor to get a document object. Then it’s just a matter of calling the put() method on our database object ($test_db) and passing in that document object.
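Since the client and server talk JSON over HTTP, it can help to see roughly what that put() boils down to. Here’s a minimal sketch that only builds and prints the request, without contacting a server. It uses JSON::PP (a core module in recent Perls) as a stand-in for whichever JSON module Net::CouchDb actually pulls in, so treat the exact bytes as an approximation rather than what the library really sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # assumption: stand-in for the JSON module CPAN installed

# The same record used in the script above.
my $record = {
    foo     => 'bar',
    message => 'hello, world!',
};

# canonical() sorts the keys so the output is stable from run to run.
my $json = JSON::PP->new->canonical->encode($record);

# CouchDB addresses a document as /<database>/<document id>.
my $url = 'http://localhost:5984/test_foo/1001';

# Nothing is sent; we just show the request line and body.
print "PUT $url\n";
print "$json\n";
```

The takeaway is that a CouchDB document is nothing more exotic than a JSON object stored at a URL made from the database name and the document ID.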

To verify that the document actually got there, you can visit the Futon interface and click on the test_foo database. The URL should look like this: http://localhost:5984/_utils/database.html?test_foo. You’ll see a document with the id 1001 that you can click on and manipulate.

Any changes you make in the Futon interface can be committed to the database by clicking the “Save Document” link in the upper left. Doing so doesn’t actually update or replace the document. Instead it stores a new version of the document.
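CouchDB keeps track of those versions with a _rev field that sits next to _id in every stored document. A save sends back the revision it was based on, and the server hands out a new one. The sketch below shows the idea offline: it takes a document as it might come back from the server, changes one field, and re-encodes it with _id and _rev left alone. The _rev value here is made up for illustration, and JSON::PP is again just a convenient stand-in.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Illustrative only: a document as the server might return it.
# The _rev string is invented; a real one is assigned by CouchDB.
my $fetched = '{"_id":"1001","_rev":"1-0123456789abcdef",'
            . '"foo":"bar","message":"hello, world!"}';

my $doc = decode_json($fetched);

# Edit a field, but leave _id and _rev untouched so the server
# can tell which version this change was based on.
$doc->{message} = 'hello again';

my $update = JSON::PP->new->canonical->encode($doc);
print "$update\n";
```

If the _rev you send no longer matches the latest one on the server, CouchDB rejects the save as a conflict instead of silently overwriting someone else’s change.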

Load Some Data

In order to do anything interesting, we need some data to load into CouchDB. So let’s write a simple tool that can extract messages from a Unix mailbox file (mbox). We’ll treat each message as a document with multiple fields–each message header as well as the body.

Let’s install a few Perl modules to make that task easier.

$ sudo apt-get install libmail-mbox-messageparser-perl libmailtools-perl

With those installed we can use the following code, which builds on the previous example:

#!/usr/bin/perl -w

use strict;
use FileHandle;
use Net::CouchDb;
use Net::CouchDb::Document;
use Mail::Mbox::MessageParser;
use Mail::Internet;

my $db_name = 'test_mail';
my $cdb = Net::CouchDb->new(host => 'localhost', port => 5984) or die "$!";
$cdb->create_db($db_name) or die "db already exists?: $!";
my $mail_db = $cdb->db($db_name);

my $file_name = shift || 'mbox';
my $file_handle = FileHandle->new($file_name) or die "can't open $file_name: $!";

my $folder_reader = Mail::Mbox::MessageParser->new({
    'file_name'    => $file_name,
    'file_handle'  => $file_handle,
    'enable_cache' => 0,
    'enable_grep'  => 1,
});
# on failure new() returns an error string rather than an object
die $folder_reader unless ref $folder_reader;

# skip anything before 1st message
my $prologue = $folder_reader->prologue;
print $prologue;

# read one message at a time
while (not $folder_reader->end_of_file()) {
    my $email = $folder_reader->read_next_email();

    # parse the raw text (a scalar ref) into headers and body
    my $msg = Mail::Internet->new();
    my @lines = split /\n/, $$email;
    $msg->extract(\@lines);

    my $body = join "\n", @{$msg->body};
    my $id   = $msg->get("Message-Id:");

    my $message = {
        Body    => $body,
    };

    # copy over whichever of these headers are present
    for my $field (qw[From: To: Cc: Subject:]) {
        my $value = $msg->get($field);
        if ($value) {
            $message->{$field} = $value;
        }
    }

    # the Message-Id header doubles as the document id
    my $doc = Net::CouchDb::Document->new($id, $message);
    $mail_db->put($doc) or die "$!";
}


This time, we create a database named test_mail to hold our email messages. Then we use Mail::Mbox::MessageParser to parse through the mailbox file given as the first argument on the command line (it defaults to mbox in the current directory).

We then iterate over each message in the mailbox, using Mail::Internet->extract() to parse the message into an object from which we can extract headers and the body. We then construct a $message hash that will represent the document to store in CouchDB. We include the body and then any of the following header fields if they exist: From, To, Cc, Subject. You could easily add additional fields like User-Agent, Precedence, and so on.

Once that document is created, we use it as the basis of a Net::CouchDb::Document object that is then stored in the database.

I ran this code against an mbox file containing 46 messages as delivered by procmail and read with mutt. But it just as well could have worked against a mailbox file from Mozilla Thunderbird or Evolution.

Now, I should note that there’s a lot more we could do here. There’s little error checking, no scrubbing or normalization of the data, etc. The reality is that nowadays a lot of email is really a multipart MIME message that may contain a plain-text piece and an HTML piece (and possibly attachments for the images that make up an annoying animated signature or “stationery” background). We don’t deal with any of that. The point is to get some data into CouchDB, not to write a fully functional email preservation tool.
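For a concrete picture of the document each message turns into, here’s a rough sketch that builds the same kind of hash from a single hard-coded message using nothing but core Perl. The message_to_doc() helper and the sample message are made up for illustration, and the hand-rolled header parsing is a simplified stand-in for Mail::Internet (it ignores folded headers and MIME entirely). Unlike the loader script, it stores header names without the trailing colon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one raw message into (id, document hash).
# Simplified on purpose: no folded headers, no MIME handling.
sub message_to_doc {
    my ($raw) = @_;

    # Headers and body are separated by the first blank line.
    my ($head, $body) = split /\n\n/, $raw, 2;
    $body = '' unless defined $body;

    # Collect "Name: value" lines, keyed by lower-cased header name.
    my %header;
    for my $line (split /\n/, $head) {
        next unless $line =~ /^([\w-]+):\s*(.*)$/;
        $header{lc $1} = $2;
    }

    # Always keep the body; keep the other fields only if present.
    my $doc = { Body => $body };
    for my $field (qw[From To Cc Subject]) {
        my $value = $header{lc $field};
        $doc->{$field} = $value if defined $value && length $value;
    }

    return ($header{'message-id'}, $doc);
}

# Made-up sample message.
my $raw = join "\n",
    'From: alice@example.com',
    'To: bob@example.com',
    'Subject: lunch?',
    'Message-Id: <1234@example.com>',
    '',
    'Noon at the usual place?',
    '';

my ($id, $doc) = message_to_doc($raw);
print "id: $id\n";
print "$_: $doc->{$_}\n" for sort keys %$doc;
```

Note that the sample has no Cc header, so the resulting document simply has no Cc field. That’s perfectly fine in CouchDB, where documents in the same database don’t have to share a schema.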

See What You’ve Done?

Now is a good time to hit the Futon interface to see what you’ve done. You should see one record per message and can navigate through the set to spot-check the script: http://localhost:5984/_utils/database.html?test_mail. You should see that every record has a Body field as well as some of the others.

More To Come

So far we’ve covered the basics of CouchDB, installed it, and loaded some data in with a Perl script that extracts email messages from a traditional mbox file. Next week we’ll finish up by playing a bit with server-side JavaScript for views and indexing.

Have you been working with CouchDB already? If so, drop a note in the comments.

Comments on "Hacking with CouchDB"


Please could someone tell me where you can get the libmail-mbox-messageparser-perl and libmailtools-perl packages for Fedora 11?



In response to almac: you need perl-Mail-Mbox-MessageParser.noarch and perl-MailTools

This perl script does not run for me as posted here, I had to make 2 changes:

Fedora release 12 (Constantine)
perl 5.10.0
Apache CouchDB 0.10.0

on line 12:

on line 34 changed to: my @lines = split /\n/, $$email;

and on line 36 I changed the line to: $msg->extract(\@lines);

I’m not sure if my versions forced these changes to be needed, but doing this allows me to run the code now.


hmm line 12 did not print. Basically, it’s missing a semicolon at the end of the line.

