Consistent Hashing for Scaling Out

Consistent hashing is a useful technique for horizontal scaling that also insulates you from the churn of future growth and server failures.

A while ago I read about consistent hashing as a way of mitigating some of the problems associated with horizontal scaling of services, especially when you need to grow or shrink clusters. But at the time I didn’t really have a reason to use consistent hashing, so I didn’t dig too deeply into how it works or how I might use it. Recently I turned to consistent hashing for the deployment of a new service based on Redis (see Redis: Lightweight key/value Store that Goes the Extra Mile) and thought it would be worth writing about consistent hashing in a bit more detail.


Before we jump into consistent hashing, let’s first consider the problem it tries to solve. When you want to scale a service beyond the capabilities of a single server, be it MySQL, an in-memory cache, or anything else, you need a mechanism for randomly but evenly spreading data across all the servers (or clusters). You’d typically accomplish this by running the “key” (the thing that identifies the piece of data) through a hash function and using that value to determine which server owns the data.

For example, you want to fetch object 4355623 from one of your 10 servers. So you run that value through a hash function which maps it to server #8. A good hash function will have consistently fast performance and a fairly even distribution of hashed keys, even for similar inputs. In other words, keys like foobar1@gmail.com, foobar2@gmail.com, and so on are not going to all end up on the same server.

The implementation of such a hash function is often pretty straightforward. Take the given key, convert it to a string, run that through a well-known hash function, and map the resulting number into one of the available slots. In this case there are 10 slots. Consider the following Perl script, which you can run from the command line to try out hashing various values against various numbers of servers:


#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $key         = shift or die "usage: $0 key num_servers\n";
my $num_servers = shift or die "usage: $0 key num_servers\n";

# Take the first 32 bits of the MD5 digest as an unsigned integer...
my $val = unpack "N", md5($key);

# ...and map it into one of the available slots.
my $num = $val % $num_servers;

print "$key maps to server $num\n";


Given a key as the first argument and the number of servers as the second, it will tell you which server the key maps to. So let’s do a few trial runs to see where those two email addresses map:

$ ./test_hash.pl foobar1@gmail.com 10
foobar1@gmail.com maps to server 6
$ ./test_hash.pl foobar2@gmail.com 10
foobar2@gmail.com maps to server 9

Excellent. Those two addresses land on different servers. And if you run enough addresses through that script, you’d find a pretty even distribution. That means you get roughly 10x capacity for your system by going from a single server (or single cluster) to 10. That’s what we call linear scaling, and it beats the pants off a lot of other solutions.
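You can convince yourself of that even distribution with a quick simulation. This is a Python sketch that mirrors the Perl script’s scheme (first four bytes of the MD5 digest, mod the server count); the email addresses are synthetic test keys, not real data:

```python
import hashlib

def server_for(key, num_servers):
    # Same scheme as the Perl script: take the first 4 bytes of the
    # MD5 digest as a big-endian unsigned integer, then mod by the
    # number of servers.
    digest = hashlib.md5(key.encode()).digest()
    val = int.from_bytes(digest[:4], "big")
    return val % num_servers

# Hash 100,000 synthetic keys across 10 servers and count per server.
counts = [0] * 10
for i in range(100_000):
    counts[server_for(f"user{i}@example.com", 10)] += 1

for server, n in enumerate(counts):
    print(f"server {server}: {n}")
```

Each server should end up with roughly 10,000 keys, give or take a few hundred.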


You could run quite well for a long time with a solution like this, happily making use of all the excess capacity to soak up growth. And, even better, if a single server goes offline for a bit, only 10% of your data is affected. But sooner or later you may need to add more servers to the service. Maybe growth has continued and you want to increase the number of servers from 10 to 15. Or maybe you’re buying new (bigger) hardware and can comfortably go from 10 servers to 5.

Unfortunately, that completely changes which server nearly every key lands on.


$ ./test_hash.pl foobar1@gmail.com 5
foobar1@gmail.com maps to server 1
$ ./test_hash.pl foobar2@gmail.com 5
foobar2@gmail.com maps to server 4

$ ./test_hash.pl foobar1@gmail.com 15
foobar1@gmail.com maps to server 1
$ ./test_hash.pl foobar2@gmail.com 15
foobar2@gmail.com maps to server 9

Well, things definitely shift around. Sure, some keys will land on the same server with 15 as they did with 10, but that’s not the common case at all. If you had millions of keys, you’d find the bulk of them shifting around after changing the number of servers!

In other words, you end up re-hashing all the keys. That’s incredibly painful.
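You can quantify the pain with the same kind of sketch: hash a batch of synthetic keys under both server counts and count how many land somewhere new. (This mirrors the Perl scheme in Python; the key names are made up for the test.)

```python
import hashlib

def server_for(key, num_servers):
    # First 4 bytes of the MD5 digest as a big-endian unsigned int,
    # mod the server count -- the same scheme as the Perl script.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers

keys = [f"user{i}@example.com" for i in range(100_000)]
moved = sum(1 for k in keys if server_for(k, 10) != server_for(k, 15))
print(f"{moved / len(keys):.0%} of keys moved")
```

A key stays put only when its hash value gives the same remainder mod 10 and mod 15, which works out to about a third of keys; roughly two-thirds move.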


If you’re not an algorithms geek, reading the Wikipedia summary of consistent hashing probably leaves you with a lot of questions:

Consistent hashing is a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped. By using consistent hashing, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots.

So let’s simplify that.

The basic idea is to avoid re-hashing most keys to a new server when a server is removed or added. We do this by making keys consistently map to the same servers in most (but obviously not all) cases. The code required to accomplish this is a bit longer than is worth pasting here, but the theory behind it is fairly easy to describe. Instead of having a one-to-one relationship between hash “slots” and servers, in consistent hashing each server is given a larger number of slots, say 1,000 of them. So if you want to add “serverA” to the list, behind the scenes you end up with “serverA-0001” … “serverA-1000” on the list. Each of those names can be mapped to a 64-bit number (using a hash function, of course) and stored in a sorted list (or similar data structure). The values will be semi-randomly and semi-evenly spread out in the 64-bit number space.

That means instead of 10 servers and 10 slots in a hash, we actually have 10 servers mapping to a total of 10,000 slots. To map a key to a server, you run that key through the hash function, just as before, but this time you don’t simply “mod $num_servers” to find the server. Instead, you take the resulting 64-bit value and consult the sorted list. The server immediately after that number in the list is the one responsible for the data. So if your 64-bit value lands just before the hashed value for “serverD-0532”, serverD is your answer.
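The sorted-list approach just described can be sketched in a few lines. This is a simplified Python illustration, not the code from any particular library: it assumes MD5 as the hash function, 1,000 virtual slots per server as in the text, and uses the bisect module for the sorted-list lookup.

```python
import bisect
import hashlib

def hash64(s):
    # Map a string to a 64-bit number via the first 8 bytes of its MD5.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, servers, slots_per_server=1000):
        self._ring = []  # sorted list of (point, server) pairs
        for server in servers:
            for i in range(1, slots_per_server + 1):
                # "serverA-0001" ... "serverA-1000", as described above
                point = hash64(f"{server}-{i:04d}")
                bisect.insort(self._ring, (point, server))

    def server_for(self, key):
        # Find the first slot at or after the key's hash; wrap around
        # to the start of the list if the key hashes past the last slot.
        idx = bisect.bisect_left(self._ring, (hash64(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing([f"server{c}" for c in "ABCDEFGHIJ"])
print(ring.server_for("foobar1@gmail.com"))
```

The wrap-around in server_for is what makes the sorted list behave like a ring: a key that hashes past the highest slot belongs to the first slot.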

Now, here’s where the real magic happens. If you add a server to the list, that adds another 1,000 slots to the list. But instead of having to rehash all the keys, only a fraction of them move to the new server. Similarly, if you need to remove a server from the list, you remove its 1,000 points, and only the keys that lived on the removed server end up mapping to other servers. Understanding Consistent Hashing provides a more visual version of this technique.
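You can check that claim directly: build one ring with 10 servers and one with an 11th added, then count how many keys change hands. This is a compact, self-contained Python sketch under the same assumptions as before (MD5, 1,000 slots per server, synthetic keys):

```python
import bisect
import hashlib

def hash64(s):
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def build_ring(servers, slots=1000):
    # One (point, server) entry per virtual slot, sorted by hash point.
    return sorted((hash64(f"{s}-{i:04d}"), s)
                  for s in servers for i in range(1, slots + 1))

def lookup(ring, key):
    idx = bisect.bisect_left(ring, (hash64(key), ""))
    return ring[idx % len(ring)][1]  # wrap around past the last slot

keys = [f"user{i}@example.com" for i in range(20_000)]
old = build_ring([f"server{c}" for c in "ABCDEFGHIJ"])   # 10 servers
new = build_ring([f"server{c}" for c in "ABCDEFGHIJK"])  # add serverK
moved = sum(1 for k in keys if lookup(old, k) != lookup(new, k))
print(f"{moved / len(keys):.1%} of keys moved")  # roughly 1/11, about 9%
```

Every key that moves lands on the new server; the other 10 servers keep everything they had, which is exactly the K/n behavior the Wikipedia definition promises.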

Consistent Hashing has been around as a useful technique for over a dozen years but is still underutilized in many architectures. If you’re interested in experimenting with consistent hashing in Perl, Brad Fitzpatrick’s Set::ConsistentHash module is available on the CPAN.

Have you used consistent hashing or similar techniques to scale?
