Data By The Numbers

When dealing with large distributed systems, knowing some basic performance and failure numbers helps you understand what you can reasonably expect both in terms of performance and reliability.

Recently, Google Fellow Jeff Dean gave a presentation at the LADIS 2009 (Large Scale Distributed Systems and Middleware) Conference. As with some of his previous talks, this one did an excellent job of revealing a bit more about how Google builds systems and what they’ve learned in doing so.

Working at the scale they do (thousands and thousands of servers) is especially interesting because you have to really step back and change your thinking about the available building blocks and how they can best fit together to get the job done. And to do so it helps to have a good high level mental model of the pieces and their performance. In Google’s case, this means having 40-80 servers (4-8 cores, 16GB RAM, 2TB disk) in a rack all connected via a gigabit ethernet switch. Multiple (30+) racks then uplink to a another switch to form a cluster. And services are designed to run across multiple clusters in different data centers around the world.

Design for Failure

Jeff discussed their experience with a typical cluster in its first year and presented some sobering downtime and failure statistics. Here’s some of what he presented (all numbers approximate):

  • 1 PDU failure causing 500-1,000 machines to suddenly vanish for 6 hours
  • 1 network re-wiring project which entails a rolling 5% of machines going offline in a 2 day period
  • 20 rack failures, 40-80 machines gone for 1-6 hours
  • 5 racks “go wonky” meaning the start seeing up to 50% packet loss
  • 8 network maintenance windows, half of which results in 30 minutes of random connectivity loss
  • 12 router reloads which kill DNS and forward-facing VIPs for a few minutes
  • 3 router failures that take down some traffic for an hour or so
  • dozens of 30 second “blips” for DNS
  • 1,000 individual machine failures
  • thousands of hard drive failures
  • slow disks, bad memory, misconfigured or flaky machines, etc.


Now I don’t work at anything near that scale, but I can tell you that I think those numbers scale down pretty well. If you only have 1/4 of a “google cluster” in your infrastructure, you probably will see about 25% of those ballpark numbers in your own experience. Once you have more than a few dozen systems involved, things just seem to break all the time!

Based on my experience, working in an environment like that can be downright maddening–especially when you network is acting up. You quickly learn that the keys to your sanity are redundancy and removing as many Single Points of Failure (SPoF) as possible. See The Curious Case of the Failing Connections and The Curious Case of the Failing Connections, Part 2 for an example of unexpected performance problems.

Relaly, all of this argues for building distributed systems that are as independent as possible and can tolerate failures, even if means giving users partial functionality (because it’s better than no functionality). But figuring out how t design and build scalable distributed systems isn’t easy either. Your first attempt or two may not be quite right. There’s a good chance you’ll underestimate the cost of some operation or another.

Numbers Everyone Should Know

With that in mind, Jeff presented a chart of “Numbers Everyone Should Know” that helps put various operations into perspective.

L1 cache reference                             0.5 ns
Branch mispredict                              5 ns
L2 cache reference                             7 ns
Mutex lock/unlock                             25 ns
Main memory reference                        100 ns
Compress 1K bytes with Zippy               3,000 ns
Send 2K bytes over 1 Gbps network         20,000 ns
Read 1 MB sequentially from memory       250,000 ns
Round trip within same datacenter        500,000 ns
Disk seek                             10,000,000 ns
Read 1 MB sequentially from disk      20,000,000 ns
Send packet CA->Netherlands->CA      150,000,000 ns

The idea is to facilitate back of the envelope estimates of system performance. He argues that a critical skill is the ability to estimate the performance of a system without having to actually build it. And that’s where those numbers come into play.

To make talking about these easier, let’s pick a few familiar points of reference. If you spend any time playing with ping or looking at specs for hard disks, you’re probably used to thinking in milliseconds rather than nanoseconds. A millisecond is 1,000,000 nanoseconds. So the “round trip within same datacenter” timing above is 0.5ms, which sounds about right to me.

The exact numbers here aren’t as important as the differences in magnitude as you move up and down the list.

Let’s walk through an example: deciding between a distributed in-memory key/value store (like Redis) or a disk-based system like Berkeley DB is something you can quantify.

If we assume that the data set is sufficiently larger than the available RAM on your server, then most Berkeley DB accesses will have to hit disk and probably result in a few seeks (we’ll ignore the details of hash table vs. B-Tree for now). Let’s call that 3 seeks per request, which is 10,000,000ns (or 10ms) each for a total of 30ms.

However, if you’re using a distributed in-memory key/value store, you need to fetch data from another machine in your cluster. Let’s call that a few 500,000ns (or 0.5ms) round-trips in the same datacenter (assuming persistent network connections and a simple request/response protocol) and the same 3 “seeks” as before, but this time they’re 100ns main memory references on the remote machine. Now we’re looking at a total more like 2-3ms.

Using this estimation technique, you can see that the distributed in-memory key/value store is easily 10 times faster than using a disk-based solution.

Of course, there’s a critical piece of information missing from these numbers: cost. And that cost comes in two flavors. The first is the cost associated with buying enough machines to keep all the data in memory and the network gear that allows them all to talk to each other. The second “cost” is the complexity associated with having 10 servers instead of 1 server. As we saw earlier, when the number of servers grows, the odds of failures rise as well.

So you need to decide how you needs map onto the spectrum of performance and cost/complexity options. If it’s a requirement to serve N requests per second, then you can scope out what that will take given various amounts of data.


It’s fun to design big complicated systems, but it’s no fun at all to support them when they’re not performing or are too fragile because they break all the time. By knowing some basic rules of thumb in advance, you can design with the right expectations in mind and spend a lot less time scratching your head (or pulling your hair out).

Are there are performance or failure estimation tips you regularly use? What are they?

Comments on "Data By The Numbers"

Im thankful for the blog.Really looking forward to read more. Great.

Hey, thanks for the blog article. Much obliged.

Some genuinely excellent articles on this website , thanks for contribution.

tCCZCR online dating websites This actually answered my problem, thanks!

This is one awesome blog.Much thanks again. Great.

http://www.royalresidences.net Submission: StrategyThe distribution strategy t2india is focused on the prospective market each nationallyand internationally. Secondarily, t2india seeks to establish its presence in thecommunity using the media.
http://www.healthyingredients.net The distribution: StrategyThe distribution strategy t2india is focused on the prospective market simultaneously nationallyand internationally. Secondarily, t2india seeks to establish its presence in thecommunity about the media.

http://www.connectfinance.net One company is private equity firm, Huntsman Gay Global Capital, who announced on the 30th of September they’ve made a majority investment in iQor Holdings Inc. The investment simply by Huntsman Gay in iQor, the second largest accounts receivable services provider on earth, will allow iQor to increase its share in the accounts receivable (A/R) outsourcing techniques market, while introducing Huntsman Gay into the A/R outsourcing market.
http://www.homesmarthome.org While Everest have predicted the finance and accounting outsourcing (FAO) market accurately up to now, it will be interesting to see whether the market will always grow as predicted as the year nears its conclusion.
http://www.mypetshop.org When people select building their home, first thing comes to our mind is expenditure. Definitely you can save a lot of money if you develop yourself. But that process takes a lot of time, energy coming from you. If you can afford these two generously then nothing like building your house! It is just ‘out of the world’ feeling. Building your home allows you to much attached and valued since you have built it brick simply by brick.
http://www.sportzup.org Home building and home designs go in hand. It also demands absolute time and energy from you. You needs to be firm on these. Could you afford time and energy? If it is easy to, just go for it and make your dream home come authentic! Usually you expect at least 5% more on the price of your home concerning your possessions. Because builders usually showcase the competent price to be successful you people. So be very calculative in buying house. Display property, mostly will be costlier by 5%than what they have quoted well before.
http://www.travellersbay.net Division: StrategyThe distribution strategy t2india is focused on the target market at the same time nationallyand internationally. Secondarily, t2india seeks to establish its presence in thecommunity via the media.
http://www.thefinancecompany.net Travelling Perks: Obviously, one of the best things about being a home-based travel agency business owner is that you reap the discounts your self travel. The travel perks vary from organization to organization though.
http://www.businesser.net One way to ensure that you get top-notch travel perks is to sign on along with a reputable, travel-based marketing organization. As one of their agents, you grab the full benefit of being a travel agent.

“excellent issues altogether, you just won a brand new reader. What could you recommend in regards to your post that you simply made some days ago? Any positive?”

What’s up to every one, it’s actually a fastidious
for me to pay a visit this website, it consists of precious

xidLpH vzpclemrdydb, [url=http://hwrfglpqbkiq.com/]hwrfglpqbkiq[/url], [link=http://jlkjmseqbuuk.com/]jlkjmseqbuuk[/link], http://pqkbiojikyds.com/

I think the article is very helpful for people,it has solved my problem,thanks!
Wholesale 7080 a Oakley Sunglasses ID8210325

Wonderful story, reckoned we could combine a number of unrelated data, nonetheless genuinely really worth taking a look, whoa did one particular learn about Mid East has got extra problerms also.

Awesome site you have here but I was curious about if you
knew of any discussion boards that cover the same topics talked about
here? I’d really like to be a part of group where I can get advice
from other experienced people that share the same interest.
If you have any recommendations, please let me know.
Many thanks!

Look at my blog AmosHBenedix

A motivating discussion may be worth comment. There’s certainly that that you have
to write more about this issue, it may not be described as a taboo subject but usually people will not discuss these issues.
To the next! Kind regards!!

Here is my web page AdolphEHatke

Leave a Reply