Just like there are "Urban Legends" that never seem to die, so it seems there are "Cluster Urban Legends" that persist even today. We have all seen or heard them. As a service to those entering the cluster HPC (High Performance Computing) community, we dispel some of the more popular tales. (Read: misconceptions.)
Just like there are “urban legends” that never seem to die, so it seems there are “cluster urban legends” that persist even today. We’ve all seen or heard them: wacky things people say about HPC clusters. As a service to those entering the cluster HPC (High Performance Computing) community, I’ve decided to dispel some of my personal favorites. Hopefully, these legends (misconceptions) will eventually fade, but then again, this is the Internet Age. Pay particular attention if you are considering building or engineering a cluster. Understanding these key concepts is the first step.
1. Can you imagine a Beowulf cluster of these?
This comment/joke is seen often on Slashdot when there is some new processor, computer, video game, cluster, or whatever. My answer, of course, is, “No, not really.” Just connecting things does not make a usable cluster.
At the beginning and end of the day, HPC is all about price-to-performance. Whatever you connect must make sense. Connecting a bunch of old Pentium II systems to equal the performance of a single Opteron may make sense if you are building a rendering farm, but may be a bad idea if you want to calculate the weather for next week.
So what is a cluster? Let’s define it as a collection of workers that communicate to produce a large amount of work. In computer terms, it’s computer hardware connected with some form of communication medium (Gigabit Ethernet, for example).
So here is the first thing to remember:
So, yes, imagining a cluster of smart washing machines may make economic sense for someone, but for most HPC people, it won’t be too effective.
By the way, clustering is an old idea. Any time you have more than one thing working together to produce something, you are clustering. Ants do it quite well. Go ahead, say it: “Can you imagine … of ant hills?”
2. This software allows me to connect all the desktops in my company to create an unlimited supply of supercomputing power.
This statement has appeared in the press quite a bit. It seems every time someone talks about clustering, this idea comes up. While it does have some merit, the “unlimited supply of computing power” is where the train comes off the track.
Local intranets vary quite a bit in terms of resources. Providing you can resolve the software issue, the economic justification (that price-to-performance thing) depends on what you want to do.
To help understand the dynamics, imagine playing football (American) using cell phones to communicate. Every time there is any communication on the field (play calling, huddle, snap count, referee whistle, time outs, etc.), everyone must stop and wait for the communication to finish. That is, you must dial a cell phone and call everyone who needs that specific information. The game would be interesting in a weird kind of way, but it would also slow down, and the cost of advertising would quickly drop because there is more time for commercials and the event just got very boring. The “price-to-performance” would be pretty poor.
Now imagine a marathon race of ten runners, where each runner has a cell phone. Over the phone they are told to start. The delay between calls to individual runners may be on the order of minutes, but the race is on the order of hours. The small delay in the beginning will probably not influence the end result too much.
Are you getting the idea? The economics of using latent PC cycles depends on what you want to do. Things like SETI@home are like marathon races, where independent runners can be sent out and return at some point. Other problems may require more communication and thus are not economically suitable. For instance, it may take that big LAN supercomputer that exists in your company network one week to compute tomorrow’s weather. Not very useful, unless you don’t go outside much.
Price-to-performance is determined by what you want to do with your cluster
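The marathon and football pictures can be put into rough numbers. The sketch below is only a back-of-the-envelope model, not a measurement: it assumes the work divides perfectly and that every message costs its worker one full latency of waiting. The function names, the ten desktops, the ten-hour job, and the 1 ms LAN latency are all made up for illustration.

```python
def run_time(work_seconds, workers, messages_per_worker, latency_seconds):
    """Estimate wall-clock seconds for a job split evenly across workers.

    Simplifying assumptions (not from the article): the work divides
    perfectly, and every message makes its worker wait one full latency.
    """
    compute = work_seconds / workers
    waiting = messages_per_worker * latency_seconds
    return compute + waiting


def speedup(work_seconds, workers, messages_per_worker, latency_seconds):
    """How many times faster the job runs than on a single worker."""
    return work_seconds / run_time(
        work_seconds, workers, messages_per_worker, latency_seconds
    )


# Hypothetical numbers: ten desktops on an office LAN with 1 ms latency,
# working on a job that would take one machine ten hours.
TEN_HOURS = 10 * 3600
marathon = speedup(TEN_HOURS, workers=10, messages_per_worker=2,
                   latency_seconds=0.001)
football = speedup(TEN_HOURS, workers=10, messages_per_worker=10_000_000,
                   latency_seconds=0.001)

print(f"marathon-style speedup: {marathon:.2f}x")
print(f"football-style speedup: {football:.2f}x")
```

With these invented inputs the marathon-style job gets nearly the full tenfold speedup, while the football-style job spends most of its time waiting on the phone and ends up less than three times faster.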
3. The Beowulf/MPI/PVM software turns ordinary PCs into a supercomputer that will then run your programs faster.
After reading about unlimited computing power from HPC/Beowulf clusters, everyone gets pretty excited. Sorry to have to bring you down, but if your program is designed to run on one processor, it will run on one processor when loaded on your shiny new cluster. Linking with the MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) libraries does not make your program run on multiple processors. Bummer.
There is no magic Beowulf software either. Get over it. Beowulf is the name of the commodity computing project that was developed by Tom Sterling and Don Becker. You can eliminate a large portion of folklore running around in your head if you read this article.
There is software from Scyld that is called Scyld Beowulf. This software does some amazing things, but it does not do any magic. There are other cluster distributions (ROCKS, OSCAR, Warewulf), but again if you are looking for magic, you have come to the wrong place.
The programming story is much larger than can be covered here. Just remember:
Programs must be designed to run on a cluster. Parallel programming can be hard.
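To make “designed to run on a cluster” a little more concrete, here is a small sketch in plain Python (not MPI or PVM, and not anyone’s official method). The first function is an ordinary serial loop. The rest show the redesign a programmer has to do by hand: split the problem into chunks, compute each chunk independently, and combine the partial answers. All function names are hypothetical.

```python
def sum_of_squares_serial(n):
    """One loop, one processor: a cluster would still run this on one core."""
    return sum(i * i for i in range(n))


def split_range(n, parts):
    """Divide range(n) into contiguous (start, stop) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks = []
    start = 0
    for p in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """The piece of work a single worker would do on its own chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def sum_of_squares_decomposed(n, parts, map_fn=map):
    """Same answer as the serial version, but expressed as independent chunks.

    `map_fn` decides where the chunks run; the built-in map runs them
    one after another in this process.
    """
    return sum(map_fn(partial_sum, split_range(n, parts)))


print(sum_of_squares_serial(1000) == sum_of_squares_decomposed(1000, parts=7))
```

Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` would spread the chunks over the cores of one machine. On a real cluster the same decomposition would have to be written with message-passing calls, and that rewrite is the programmer’s job, not the library’s.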
4. Communication speed (throughput) between nodes is important.
The joke about a station wagon full of tapes being the fastest way to send large amounts of data is quite true. Intuitively, you know something is missing because we use wires and fiber instead of station wagons to send data. It is called latency.
Doug’s Latency Experiment: Next time you are in the shower, which for some of you fellow geeks may be next week, adjust the water to where it is comfortable. Now turn the hot water off and start counting. Note the time between when you turn the faucet until you scream. That time is the latency of your shower water. There is also the flush the toilet version of this experiment, but that takes two people.
So ask yourself, “How many times an hour can I make myself scream?” For those following along from home, this can be considered a thought experiment. If there was less of a delay between screams (low latency), you could increase the number of messages (screams) you can send via the shower.
So that is latency. If you have an application that needs to send a lot of messages in a short time (like a football team using cell phones) then latency is important. On the other hand if you only need to send a few messages, like the marathon runners, then latency is not that important.
Of course how much data you can send in a certain amount of time is also important. This is called throughput (or bandwidth). Throughput is a rate, measured in bits per second. Think of that station wagon full of tapes (or a Hummer full of tapes for you youngsters) moving along the highway. Latency is how long it takes to load and unload the tapes.
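The station wagon and the wire can be compared with a simple rule of thumb: time to deliver equals latency plus size divided by bandwidth. The sketch below applies it with invented numbers. The 50 microsecond Ethernet latency, the four-hour trip (counted here as the wagon’s whole latency, drive included), and the size of the tape load are assumptions for illustration only.

```python
def transfer_time(size_bits, latency_seconds, bandwidth_bits_per_second):
    """Seconds to deliver a message: fixed delay plus time on the wire."""
    return latency_seconds + size_bits / bandwidth_bits_per_second


def effective_throughput(size_bits, latency_seconds, bandwidth_bits_per_second):
    """Bits per second actually achieved once latency is counted."""
    return size_bits / transfer_time(
        size_bits, latency_seconds, bandwidth_bits_per_second
    )


# Hypothetical links. The wagon is modeled as a four-hour delay after which
# everything arrives at once, hence the infinite bandwidth.
GIGABIT = {"latency_seconds": 50e-6, "bandwidth_bits_per_second": 1e9}
WAGON = {"latency_seconds": 4 * 3600, "bandwidth_bits_per_second": float("inf")}

small_message = 64   # bits: one 8-byte number
tape_load = 1e15     # bits: an assumed wagon full of tapes

for name, link in (("Gigabit Ethernet", GIGABIT), ("station wagon", WAGON)):
    for label, size in (("small message", small_message), ("tape load", tape_load)):
        seconds = transfer_time(size, **link)
        print(f"{name:17s} {label:14s} {seconds:.6g} s")
```

Under these assumptions the wagon wins easily on the tape load and loses badly on the small message, which is the whole point: for lots of little messages, latency sets the pace, and the advertised bandwidth barely matters.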
5. Communication speed and latency are important.
Now that you think you are so smart, throughput and latency are not the whole story. How much work a processor must do to move data is also important. If your crew that fills the Hummer with tapes is also responsible for creating the tapes, then work must be shared and the overall performance may suffer.
6. Communication speed, latency, and processor overhead are important.
Now you think you are really smart. But not so fast. Communication speed, latency, processor overhead are all important, right? It depends.
You now are a cluster expert. The answer to 90% of all cluster questions can be answered with the following:
It all depends on the application
The right hardware and software design can be very dependent on the application. Some applications (marathon types) run well on almost any cluster, while others (football types) need very high performance parts.
“Why not just use faster parts and then every application works fine?” Cost is the issue here, Skippy. When designing a cluster there are certain constraints. Cost is usually one such constraint. The old bang for the buck idea. Or, as I recall someone mentioning at some point, price-to-performance.
7. The “Top 500” list is the ultimate measure of computer performance.
Behold the Top 500 supercomputers. Got to get me one of those thingies.
The Top 500 list is a measure of how fast a specific program runs on a computer or cluster. It is a single data point. So take out a sheet of paper and put a point in the middle. That point is the performance of the world’s fastest computer running something called the “LINPACK benchmark”. Now take out another piece of paper and put a similar point in the middle. That point is the performance of the same computer on the Top 500 list running a program called BLAST. Hold the two pieces of paper next to each other. Now, which one is faster? Get the idea? In programming terminology, the scope of a cluster’s rank in the Top 500 is limited to the Top 500. When running other programs, Your Mileage May Vary (YMMV).
Here is the last thing to remember.
HPC/Beowulf clusters are about building machines around problems.
If you need a fast interconnect, then buy it. If you don’t, then don’t buy it and buy more processors instead. Maximize your performance and buy what you need to solve your problem. This process does require more engineering on the end user’s part, however. Before you build, some time spent on the design process will ultimately provide a good cluster experience. More to the point, the satisfaction that you have implemented a real working HPC cluster is certainly better than knowing that all you created was a large space heater.
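The interconnect-versus-processors choice can be sketched as a small budget exercise. Everything in this example is hypothetical: the prices, the latencies, the workloads, and the run-time estimate itself (perfectly divided work plus one latency of waiting per message). It is meant to show the shape of the reasoning, not to size a real machine.

```python
def nodes_affordable(budget, node_price, interconnect_price_per_node):
    """Whole nodes the budget covers once each node's network share is paid."""
    return int(budget // (node_price + interconnect_price_per_node))


def estimated_run_time(work_seconds, nodes, messages_per_node, latency_seconds):
    """Rough estimate: evenly divided compute plus time waiting on messages."""
    return work_seconds / nodes + messages_per_node * latency_seconds


def best_design(budget, node_price, designs, work_seconds, messages_per_node):
    """Return (name, seconds) for the design with the lowest estimated run time.

    `designs` maps a name to (interconnect price per node, latency in seconds).
    """
    results = {}
    for name, (price, latency) in designs.items():
        nodes = nodes_affordable(budget, node_price, price)
        if nodes > 0:
            results[name] = estimated_run_time(
                work_seconds, nodes, messages_per_node, latency
            )
    return min(results.items(), key=lambda item: item[1])


# Hypothetical catalog: a cheap network and an expensive low-latency one.
DESIGNS = {"cheap network": (50, 50e-6), "fast network": (1000, 2e-6)}

marathon_choice = best_design(100_000, 2_000, DESIGNS,
                              work_seconds=1e6, messages_per_node=100)
football_choice = best_design(100_000, 2_000, DESIGNS,
                              work_seconds=1e6, messages_per_node=1e9)

print("marathon-type application:", marathon_choice[0])
print("football-type application:", football_choice[0])
```

With these invented figures the marathon-type job is better served by the cheap network and the extra nodes it pays for, while the football-type job is better served by the fast network even though the same budget buys fewer nodes.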
In conclusion, you should now be able to totally discount all those cluster urban legends you see all the time. And, if you’ve already forgotten the things you were supposed to remember, here is one last sentence to sum it all up:
There is no free lunch with clusters, just a reasonably priced buffet on which to feed your computing needs.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.