Parallel Gedanken

HPC's version of the tortoise and hare.

Sit down, take a deep breath, and relax. Let your mind wander for a bit. I’m going to talk about something that runs counter to one of the basic tenets of computer performance, something you may believe to be true forever and always. Last time I talked about how the concurrent parts of a program are those that can be computed independently. I also explained how the parallel parts of a program are those concurrent parts that should be executed at the same time in order to increase the speed of the program. As I discussed, the differentiator is overhead. It is now time to take the next step. By the end of this column, I hope to convince you that the following is sometimes true:

In a parallel computer, faster processors are not always better than slower processors.

This statement is something of a trick because it makes no mention of how the processors are connected. And that is the issue. The interconnect contributes to the overhead and thus to the scalability — how many processors you can add to your program before it stops getting faster. Scalability is also limited by Amdahl’s Law, which states that the overall performance of your parallel program is determined by the amount of serial content, that is, the parts that cannot be executed in parallel. These parts will eventually limit how fast your program can run. We will come back to that later, but for now let’s talk about scaling the parallel parts.
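If you want to see how quickly the serial content bites, here is a minimal sketch of Amdahl’s Law in Python; the 5% serial fraction is just a number picked for illustration, not a measurement of any real code.

```python
# Amdahl's Law: speedup with n processors when a fraction "serial"
# of the work cannot be parallelized.
def amdahl_speedup(n, serial):
    return 1.0 / (serial + (1.0 - serial) / n)

# Illustrative only: even a modest 5% serial content flattens
# the speedup curve long before the processor count runs out.
for n in (1, 8, 32, 128, 1024):
    print(f"{n:5d} processors -> speedup {amdahl_speedup(n, 0.05):6.2f}")
```

Even with an unlimited number of processors, the speedup in this example can never exceed 1/0.05 = 20.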

Let’s use a simple Gedanken experiment (thought experiment) to see how slower processors may help in some situations. Assume we have a program with a large concurrent loop and a certain amount of overhead. Let’s assume we have eight very fast processors and a medium-speed network (i.e., the network is slower than the processors’ ability to eat and crunch data). What will happen with our program? It doesn’t take too much Gedanken to figure out that the network is going to limit the ability of the shiny new processors to do work (i.e., they will be waiting on data). At some point, our program will stop scaling and we will see less than an 8X speedup.

In a second experiment, let’s replace those eight processors with thirty-two processors that are four times slower, but are matched to the medium-speed network (i.e., the processors do not wait on any data). In this case, our program scales perfectly and we reduce the single slow-processor time by a factor of 32. Since each processor is four times slower, the time on our 32 slow processors equals what a perfect 8X speedup would give us on the fast processors. Recall that we were unable to get an 8X speedup with the fast processors. QED.
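To make the arithmetic concrete, here is a toy model of the two experiments. The timing model and the 75% network-limited efficiency for the fast machine are illustrative assumptions, not measurements.

```python
# Toy model of the two Gedanken experiments.
T = 1.0               # time on ONE fast processor (arbitrary units)
slowdown = 4.0        # the slow processors are 4 times slower

# Experiment 1: 8 fast processors throttled by the medium-speed network.
# The 75% efficiency is a made-up number standing in for "less than 8X".
fast_procs, net_efficiency = 8, 0.75
time_fast = T / (fast_procs * net_efficiency)

# Experiment 2: 32 slow processors matched to the network (perfect scaling).
slow_procs = 32
time_slow = (T * slowdown) / slow_procs

print(f"8 fast, network-limited: {time_fast:.3f} -> {T / time_fast:.1f}X")
print(f"32 slow, balanced:       {time_slow:.3f} -> {T / time_slow:.1f}X")
```

With these numbers the balanced machine finishes in T/8 while the fast machine needs T/6, so the “slower” processors win.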

If you understand this concept, you now understand why the interconnect is so important for many cluster applications, and why many who build clusters spend good money on a fast interconnect. Feeding processors is important. The latest and fastest processors need the latest and fastest interconnects. Unfortunately, the rate at which processors increase in speed often exceeds that of the available interconnects. Multi-core has changed this a bit, but the issue is still the same. Messaging rate and throughput have become important because multiple cores now share the interconnect.
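A trivial sketch shows why the per-core messaging load matters: all the cores on a node divide up one link. The link speed below is an assumed number, not the spec of any particular interconnect.

```python
# Hypothetical per-node link bandwidth shared by all cores on that node.
link_gbps = 10.0
for cores in (1, 2, 4, 8, 16):
    print(f"{cores:2d} cores/node -> {link_gbps / cores:5.2f} Gb/s per core")
```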

The “slow is better” approach is also employed in two commercial supercomputers. The venerable IBM Blue Gene, which consistently beats contenders for the top spots on the Top500 list, uses 700 MHz PowerPC 440 processors (plus a floating-point unit) and a fast, balanced interconnect. Another balanced-system approach comes from SiCortex, which uses 5832 500 MHz MIPS processors and a very fast, balanced interconnect. The key word is balance. There is also a huge reduction in energy costs for these two systems, something to keep in mind in the age of green.

Of course, applications vary; some are not as sensitive to interconnect speed and do see benefits from using faster processors. Multi-core will certainly change this behavior, which is why benchmarking is important. As often quoted on the Beowulf mailing list, “It all depends on the application. Benchmarking your application(s) is the best measure. YMMV (Your Mileage May Vary).” Words well worth remembering.

I will conclude this discussion with one more data point, which comes to you first-hand. Back in the day, a co-worker of mine, Anatoly Dedkov, and I were intrigued by how this idea applies to disk-based I/O, because getting data off of spinning disks always seems to keep processors waiting. We actually wrote and presented a paper at the Parallel and Distributed Processing Techniques and Applications conference in 1995 (PDPTA’95). We shamelessly proposed the Eadline-Dedkov Law, which states:

For two given parallel computers with the same cumulative CPU performance index, the one which has slower processors (and probably a correspondingly slower inter-processor communication network) will have better performance for I/O-dominant applications.
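To see why this can happen, here is a toy model, not the analysis from the paper. It assumes perfectly parallel compute and I/O, one local disk of fixed bandwidth per node, the same cumulative CPU index (nodes times per-processor speed) for both machines, and made-up workload numbers.

```python
# Toy model: same cumulative CPU index (nodes * speed), one disk per node.
def run_time(nodes, speed, work, data, disk_bw):
    compute = (work / nodes) / speed   # per-node compute time
    io = (data / nodes) / disk_bw      # per-node I/O time (local disk)
    return compute + io

WORK, DATA, DISK_BW = 1000.0, 800.0, 1.0   # made-up units

fast = run_time(nodes=8,  speed=4.0, work=WORK, data=DATA, disk_bw=DISK_BW)
slow = run_time(nodes=32, speed=1.0, work=WORK, data=DATA, disk_bw=DISK_BW)

print(f"8 fast nodes:  {fast:.2f}")   # 31.25 compute + 100.00 I/O = 131.25
print(f"32 slow nodes: {slow:.2f}")   # 31.25 compute +  25.00 I/O =  56.25
```

The compute time is identical because the cumulative CPU index is the same, but the machine with more, slower nodes spreads the I/O across more disks, which is the spirit of the law for I/O-dominant applications.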

The whole idea struck us as counter-intuitive at first, which is why we investigated it further. Next time I’ll talk more about Amdahl’s Law. In the meantime, please feel free to enlighten your colleagues about the slow-processor idea. It makes for great heated arguments and fist fights.

Comments on "Parallel Gedanken"


So basically, this means that the “traffic jam shockwave” (YouTube it) also happens in HPC… even if the acceleration is digital! So I can only deduce that the predictive and flocking systems being developed for computer-driven vehicles will also be used in HPC in the not-so-far future! Or vice versa, even!!
Uhm… I guess hierarchical management agents might be introduced, instead of a single master in the cluster… thus relieving the interconnects with alternating data transfers…
Nonetheless, one thing that can help in situations where the cores are faster than the interconnect is to use “real-time” data compression… I hope this is a hypothesis that can be taken into account!! Since you’ve got CPU to spend… think compact! It’s not the ideal solution, but it saves a bundle of money while you can’t get the new “hyper-speed” network and the current one was just a “bad buy”.
But hey, what do I know, I’m just a newbie in HPC!


Nice SiCortex Advertisement, Doug.


Traffic jam shockwaves are pretty cool.
