dcsimg

Highly Parallel HPC: Ants vs Horses

Combing things in parallel often has unexpected consequences and outcomes.

Outside of geometry, the word parallel takes on many meanings. The term is often used to indicate “two or more things used in the same way at the same time.” I remember my first experience with circuit analysis. We learned that total resistance for resistors in series is a simple sum of the individual resistances (RT=R1+R2…), but resistors in parallel did not work that way. The formula for the total resistance was a sum of the reciprocals of all individual resistances (1/RT=1/R1+1/R2…) and was always less than the smallest resistor in the circuit. At first it seemed odd. You introduced more resistors but got less resistance. Working the numbers shows why, but at first blush it really did not make sense.

The parallel resistor lesson was the first of many “it is not what you would think” experiences. In general, putting things together in parallel often leads to non-intuitive results. Parallel computing is full of these situations. Perhaps the most famous is Amdahl’s Law that puts limits on how many parallel processors your can throw at a problem and expect a speed-up. (If you dislike the math associated with such laws, have a look at The Lawnmower Law – a lighter version of Amdahl’s and always topical this time of year.)

Recently I was catching up on some of my favorite blogs and found this statement in Greg Pfister’s Perils Of Parallel page:

Do a job with 2X the parallelism and use 4X less power — if the hardware is designed right.

Finding this extremely interesting, I decided to read the whole blog entry, The Parallel Power Law, a bit more carefully. Specifically, Pfister suggests and provides support for the idea that if you consider a total amount of aggregate computing performance, it is always more power efficient to perform this in parallel with larger number of slower clocked processors than with a small number of faster clocked processors. Again, the total performance being equal. He cites the standard power circuit law:

P  =  CV2f

where: C is the capacitance being driven, f is the clock frequency and is linearly related to the power, and V is the power supply voltage which is related to power as a squared term. Pfister then explains the frequency/voltage relationship in circuits – the faster things happen, the more “oomph” (voltage) you need to move things around. Thus, slow clocks can use a lower V, which can give a significant reduction in power use. In essence, as any over-clocker will tell you, “push voltages up to increase the frequency.”

In these terms, the parallel power law can be described as replacing a processor that runs at frequency f with n processors running at frequency f/n and thus allowing a lower voltage and a quadratic reduction in power. Or as Matt Reilly of SiCortex fame, commented, “Ants waste less food than horses.” Of course, there are other factors that influence the power effeciancy, but in practice many slower processors are more power efficient than a single large processor for a given unit of work. According to Pfister, modern processors do not allow both frequency and voltage adjustments of this type and thus cannot take advantage of this principal.

While this is the first time I ever read about this law, it reminded me of another parallel computing law that Anatoly F. Dedkov and I had found in 1995:

For two given parallel computers with the same cumulative CPU performance index, the one which has slower processors (and a probably correspondingly slower interprocessor communication network) has better performance for I/O-dominant applications (while all other conditions are the same except the number of processors and their speed).

Like the parallel computing power law, the above sounds “non-intuitive.” It reflects the same idea “more slower is better than a few faster.” or “ants chew small bites quicker than horses chew big chunks” You can read the full paper, Performance Considerations for I/O-Dominant Applications on Parallel Computers, if you want to understand the result. (By the way, back then we were working on a large nCUBE system at Rutgers University.)

The non-obvious nature of parallel computing can invite some incorrect assumptions. For instance, combining fast sequential things does not always mean you will create an optimal parallel thing. If scaling, power usage, or I/O are important, then you may be surprised to learn that there are other factors at play than just fast cores. Like my first experience with resistors, “parallel” always seems to introduce some non-obvious results. And, of course, I have not even mentioned about how surprised I was when I learned about capacitors circuits. You can’t make this stuff up.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62