Why Parallel Matters

As the move to multi-core accelerates, so does the need for parallel programming. But first, let's look at what is causing this trend.

There is a general rule in computing that performance and capacity move down the market pyramid as time progresses. We all know that our phones (even the dumb ones) are much more powerful than the first computers. Indeed, many new smartphones have two compute cores. Technically speaking, the NVIDIA Tegra 2 has eight processing units, two of which are general-purpose ARM Cortex-A9 cores; the others handle audio, video, and so on.

We have all heard the line “that cell phone in your pocket has more computing power than the Apollo spacecraft,” or similar. Not only has the performance increase traveled down the pyramid, but so has capacity. I have a little terabyte (RAID 1) NAS for my home network, and for about $300 and minimal effort I could double its size. My cell phone has an 8GB memory card. These now-pedestrian amounts of processing power and capacity used to be at the top of the computing pyramid.

HPC has taken advantage of this trend as well. My first cluster had four dual-processor nodes connected by Fast Ethernet (note: not dual-core nodes, but dual Pentium II processors in their 90-degree slot sockets). Those eight “cores” can now be had in a single low-cost server, which of course runs much faster, with more memory and more storage.

Another indication of the growth of computing power is man-machine challenges. At this point, it is pretty much a given that computers can beat humans at chess. The average multi-core desktop running a good chess program can trounce most average-to-very-good players. I am assuming that more cores means more outcomes can be computed in a given time, and thus a better chess program. By the time you read this, the Watson Jeopardy showdown will have set another computer milestone. (My prediction is at the end of this column.) Playing Jeopardy is much harder than playing chess, and the Watson hardware is formidable, yet “off the shelf” (2,880 POWER7 cores, 16 terabytes of memory, and 4 terabytes of clustered storage, all in ten racks running Linux). Also note that Watson stands alone; “he” is not connected to the Internet.

From a hardware standpoint, Watson is pretty impressive, just as the chess master Deep Blue was back in the day (late 1990s). In the future, I’m sure we’ll see Watson-like capability (or similar cyber players) on your cell phone and pad devices. “Those ten racks of hardware in your pocket? You have got to be kidding.” I’m not. I did not say how far in the future, but it will be sooner than you think, and don’t discount the network.

Fundamentally, parallel computing will be at the core of all these new pocket and pad wonders (no pun intended). If you program these devices, you will be creating parallel applications — just like we do in HPC. It may be threads or messages, but regardless, you will need to think in parallel.
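To make the "threads or messages" distinction concrete, here is a minimal sketch of both styles using nothing but Python's standard library: threads sharing a results list (the shared-memory style), and processes exchanging results over a queue (the message-passing style). The function names and data are illustrative, not from any particular API.

```python
# A minimal sketch of the two parallel styles: threads (shared memory)
# and processes (message passing), both summing half of a list.
from threading import Thread
from multiprocessing import Process, Queue

def partial_sum(data, results, idx):
    # Shared-memory style: the thread writes its result into a shared list.
    results[idx] = sum(data)

def worker(data, queue):
    # Message-passing style: the process sends its result back on a queue.
    queue.put(sum(data))

if __name__ == "__main__":
    data = list(range(1000))
    halves = [data[:500], data[500:]]

    # Threads sharing a results list
    results = [0, 0]
    threads = [Thread(target=partial_sum, args=(h, results, i))
               for i, h in enumerate(halves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(results))  # same answer as the serial sum(data)

    # Processes exchanging messages
    queue = Queue()
    procs = [Process(target=worker, args=(h, queue)) for h in halves]
    for p in procs:
        p.start()
    total = sum(queue.get() for _ in procs)
    for p in procs:
        p.join()
    print(total)
```

The message-passing half is, in spirit, what MPI programs in HPC do across nodes; the threading half is what OpenMP does within one.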

Just in case you don’t believe me, I have prepared a little back-of-the-envelope diagram that will help demonstrate my point. First, let me say that it really is on the back of an envelope (mostly so I would not have to create a nice-looking figure). And second, it is not meant to be an exact diagram, but rather an illustration. On the left axis we have the Average Watts for All Processors, which is the average watt rating for all processors sold at a given time. The past trend has been a steady rise in this number as the speed (clock rate) of processors increased. A few years back it stopped rising because things were just getting too hot. At that point, processor design went sideways and started growing cores. More cores were possible because of the continuation of “Moore’s Law,” which is represented on the right axis and by the straight line with X’s through it.

My prediction is that, due to the demand for pads and phones, the Average Watts for All Processors is going to begin to drop. That is, the market will create a big need for low-power processors. At the same time, the market will also want more capability from their “devices.” Thus, the only way to increase performance is to add cores. (That is, as transistors continue to get smaller, they require less power, and we can pack more cores in the same space.)
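A quick bit of arithmetic shows why more, slower cores is the power-friendly direction. Dynamic CMOS power scales roughly as P = C·V²·f, and lowering the clock also permits lowering the voltage. The capacitance, voltages, and frequencies below are made-up illustrative values, not measurements of any real chip:

```python
# Back-of-the-envelope illustration: dynamic CMOS power scales roughly
# as P = C * V^2 * f. All numbers here are illustrative assumptions.

def dynamic_power(c, v, f):
    """Dynamic power in arbitrary units: capacitance * voltage^2 * frequency."""
    return c * v**2 * f

# One fast core: 3 GHz at 1.2 V
one_fast = dynamic_power(1.0, 1.2, 3.0)

# Two slower cores: 1.5 GHz each, and the lower clock lets us drop to 0.9 V
two_slow = 2 * dynamic_power(1.0, 0.9, 1.5)

print(one_fast)  # 4.32 arbitrary units
print(two_slow)  # 2.43 arbitrary units
```

Same aggregate clock cycles, but the two slow cores burn roughly 44% less power, because the voltage term is squared. This is the pressure that pushes core counts up as the watt budget goes down.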

Figure: Relationship between watts and core count

My diagram illustrates that the continued growth of transistor density, when combined with the need for low power, may accelerate the growth of multi-core processors. Recall that multi-core resulted when we hit the “frequency/power limit,” but now the average frequency/power limit is getting lower (as shown in the figure). This trend will result in additional multi-core pressure. I believe that at some point in the not-too-distant future we will enter the many-core era. Indeed, one could argue that the AMD Fusion APU is a perfect example of this trend. Unless there is a breakthrough in battery technology, or a way to hold a 25+ watt processor in your hand, I don’t see any other options.

If you write software, you should be paying close attention to this trend. The future is parallel. As I have been opining all along, if you are in the HPC club, you already understand parallel. The club membership is going to grow. In my next column, I’ll take a look at some of the current hardware and consider some software options, or the lack thereof. And at some point, I’ll circle around to explain how this may actually help HPC.

By the way, my prediction for the man-machine Jeopardy showdown: Watson 1, meatbags 0, turned out to be true.

Comments on "Why Parallel Matters"


Doug, nice article. I agree that in the future our personal laptops will have many cores (and, not surprisingly, our smartphones too). I am not sure how much software development is done with OpenMP, etc., to maximize the use of all those cores. I feel that with more cores the industry is really moving toward doing more virtualization. I think the push is even to do virtualization on smartphones/tablets.


I enjoyed the article as well.

Don’t forget about threading as well. Though it seems limited at the moment on x86 processors, it could be utilized too. Threading is a method that can conserve memory consumption.



Outstanding. It’s about time somebody collected power/core together into a simple, accessible article. Parallelism is the way forward. Welcome back SMP, we missed you…


I predict at some point, my phone will replace my computer at home as the main computing and storage device in the household. At that point, I will only need a monitor, mouse, and keyboard to plug in to a docking station for my phone.


WRT SMP, what about bandwidth problems? That is, the lack of sufficient bandwidth for all the cores to share the common memory. This was discussed in the Scaling Bandwidth column, June 24th, 2009.
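The bandwidth concern above is easy to put in rough numbers: if total memory bandwidth is fixed, each added core gets a smaller share. The 25.6 GB/s figure below is a hypothetical total, not a measurement of any specific system:

```python
# Hypothetical illustration of the shared-memory-bandwidth concern:
# with a fixed total DRAM bandwidth, per-core share shrinks as cores grow.

def per_core_bw(total_gbs, cores):
    """Bandwidth share per core (GB/s) when total bandwidth is split evenly."""
    return total_gbs / cores

TOTAL_BW_GBS = 25.6  # assumed total memory bandwidth

for cores in (1, 2, 4, 8, 16):
    print(cores, per_core_bw(TOTAL_BW_GBS, cores))
```

This linear squeeze is why multiple memory controllers (as in Intel's SCC, mentioned below) and higher-bandwidth interconnects matter as core counts climb.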


Buggsy2, this is why Intel’s SCC platform is so interesting. DRAM is subdivided by using multiple memory controllers. Additionally, on-chip photonics technology is poised to provide an optical bus that should increase bus frequencies into the THz range without increasing the power or thermal envelope.


I like using Java threads; they give the programmer more control over shared-memory parallelization than OpenMP does.


Scaling memory bandwidth has always been a challenge, and it may always be. Even if that gets fixed, we’ll run into some other restriction. We always do, and that’s the best thing that can happen. The market will drive the hardware and software like it always has. We rode gaming to high clock rates and GPU computing. AMD made 64-bit a commodity. Intel’s SCC is a great step forward. It’s a great time to be in the HPC industry. We’ve got more GHz and more GB today than we ever had, and it’s only going up. It’s awesome.


The drop you draw hasn’t happened for servers. Instead, it just flattens out. I’m not sure why it wouldn’t just flatten out for smaller devices, too, as long as battery life is adequate.

But I agree, more cores each running slower (important) at lower voltage (important!) can give the same performance as a fast core, and do so at lower power — assuming parallelized software, very good parallel scaling, etc. It’s even a square law. That’s the point I was making in my blog posts about the parallel power law (http://bit.ly/hgfAO9, http://bit.ly/fcpB6r).




