As the move to multi-core accelerates, so does the need for parallel programming. But first, let's look at what is causing this trend.
There is a general rule in computing that says performance and capacity move down the market pyramid as time progresses. We all know that our phones (even the dumb ones) are much more powerful than the first computers. Indeed, many new smart phones will have two compute cores. Technically speaking, the NVIDIA Tegra 2 has 8 processing units: two are general-purpose ARM Cortex-A9 cores, and the others handle audio, video, etc.
We have all heard the line “that cell phone in your pocket has more computing power than the Apollo spacecraft”, or similar. Not only has performance traveled down the pyramid, but so has capacity. I have a little terabyte (RAID 1) NAS for my home network. And, for about $300 (and minimal effort), I could double the size. My cell phone has an 8GB memory card. These now-pedestrian amounts of processing power and capacity used to be at the top of the computing pyramid.
HPC has taken advantage of this trend as well. My first cluster had four dual-processor nodes connected by Fast Ethernet (Note, not dual-core nodes, but dual Pentium II 90 degree slot-sockets). Those 8 “cores” can now be had in a single low-cost server, which of course runs much faster, with more memory and more storage.
Another indication of the growth of computing power is the man-machine challenge. At this point, it is pretty much a given that computers can beat humans at chess. The average multi-core desktop running a good chess program can trounce most average to very good players. I am assuming that more cores means more outcomes can be computed in a given time, and thus a better chess program. By the time you read this, the Watson Jeopardy showdown will have set another computer milestone. (My prediction is at the end of this column.) Playing Jeopardy is much harder than playing chess and the Watson hardware is formidable, yet “off the shelf” (2880 POWER7 cores, 16 TeraBytes of memory, and 4 TeraBytes of clustered storage all in ten racks running Linux). Also note that Watson stands alone; “he” is not connected to the Internet.
From a hardware standpoint, Watson is pretty impressive, just as the chess master Deep Blue was back in the day (late 1990s). In the future, I’m sure we’ll see Watson capability (or similar cyber players) on your cell phone and pad devices. “Those ten racks of hardware in your pocket? You have got to be kidding.” I’m not. I did not say how far in the future, but sooner than you think, and don’t discount the network.
Fundamentally, parallel computing will be at the core of all these new pocket and pad wonders (no pun intended). If you program these devices, you will be creating parallel applications — just like we do in HPC. It may be threads or messages, but regardless, you will need to think in parallel.
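To make "thinking in parallel" a little more concrete, here is a minimal sketch of the basic pattern: split the work into independent chunks, hand each chunk to a worker, and combine the partial results. The function names and the sum-of-squares workload are hypothetical, chosen only for illustration. Note also that in standard CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock; the point here is the decomposition, and a process pool or message passing would be the analogous choice when real speedup matters.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start, stop):
    """Work done by one worker: sum i*i for start <= i < stop."""
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into contiguous chunks and combine the partial sums."""
    # Ceiling division so at most `workers` chunks are created;
    # max(1, ...) avoids a zero step when n is 0.
    chunk = max(1, (n + workers - 1) // workers)
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so the workers never need to coordinate.
        partials = pool.map(lambda b: sum_of_squares(*b), bounds)
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000, workers=4))
```

The hard part in real applications is usually finding chunks that are truly independent; once you have them, the choice between threads and messages is mostly a question of where the data lives.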
Just in case you don’t believe me, I have prepared a little back of the envelope diagram that will help demonstrate my point. First, let me say, it is really on the back of an envelope (mostly so I would not have to create a nice looking figure). And, second, it is not meant to be an exact diagram, but rather an illustration.
On the left axis we have the Average Watts for all Processors. This number is the average watt rating for all processors sold at a given time. The past trend has been a steady rise in this number as the speed (clock rates) of processors increased. A few years back it stopped rising because things were just getting too hot. At that point, processor design went sideways and started growing cores. More cores were possible because of the continuation of “Moore’s Law,” which is represented on the right axis and by the straight line with X’s through it. My prediction is that, due to the demand for pads and phones, the Average Watts for all Processors is going to begin to drop. That is, the market will create a big need for low power processors. At the same time, the market will also want more capability from their “devices.” Thus, the only way to increase performance is to add cores. (That is, as transistors continue to get smaller, they require less power, and we can pack more cores in the same space.)
Relationship between watts and core count
My diagram illustrates how the continued growth of transistor density, when combined with the need for low power, may accelerate the growth of multi-core processors. Recall that multi-core resulted when we hit the “frequency/power limit,” but now the average frequency/power limit is getting lower (as shown in the figure). This trend will result in additional multi-core pressure. I believe at some point in the not too distant future we will enter the many-core era. Indeed, one could argue that the AMD Fusion APU is a perfect example of this trend. Unless there is a breakthrough in battery technology or a way to hold 25+ watts in your hand, I don’t see any other options.
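For those who like numbers with their envelopes, the frequency/power argument can be sketched with the textbook approximation for switching power, P = C × V² × f. This formula is not taken from my diagram, and every value below (the capacitance, the voltages, the clock rates) is a made-up assumption picked only to show the shape of the trade-off. The sketch also assumes a perfectly parallel workload, which real programs rarely are.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Approximate switching power in watts: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical effective switched capacitance per core, in farads.
C = 1e-9

# One core at 2 GHz and 1.0 V (illustrative values only).
one_fast_core = dynamic_power(C, 1.0, 2e9)

# Two cores at 1 GHz each; the lower clock is assumed to allow 0.8 V.
two_slow_cores = 2 * dynamic_power(C, 0.8, 1e9)

if __name__ == "__main__":
    print(f"one fast core:  {one_fast_core:.2f} W")
    print(f"two slow cores: {two_slow_cores:.2f} W")
```

Under these assumptions, both configurations deliver the same total cycles per second, but the two slower cores draw about 1.28 watts against about 2.0 watts for the single fast core. That is the multi-core pressure in miniature: if the watt budget shrinks, more and slower cores are the way to keep performance up, provided the software can use them.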
If you write software, you should be paying close attention to this trend. The future is parallel. As I have been opining all along, if you are in the HPC club, you already understand parallel. The club membership is going to grow. In my next column, I’ll take a look at some of the current hardware and consider some software options or lack thereof. And, at some point, I’ll circle around to explain how this may actually help HPC.
By the way, my prediction for the man-machine Jeopardy showdown (Watson 1, meatbags 0) turned out to be true.