Processor Bifurcation

The processor market is diverging along two paths, the general and the predictable. Where does HPC hitch its wagon?

There is a big word in the title. You are either reading this because you don’t know what it means and want to find out, or because you know what it means but have no idea what it has to do with microprocessors. Either way, I’m hoping you are hooked, because I think we are witnessing a second dramatic change in the computer market (the first being multi-core). As many of you know, I have had many a sleepless night when it comes to multi-core. In the fits of my obsession, however, I may have missed something.

The commodity microprocessor (i.e. x86) has been the juggernaut of the computer industry. For HPC clusters the turning point came when the Intel Pentium Pro was released. It was fast enough for HPC (thanks to a much improved Floating Point Unit), it worked well in dual processor servers, and it made for great workstations and desktops. All the same part at the same price. Then the marketing department got involved. Why charge the same for a “server” processor as a “desktop” processor when servers cost much more than desktops? Thus began the Xeon age, and with it the server, workstation, and desktop markets were created by adding or deleting features from a baseline processor design. Great marketing; however, the processors were basically the same. In a way, it reminds me of the old Sunoco gas pumps where you could mix your own octane grade. The price, of course, increased with the octane. Now we have regular (desktop), midrange (workstation), and high test (server). In case you don’t read Consumer Reports, it is basically all the same gasoline.

I currently have a small four-node (single-socket) Core 2 Duo cluster that can achieve 47 GFLOPS on the Top500 HPL benchmark. I built the cluster in 2007 for a cost of $2300 (that is less than $50 per GFLOPS). How is this possible? Inside the nodes, the processors are basically the same as those used in the servers that crank out the big Top500 ratings. Of course, marketing lowered the octane on my processors, but they still come from the same refinery, as it were. The better grade fuel may help you go faster, but not that much faster.
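
A quick check of that price/performance arithmetic, using only the two figures quoted above. The function and variable names are made up for illustration.

```python
def cost_per_gflops(total_cost_dollars, hpl_gflops):
    """Return dollars spent per GFLOPS of measured HPL performance."""
    return total_cost_dollars / hpl_gflops

# Figures quoted in the text: $2300 for the cluster, 47 GFLOPS on HPL.
cluster_cost = 2300.0
cluster_gflops = 47.0

ratio = cost_per_gflops(cluster_cost, cluster_gflops)
print(f"${ratio:.2f} per GFLOPS")  # about $48.94, under the $50 mark
```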

While I’m using automotive analogies, let’s introduce the megahertz wall. You know, the big cement wall in the middle of the road that all the fast and hot single core CPUs crashed into. Yeah, that one. And thus the multi-core era began. This is where things get interesting. In the server market, multi-core is actually a good thing. Much of what a server does is stateless or at least independent, so more cores means more throughput. Throw in virtualization and the whole multi-core thing does not seem so bad. Moving on down to workstations (and in a sense HPC clusters), multi-core is again kind of welcome. More cores, more work gets done, but there is the programming issue. I have talked at length in the past about this issue and will continue to do so in the future.

Let’s take a look at the desktop market. Multi-core really does not make much sense here. Spreadsheets, word processors, and browsers run pretty well on slow processors. Of course, there is multimedia. Now there is a core-sucking application area if I ever saw one. However, any multimedia/gaming fanboy/fangirl worth their bits has a high-end video card specifically designed to do the grunt work. The cores could do this kind of processing, but the results would be slower, as cores are designed for general-purpose computing.

At this point, a small sidebar may be helpful. If we look at how computers are used, one could make a general distinction between two types of computing: predictable and general/non-predictable. Some examples will help. If I have to multiply two large matrices together, that is a highly predictable operation. If I have to service database requests, that is a very non-predictable operation. A general purpose processor can be used for all types of computing, but it must dedicate silicon to handle all kinds of unpredictability. A GPU (Graphics Processing Unit), on the other hand, is designed to perform predictable computing. The types of operations are similar, very repetitive, and parallel. If we were to look at the desktop/laptop PC, we find a mix of uses. A word processor is non-predictable, as is running a desktop windowing system. On the other hand, multimedia presents a very predictable and parallel type of computation. Moving to the high end, a web/database server is obviously non-predictable. When a multi-core server is used in a cluster, it is doing highly predictable computing. With that in mind, let’s take a hard look at where things might be headed in the future.
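
The predictable/non-predictable distinction can be made concrete with a minimal sketch, assuming toy stand-ins for both workloads. The function names and the (kind, key) request format are invented for illustration and are not taken from any real database API. In the matrix multiply, the loop bounds depend only on the matrix shapes, so the work is the same whatever the values are. In the request handler, the branch taken depends on each request as it arrives.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    The loop bounds depend only on the matrix shapes, so the sequence of
    multiply-adds is known before any data is seen.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
                ops += 1
    return c, ops

def service_requests(requests, table):
    """Handle a stream of hypothetical (kind, key) requests.

    Which branch runs, and which entry is touched, depends on each
    request, so the work cannot be laid out ahead of time.
    """
    path = []
    for kind, key in requests:
        if kind == "read":
            path.append(("read", table.get(key)))
        elif kind == "write":
            table[key] = table.get(key, 0) + 1
            path.append(("write", table[key]))
        else:
            path.append(("reject", None))
    return path

def random_matrix(rng, rows, cols):
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
# Two different 3x4 by 4x5 products: different values, identical work.
_, ops_first = matmul(random_matrix(rng, 3, 4), random_matrix(rng, 4, 5))
_, ops_second = matmul(random_matrix(rng, 3, 4), random_matrix(rng, 4, 5))
print(ops_first, ops_second)  # both are 3 * 5 * 4 = 60

# The path through the handler changes with the request stream.
print(service_requests([("write", "x"), ("read", "x"), ("drop", "x")], {}))
```

Because the matrix multiply does the same multiply-adds in the same order for any input of a given shape, it maps naturally onto wide, repetitive hardware such as a GPU. The request handler offers no such guarantee.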

As mentioned, most desktop/laptop systems do a mix of computing. However, it is safe to say that most desktop applications do not need all the horsepower available in the current multi-core offerings. Dumping in more cores is probably not going to make that word processor much faster. Dumping in more GPU hardware will help when it comes to multimedia, and that seems to be where desktops are headed. Because the GPU hardware is specifically designed for predictable computing, it will work much better than extra cores. What may serve this market better is an “average” processor coupled with massively parallel Predictable/Parallel Computing Units, or PCU for short (i.e. another way to say GP-GPU). This design is similar to the IBM/Sony/Toshiba Cell Processor. Indeed, if NVidia were to drop a general computing core into one of their Tesla products, I would not be surprised. Considering that AMD has FireStream and Intel has a PowerPoint version of Larrabee, there may actually be something going on here. It should be mentioned that the world’s fastest computer (as measured by the HPL benchmark), RoadRunner at Los Alamos National Lab, uses Cell processors.

The fork, or bifurcation, in processors may soon be upon us. The traditional multi-core for non-predictable general computing vs. the single core plus PCU for predictable computing may be the way we look at things in the future. In a year, there may very well be two paths in commodity processor development. The first will be the growth of multi-core (more cores per die) and the second the growth of PCU processors. Unlike my gasoline analogy, these designs will be very different and have very different performance profiles.

If my prognostications are true, there is one big question on the horizon. What is best for HPC and clusters? The startling performance of CUDA-enabled applications from NVidia has people wondering. I would guess this is just the first of many such successes (although good luck trying to pull that PS3 out of your kid’s hands so you can play with a Cell processor). HPC is predictable computing, and the desktop/game console market may turn out to be a bigger driver of HPC than the server market. Interesting times lie ahead. Maybe my anxiety about multi-core has been misplaced. Perhaps everything will be all right, but then how will we program large numbers of PCUs? In the end, it always seems to be about the software.
