A snapshot of the HPC market from IDC and the dreaded "wall" is here.
I almost spit out my coffee. He said it. I am almost positive. I had better ask him after the talk just to be sure. Is this the news I have dreaded for so long? More coffee might help, or maybe something stronger. Seems I picked the wrong week to stop sniffing …, but that is another story.
Patiently I waited. There were more slides and numbers presented by IDC in their annual Supercomputing (SC) breakfast meeting. I have enjoyed these early morning presentations each year at SC. In exchange for your time and contact information, IDC gives a presentation on the HPC sector, complete with some of that coveted marketing data. The slides are sent to attendees after the meeting so you can study the “free marketing data” as you choose. A nice gesture by IDC.
This year IDC reported that the HPC sector grew 19% over the last five years. IDC was forecasting 9.25% growth for the next five years, but they want to rework some of those numbers in light of recent economic events. Blades are making inroads into all segments, while power, cooling, real estate, and system management were identified as big challenges for data centers. In addition, storage and data management continue to grow in importance. Finally, they mention that software hurdles will rise to the top for most users due to multi-core and hybrid systems.
The new projections are expected by mid-December. I think it is important to remember that even if HPC spending is reduced by half, a 4-5% growth rate in a recessionary economy is nothing to complain about. Indeed, there are some big unknowns. Consider the automotive industry, which represents a large part of the CAE (Computer Aided Engineering) segment. Right now, HPC expenditures in this area are expected to plummet; however, if there is a fashionable bailout, then the government may require “greener” cars, which would boost the need for CAE and HPC.
With all this “kind-of” good news, none of which was totally unexpected, you may be wondering what the heck caused me to almost waste some perfectly good coffee. It was toward the end of the presentation when I heard it. IDC’s lead HPC prognosticator, Earl Joseph, made a statement similar to the following:
Users are reporting that they have “hit the wall” with multi-core performance.
Hopefully, you are not drinking any coffee at this point. My multi-core angst is well known, and I knew this day was coming. Afterwards, I talked with Joseph about his observation. I asked, “Do you mean users are running codes on multi-core systems and seeing less performance?” He replied, “Yes.” The answer I did not want to hear, but the one I was expecting.
If you are having problems understanding the situation, let me put you in the user’s seat for a moment. The following is a hypothetical example, but it may be all too real for many HPC practitioners. With my old cluster, I ran an MPI program on dual-socket, single-core systems (that is, two cores per node). Everything worked pretty well with sixteen processors. Depending on how it was configured, the scheduler placed my jobs on eight nodes (using two processors per node) or on sixteen nodes (using one processor per node). In any case, I was pretty happy. The new cluster arrives. It has two dual-core processors per node. Each node has four cores in total. Again, depending on how the scheduler is configured, I might run on anywhere from four nodes to sixteen nodes. But now, my code is running slower (or no faster than my old cluster). What happened?
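The node-count arithmetic in that example can be sketched in a few lines. This is a hypothetical helper of my own, not any real scheduler's API; it just shows how the same sixteen-rank job spreads across fewer nodes as cores per node go up:

```python
import math

def nodes_needed(ranks, ranks_per_node):
    """How many nodes a job spans when the scheduler packs
    'ranks_per_node' MPI ranks onto each node."""
    return math.ceil(ranks / ranks_per_node)

# Old cluster: dual-socket, single-core nodes (2 cores per node)
print(nodes_needed(16, 1))  # one rank per node  -> 16 nodes
print(nodes_needed(16, 2))  # two ranks per node ->  8 nodes

# New cluster: two dual-core processors per node (4 cores per node)
print(nodes_needed(16, 4))  # fully packed       ->  4 nodes
```

Fewer nodes sounds like a win, until you remember that the four ranks packed on each new node are now sharing one memory system and one interconnect port.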
Of course it all depends on your application, but in general, here is what seems to be happening. The clock speed of the processors stopped increasing. To compensate, more cores were added to the processors. More cores means more sharing of the memory and interconnect. In some cases, more sharing means a bottleneck and hence less performance. Curiously, as the number of cores increased, so did the number of FLOPS in your cluster. Therefore, on paper the new cluster is faster, but your code got slower. Of course, that would be FLOPS as measured by a highly optimized benchmark that is not as sensitive to memory bandwidth or interconnect contention as your code. It is not supposed to work like that!
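To make the sharing argument concrete, here is a back-of-the-envelope model (mine, not IDC's, and the bandwidth and data-size numbers are purely illustrative). It assumes the worst case for a purely memory-bound kernel: every core on a node splits a fixed memory bandwidth evenly:

```python
def runtime_memory_bound(bytes_per_rank, node_bandwidth, cores_sharing):
    """Seconds for one rank to stream 'bytes_per_rank' of data when
    'cores_sharing' cores split 'node_bandwidth' (bytes/s) evenly --
    a worst-case assumption for a purely memory-bound kernel."""
    per_core_bandwidth = node_bandwidth / cores_sharing
    return bytes_per_rank / per_core_bandwidth

BANDWIDTH = 10e9   # assume 10 GB/s per node (illustrative number)
WORK = 8e9         # assume each rank streams 8 GB of data

# One rank per node: the rank gets the full memory bandwidth.
print(runtime_memory_bound(WORK, BANDWIDTH, 1))  # 0.8 seconds
# Four ranks packed onto a quad-core node: bandwidth split four ways.
print(runtime_memory_bound(WORK, BANDWIDTH, 4))  # 3.2 seconds per rank
```

Real codes sit somewhere between this worst case and the compute-bound best case, which is why the benchmark FLOPS and your wall-clock time can tell such different stories.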
What is a user to do? “Optimize the software for multi-core” is the new battle cry. Except, optimization of parallel codes is, to be polite, not easy. And besides, you are more interested in the program results than you are in tweaking the code. The second-best solution: double the core count to pull that last 5-10% out of your cluster before Amdahl tells you enough is enough.
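Amdahl's law is exactly why that core-doubling trick runs out of steam. A quick sketch (the 90% parallel fraction is an assumption for illustration, not a measurement of anyone's code):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's law: maximum speedup on 'cores' processors
    for a code where 'parallel_fraction' of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

p = 0.90  # assume 90% of the runtime parallelizes (illustrative)
for n in (16, 32, 64):
    print(n, round(amdahl_speedup(p, n), 2))
# 16 -> 6.4x, 32 -> 7.8x, 64 -> 8.77x
```

Doubling from 16 to 32 cores buys roughly 20% more speedup, and the next doubling buys even less; the serial 10% caps the whole exercise at a 10x speedup no matter how many cores you throw at it.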
Such is the state of affairs in the cluster computing world. Software hurdles abound. At the same time, hybrid approaches such as NVidia CUDA, AMD Brook+, and even OpenCL all offer yet another software destiny for your codes. My multi-core dread has been realized, and the only thing I know for certain is that future x86 CPUs will be getting more cores. I’ll stop here for this week. Now, where are those roses? I could use another sniff.