HPC Smackdown: Are Multi-cores really bad for HPC?

Open Discussion: Are the experts to be believed?

Recently there was an article in the IEEE Spectrum stating that multi-cores are bad for HPC. Maybe. When the first clusters were built, I’m sure it was said that x86 is bad for HPC, but fortunately no one listened. Is this the case here? Or, now that HPC is a growing market sector, should vendors be thinking about HPC needs?

Comments on "HPC Smackdown: Are Multi-cores really bad for HPC?"


The IEEE Spectrum article touches on the important issue of system balance. Unfortunately, the causes of poor performance highlighted there, insufficient memory bandwidth and inadequate interconnect, did not start with the proliferation of multicore processors. They have been with us all along, and would have shown up even if we had a single powerful core with the ‘flop rate’ of our combined multicores. This is simply a consequence of the fact that compute rates keep increasing faster than memory and I/O technologies advance.
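To make the balance argument concrete, here is a minimal roofline-style sketch. The function name and every number in it (peak rates, bandwidth, flops per byte) are assumptions chosen for illustration, not figures from the article. It only shows that when memory bandwidth is the limit, four cores and one hypothetical core with the same combined flop rate hit the same ceiling.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: performance is capped either by the peak
    compute rate or by how fast memory can feed the arithmetic units."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical socket; all values below are illustrative assumptions.
BANDWIDTH_GBS = 10.0   # memory bandwidth shared by the whole socket
INTENSITY = 0.25       # flops performed per byte moved, for some kernel

four_cores = attainable_gflops(4 * 10.0, BANDWIDTH_GBS, INTENSITY)
one_big_core = attainable_gflops(40.0, BANDWIDTH_GBS, INTENSITY)

print("four 10-Gflop cores :", four_cores, "Gflops attainable")
print("one 40-Gflop core   :", one_big_core, "Gflops attainable")
```

Under these assumed numbers both configurations are memory bound at the same rate, which is the point made above: the imbalance comes from the flop rate outrunning memory, not from the core count as such.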

That said, there is a new phenomenon that is a direct consequence of multicore. In the past, every two years or so we’d get a new generation of processors that would run about twice as fast. Roughly speaking, we’d compile and run our applications and frequently see a boost in performance that was close to 2x. What’s new now is that this is true only if we can double the number of threads when doubling the number of cores on a socket without much loss of efficiency in the performance of each thread. The real NEW problem we have is how to continue to scale applications by dividing the execution into more and more threads.
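One common way to put numbers on this is Amdahl's law, which is my framing and not something the comment itself invokes. The sketch below assumes a fixed problem size and a hypothetical parallel fraction; `amdahl_speedup` and `doubling_gain` are names made up for this example.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's law: speedup over one thread when only
    `parallel_fraction` of the run time can be spread across threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


def doubling_gain(parallel_fraction, n_threads):
    """Factor gained by going from n_threads to 2 * n_threads."""
    return (amdahl_speedup(parallel_fraction, 2 * n_threads)
            / amdahl_speedup(parallel_fraction, n_threads))


# Assumed parallel fractions, for illustration only.
for fraction in (1.0, 0.99, 0.9):
    gain = doubling_gain(fraction, 4)
    print(f"parallel fraction {fraction:.2f}: 4 -> 8 threads gives {gain:.2f}x")
```

Only the perfectly parallel case gets the old 2x per generation from a doubling of cores. Any serial remainder eats into it, which is why the burden shifts to restructuring the application.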

Yes, the hardware vendors, forced into multicore processors (due to power and thermal constraints), passed on the challenge of finer and finer grain parallelism to the application designer – unless your application resolution grows sufficiently to allow such scaling without extra effort.

Finally, let us note that in the early days of computing, in spite of the rapid increase in hardware performance, innovation in numerical algorithms gave us orders of magnitude more additional performance than the hardware did. We are entering an era when we need to call again on our algorithm designer colleagues.

David Barkai, Intel


I think this is the wrong question. The question should be: how can we (the HPC community) make the best use of the technology that feeds the larger markets? I believe this has always been the case for commodity-based HPC. Of course, if multi-core is not what you think the market needs, then you can start a company to fabricate the right processor. But the question you have to answer (and explain to your investors) is the development cost vs. the size of the market and your ability to capture a certain share of said market. If you do the math, you will find out why commodity hardware is “good enough.” There are cases for customization in HPC, and interconnects are a good example, but processor design and fabrication are expensive undertakings.

Personally, I don’t like multi-core for HPC. I like a single fast processor, local memory, and a fast interconnect. I also realize I am not going to get that anytime soon. The next best thing is to work with what is available at a reasonable price. Indeed, even if we use half the cores on a multi-core processor, we are still computing at less than a tenth of what it cost in the past. And, as I have been saying all along, the real challenge is software.


“Multicore is bad news for supercomputers.” That’s quite a pronouncement. The issue I see is that processor and system technology is substantially changing, again, and supercomputing needs to adjust the way it writes codes, again. As noted in Murphy’s paper, the last time we had such a major upheaval, vector supercomputers were replaced by commodity microprocessors, clusters and MPPs. Today’s codes, with their tremendous performance and scale relative to those times, didn’t happen by just porting existing code to a new platform.

Recall the early 1990s, when vectors ruled the supercomputing world. Depending on where you were, it was either quite clear, or at least becoming so, that those days were numbered. In my previous life as an ISV, by the mid 1990s, I was preparing for what became today’s MPI-based distributed codes. The SMP codes we had were clearly not going to work on distributed systems. So, we did the only thing we could: we stepped back, looked at the problem anew, re-architected our codes, and developed distributed implementations to replace our beloved threaded codes. Among the CAE ISVs, production versions of distributed codes started appearing in the late 1990s. And what do you know? Today’s distributed codes greatly out-perform and out-scale those SMP codes that we knew and loved.

I’m truly not a Pollyanna. I really do understand that the transition to distributed parallel took a huge amount of work. I personally remember failed attempts and dark days wondering how we were going to get through it all. Not only did we need to learn a new programming paradigm, but we needed new algorithms that were radically different from what we had. But, at the end of the day, the work resulted in performance levels that would not have been achieved if we were still writing threaded code for SMP processors.

Let’s be clear: that was hardly the first such transition requiring new algorithms and new programming styles; scalar to vector and serial to parallel both come to mind. This will also not be the last time we have to re-architect our codes. The status quo is not static. We should be getting good at this by now.


Disclaimer: My comments above are my opinions, not those of my employer, Intel.

red dwarf

It seems like a four-way configuration is a barrier for many computer systems. I sat in on a Microsoft conference some time ago that included a chart showing that going from one to four CPUs gave a steady rate of performance increase. Beyond four, the rate of increase dropped dramatically: roughly 20% per added CPU, down from 80%. I have also seen limits on virtual machines or vCPUs for similar threading reasons.
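Here is a rough sketch of what that chart implies, under my own reading of the numbers: each CPU added up to four contributes about 80% of one CPU's throughput, and each one after that about 20%. The function name, the knee at four, and the linear model are all assumptions, since I am working from memory of a slide.

```python
def relative_throughput(n_cpus, knee=4, gain_below=0.8, gain_above=0.2):
    """Throughput relative to one CPU under a piecewise-linear model:
    each CPU added up to `knee` contributes `gain_below` of a CPU,
    each CPU added past `knee` contributes only `gain_above`."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    added_below = min(n_cpus, knee) - 1
    added_above = max(n_cpus - knee, 0)
    return 1.0 + gain_below * added_below + gain_above * added_above


for n in (1, 2, 4, 8, 16):
    t = relative_throughput(n)
    print(f"{n:2d} CPUs: {t:.1f}x throughput, {t / n:.0%} efficiency")
```

With these assumed rates, doubling from four to eight CPUs buys well under one extra CPU's worth of work, which matches the barrier described above.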

I think the new GPU programming configurations are going to lead the way, making the CPU cores issue a moot point anyway. My current personal cluster, Red Dwarf (the opposite of Big Blue and a great TV show), cost about $1000 using Via mini ATX boards and runs Parallel Knoppix; I estimate it at about 18 Gflops. My next rig will cost about the same and have about 500 Gflops. Power to the people: imagine having a 4 Teraflop machine in your house running on standard power for $10,000. Check BOXX, CRAY, NVidia. Cities have been rated on available bandwidth for the last 10 years; now I think they will be rated on how many flops of computing power are available for research. I want to be the first to start a personal supercomputer club out here in Omaha, Nebraska.
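For anyone who wants the price/performance arithmetic spelled out, here is a quick sketch using only the rough figures quoted above. The labels and the helper name are made up for this example, and the Gflops values are estimates, not benchmark results.

```python
def dollars_per_gflop(cost_dollars, gflops):
    """Cost of one Gflop of estimated capability."""
    return cost_dollars / gflops


# (cost in dollars, estimated Gflops), taken from the figures above.
rigs = {
    "Red Dwarf (current)": (1000.0, 18.0),
    "next rig (GPU)": (1000.0, 500.0),
    "4 Teraflop desk-side box": (10000.0, 4000.0),
}

for name, (cost, gflops) in rigs.items():
    print(f"{name:26s} ${dollars_per_gflop(cost, gflops):6.2f} per Gflop")
```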

The opinions expressed are my own and not those of my employer, the Easter Bunny.

Red Dwarf – end of line


In 1990 the database Sybase had a scalability of 50% with 4 processors. By 1996 this had risen to 100% with 16 processors. I’m not sure what the current limit is, but it’s many times that now. A combination of software and hardware improvements has seen parallelism improve dramatically. In Sybase’s case an initial redesign was needed, and after that it was mainly a question of identifying bottlenecks and rethinking internal data structures to minimise contention. This sort of activity doesn’t come for free. The increased complexity of code requires improved quality control. The tools for highly scalable code are in their infancy, so it’s not trivial to develop. However, the techniques required are now quite well known and will no doubt become the ‘norm’ as even basic PCs are now multicore.


The article (http://www.spectrum.ieee.org/nov08/6912) offers this as a solution:

“The key to solving this bottleneck is tighter, and maybe smarter, integration of memory and processors,” says Peery. For its part, Sandia is exploring the impact of stacking memory chips atop processors to improve memory bandwidth.

Today we have AMD64 processors with the memory controller integrated on chip, and yet the problem isn’t resolved?
