The Cost of Multi-core: Faster is as Faster Does

With all due respect to Forrest Gump, defining fast is becoming a bit harder these days. And, yes, it has to do with multicore.

There certainly is no argument that a faster execution takes less time. For instance, if my program ran in 10 minutes on the old processor and now runs in five minutes on the new one, I would be pleased. Now let’s see how this plays out in multicore land.

First, let’s assume your codes are “multicore ready,” meaning they are written using threads, OpenMP, or even MPI. Second, you’re in the market for new compute nodes and must decide on the type of processor. Should you buy a larger number of faster dual-core nodes, or a smaller number of slightly slower quad-cores? (Note: more cores generate more heat, so quad-core processors tend to run at lower frequencies than dual-cores.) Of course, we know it all depends on the application.

In the end, application performance is easily measured, so a little benchmarking is in order. Let’s suppose the following results. Running your code on one dual-core, you find you get a 1.8 times speedup over a single core. If you then run the same code on a quad-core, you find the speedup is 2.2 times over a single core. So which is faster? Of course the quad-core is faster, but some would suggest you should be getting close to four times the performance, since you’re using a quad-core system.
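To make the comparison concrete, the usual metric is parallel efficiency: measured speedup divided by the number of cores. A minimal sketch, using the 1.8x and 2.2x numbers from the benchmark above:

```python
def efficiency(speedup, cores):
    # Parallel efficiency: what fraction of ideal (linear) speedup
    # the code actually achieves on this many cores.
    return speedup / cores

# Numbers from the text above:
dual_eff = efficiency(1.8, 2)   # 0.90 -- 90% of ideal on the dual-core
quad_eff = efficiency(2.2, 4)   # 0.55 -- 55% of ideal on the quad-core
```

So while the quad-core finishes first, it is doing so at barely half its theoretical efficiency, which is exactly the tension the rest of this column is about.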

In a sense, the quad-core is faster, but less efficient, since you are not achieving full performance. HPC users prefer linear speedup: double the cores, double the speed. If doubling the number of cores only improves the speedup by 20% or so (1.8 to 2.2 in our example), then scalability is leveling off. In the case of the dual versus the quad, one might conclude: the quad is a little faster, but much of the extra core capacity is wasted because it is not being used, so I’ll stick with the dual-core solution.
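One standard way to reason about this leveling off is Amdahl’s law (not mentioned in the column, but the classic model here): if a fraction f of the work is serial, speedup on n cores is S(n) = 1 / (f + (1 - f)/n). A sketch that backs out f from the measured 1.8x on two cores and predicts the four-core result:

```python
def serial_fraction(speedup, cores):
    # Invert Amdahl's law, S = 1 / (f + (1 - f)/n),
    # to estimate the serial fraction f from one measurement.
    return (cores / speedup - 1) / (cores - 1)

f = serial_fraction(1.8, 2)          # about 0.111 (11% serial work)
predicted_quad = 1 / (f + (1 - f) / 4)  # Amdahl predicts about 3.0x
```

Amdahl’s law would predict roughly 3.0x on four cores, yet the benchmark measured only 2.2x, so something beyond serial code (resource contention, as the comment reply below suggests) is eating performance.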

Of course, achieving peak performance is the desired goal, but how often does every part of a single core run at its peak rate? Do all your codes use the floating-point and integer units at the same time?

It is reasonable to expect some codes can use four cores more effectively than other codes, just as some codes can use a single core more effectively than others. If we forget about the number of cores for a moment, isn’t the processor that runs 2.2 times faster the better of the two? Assuming the processors cost about the same, why should we care whether one has four, eight, or one hundred cores, as long as our code runs faster than before?

Perhaps we need to stop thinking about cores as though they are “nodes,” or extra processors, and focus on system-wide performance increases.

Fair enough, but there is one other issue that keeps cropping up: software licensing. In the past, most commercial software vendors licensed programs on a per-CPU basis. When multicore hit the market, some commercial vendors decided to license per core. This scheme makes sense as multiple cores look like discrete processors to the OS. Other vendors have continued to license software on a per-socket basis. In those cases, a higher core-to-socket ratio may make more sense. There are also a number of schemes used by companies like IBM and Oracle that try to land somewhere in the middle, accounting for both cores and sockets. Other than creating customer confusion, most of these efforts don’t really seem to address the issue. Of course, we have not even brought virtualization into the discussion.

It should also be noted that, either way, proprietary license fees often far outweigh the cost of the hardware. If you decide to use quad-cores, and your commercial application licensed per-core is only showing a three times speedup, then you have effectively wasted the license fee for the fourth core. In this case, it may be better to use dual-cores because, even with lower aggregate performance, their core utilization is higher.
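The per-core licensing arithmetic is easy to sketch. The fee amount below is purely hypothetical, chosen for illustration; the speedups are the 3.0x quad and 1.8x dual figures from the surrounding text:

```python
def wasted_license_fraction(speedup, cores):
    # Under per-core licensing you pay for `cores` licenses but only
    # get `speedup` cores' worth of delivered work; the remainder of
    # the fee buys idle capacity.
    return 1 - speedup / cores

fee = 1000.0  # hypothetical per-core license fee, for illustration only

quad_waste = wasted_license_fraction(3.0, 4) * 4 * fee   # 1000.0 wasted
dual_waste = wasted_license_fraction(1.8, 2) * 2 * fee   #  200.0 wasted
```

On these numbers, the quad-core setup burns a full core’s worth of license fee, while the dual-core wastes only a fifth of one, which is the column’s point about utilization versus raw speed.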

It seems expectations of performance may not match the economics of software licensing. At the moment, there do not seem to be any answers to these and other issues surrounding multicore. The best advice for now: just remember, “Multicore is like a box of chocolates. You never know what you’re gonna get.”

Comments on "The Cost of Multi-core: Faster is as Faster Does"


I have thought about this as AMD and Intel have put their quad-core solutions out. I was under the wrong impression that when you go from one to two to four cores you would see a basic four-fold rise in performance. After discussing this with our HPC folks here at the lab, I was corrected. So my question is: how could you get a four-fold increase using four cores? Where is the bottleneck?


The simple answer is “it all depends on the application.” The longer answer is contention for resources, which in most cases means memory. Intel and AMD solve this in different ways right now, although Intel is moving toward the approach used by AMD: a fast point-to-point interconnect (AMD’s is called HyperTransport). Think of it this way: before multicore, processors were much faster than memory (which is why cache memory is used), but now you have more cores hitting the same memory. In some cases codes run well; in others there is a bottleneck, which leads back to my first short answer.
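The memory-contention effect in the reply above can be illustrated with a deliberately simple toy model (an assumption for illustration, not a measurement): the compute portion of a run scales with the core count, but all memory traffic funnels through one shared path, so the memory-bound portion does not shrink.

```python
def bandwidth_bound_speedup(cores, compute_frac):
    # Toy model: a fraction `compute_frac` of single-core run time is
    # compute (scales with cores); the rest is memory traffic that is
    # serialized on a shared bus (does not scale).
    return 1 / (compute_frac / cores + (1 - compute_frac))

# With 70% compute / 30% memory time (illustrative split):
s2 = bandwidth_bound_speedup(2, 0.7)   # about 1.54x on two cores
s4 = bandwidth_bound_speedup(4, 0.7)   # about 2.11x on four cores
```

Even this crude model reproduces the shape of the article’s benchmark: four cores land nowhere near 4x, and each added core buys less than the one before, because the shared memory system is the bottleneck.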

