The Cost of Multi-core: Faster is as Faster Does

With all due respect to Forrest Gump, defining fast is becoming a bit harder these days. And, yes, it has to do with multicore.

There certainly is no argument that a faster execution takes less time. For instance, if my program ran in 10 minutes on the old processor and now runs in five minutes on the new processor, I would be pleased. Now let’s see how this plays out in multicore land.

First, let’s assume your codes are “multicore” ready, as they may be written using threads, OpenMP, or even MPI. Second, you’re in the market for new compute nodes and a decision must be made about the type of processor. Should you buy a larger number of faster dual-core nodes, or a smaller number of slightly slower quad-cores? (Note: more cores create more heat, so quad-core processors tend to come in at lower frequencies than dual-cores.) Of course, we know it all depends on the application.

In the end, application performance is easily measured, so a little benchmarking is in order. Let’s further suppose the following performance. Running your code on one dual-core, you find you can get a 1.8 times speedup over a single core. If you then run the same code on a quad-core, you find the speedup is 2.2 times over a single core. So which is faster? Of course the quad-core is faster, but some would suggest that you should be getting close to four times the performance since you’re using a quad-core system.

In a sense, the quad-core is faster but less efficient, as you are not achieving full performance. HPC users prefer linear speedup. If one doubles the number of cores and the program only speeds up by 20% or so, then it seems like your scalability is leveling off. In the case of the dual vs. the quad, one might conclude: the quad is a little faster, but much of the capacity of the extra cores is wasted because it is not being used, so I’ll stick with the dual-core solution.
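To put numbers on "faster but less efficient," here is a minimal sketch that computes parallel efficiency (speedup divided by core count) for the two benchmark figures above. The function name `parallel_efficiency` and the `results` table are illustrative names, not part of any benchmark tool.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / cores


# Benchmark figures from the example above: (cores, speedup over one core).
results = {"dual-core": (2, 1.8), "quad-core": (4, 2.2)}

for name, (cores, speedup) in results.items():
    eff = parallel_efficiency(speedup, cores)
    print(f"{name}: speedup {speedup:.1f}x, efficiency {eff:.0%}")

# Relative gain in speedup from doubling the cores (dual -> quad).
gain = results["quad-core"][1] / results["dual-core"][1] - 1
print(f"gain from doubling cores: {gain:.0%}")
```

With these figures the dual-core runs at 90% efficiency and the quad-core at 55%, and doubling the cores buys roughly a 22% gain, which is the "20% or so" mentioned above.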

Of course, achieving peak performance is the desired goal, but how often is every part of a single core running at its peak rate? Do all your codes keep the floating point unit and the integer units busy at the same time?

It is reasonable to expect some codes can use four cores more effectively than other codes, just as some codes can use a single core more effectively than others. If we forget about the number of cores for a moment, isn’t the processor that runs 2.2 times faster the better of the two? Assuming you may spend about the same amount of money for processors, why should we care if it has four, eight or one hundred cores, as long as our code runs faster than before?

Perhaps we need to stop thinking about cores as though they are “nodes,” or extra processors, and focus on system-wide performance increases.

Fair enough, but there is one other issue that keeps cropping up: software licensing. In the past, most commercial software vendors licensed programs on a per-CPU basis. When multicore hit the market, some commercial vendors decided to license per core. This scheme makes sense as multiple cores look like discrete processors to the OS. Some vendors have continued to license software on a per-socket basis. In those cases, a higher core-to-socket ratio may make more sense. There are also a number of methods used by companies like IBM and Oracle that try to land somewhere in the middle, accounting for both cores and sockets. Other than creating customer confusion, most of these efforts really don’t seem to be addressing the issue. Of course, we have not even brought virtualization into the discussion.

It should also be noted that, either way, proprietary license fees often outweigh the cost of the hardware by far. If you decide to use quad-cores, and your commercial application that’s licensed per core is only showing a three times speedup, then you have effectively wasted the license fee for the fourth core. In this case, it may be better to use dual-cores because even with lower aggregate performance, their core utilization is higher.
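The per-core licensing trade-off can be sketched the same way. The example below uses the three times quad-core speedup from this paragraph and, as an assumption, reuses the 1.8 times dual-core figure from the earlier benchmark example. The fee `FEE_PER_CORE` is a made-up placeholder, not a real vendor price, and hardware cost is ignored.

```python
FEE_PER_CORE = 1000.0  # hypothetical license fee per core, for illustration only


def wasted_licenses(speedup: float, licensed_cores: int) -> float:
    """Licensed cores that effectively deliver no speedup."""
    return licensed_cores - speedup


def license_cost_per_speedup(speedup: float, licensed_cores: int,
                             fee_per_core: float = FEE_PER_CORE) -> float:
    """License dollars paid for each 1x of speedup obtained."""
    return licensed_cores * fee_per_core / speedup


# (licensed cores, measured speedup over one core)
options = {"dual-core": (2, 1.8), "quad-core": (4, 3.0)}

for name, (cores, speedup) in options.items():
    print(f"{name}: utilization {speedup / cores:.0%}, "
          f"wasted licenses {wasted_licenses(speedup, cores):.1f}, "
          f"fee per 1x speedup {license_cost_per_speedup(speedup, cores):.2f}")
```

Under these assumptions the quad-core pays for one full license that returns nothing, while the dual-core wastes only a fifth of one. Whether that outweighs the quad-core's shorter run time still depends on how much the extra speed is worth to you.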

It seems expectations of performance may not match the economics of software licensing. At the moment, there do not seem to be any answers to these and other issues surrounding multicore. The best advice for now: just remember, “Multicore is like a box of chocolates. You never know what you’re gonna get.”
