Once the hallmark of the data center, HPC hardware is beginning to find its way to the desk side and desktop. Multi-core processors, efficient design, and even application scalability have combined to clear the way for personal HPC.
Scaling Up Users
Whenever there is a casual conversation about the Top500 list, I always do a little survey. It goes something like this: “I see the top Top500 system this year used 129,600 cores. About how many do you use for your typical application?” The answer is always interesting. First, I like to bring people back to reality by pointing out that the fastest machines in the world (as measured by the Top500 benchmark) are designed for big problems that are of interest to a small number of people. It seems the more pedestrian problems do not use that many cores. I often finish the conversation with “So why all the interest in the Top500 if it really has no relevance to your problem?” And, please don’t get me wrong, I like the Top500 list. It provides valuable statistics on HPC systems and, of course, bragging rights.
I do know people who use lots of cores, so it does not surprise me to hear 512, 1024, or some other rather large power of two as an answer to my question. More often, however, I hear numbers like 8 or 16. I rarely hear core counts like 48 or 64, or any number between 32 and 128. A curious, but not unnoticed, trend. Take a look at the Blue Collar Computing initiative at the Ohio Supercomputer Center. They are addressing the need for more scalable codes in the middle of the processor continuum. The trend is clear, however: right now, low core counts are the norm.
Other data are available as well. Recently, IDC (Worldwide HPC Market Update and New Research Directions, IDC Briefing at SC08, Austin TX) released scaling data for 78 ISV (Independent Software Vendor) applications. Those using one core totaled 24%, with those in the range of 2-8 accounting for 32%. The next increment, from 9-32, was 26%. All remaining core counts, including unlimited, were less than 20% of all applications. Again, low core counts, particularly those less than 32 (82%), constitute most of the ISV market.
Recently, I ran a small HPC survey (106 respondents) that asked a similar question: “What is the range of cores you use for MPI jobs?” The results are in Figure One below. Note that almost 50% of the respondents use fewer than 16 cores, and note the “blue collar hole” from 32 to 128 cores. Keep in mind, the survey was informal, and these results may include ISV codes, user-written codes, or openly available codes.
Figure One: Maximum Core Usage (106 respondents)
These results reinforce the Blue Collar Computing findings as well. The current need for 61% of the users surveyed is for 32 cores or fewer. A multi-core node with 16 cores may be all some users need. There are issues of memory bandwidth to be considered as well. The point is that the multi-core scale-out can now provide users enough compute power with a handful of motherboards. This situation is a far cry from the full-rack solution required just five years ago.
There may also be the need to run multiple jobs by the same or different users. This need will push for higher core counts, but not in a scalable fashion. That is, if no one uses more than 16 cores, then several independent 16-core systems would probably work just as well as a highly connected (and more expensive) 64-core solution.
Is it a Cluster?
The other effect multi-core has on departmental computing is in the design of the system. Instead of clusters, small systems may actually be what are called constellations. According to Beowulf pioneer Tom Sterling, “A constellation is a cluster of large SMP nodes scaled such that the number of processors per node is greater than the number of nodes.” This definition easily fits many small “clusters” today. Consider a 32-core system composed of four motherboards, each with eight cores — a constellation, not a cluster. Using this taxonomy, a constellation becomes a cluster when the number of nodes equals or exceeds the number of cores per node. In this case, you would need 8 nodes (for a total of 64 cores) to be considered a true cluster. While the distinction may seem somewhat arbitrary, from a software standpoint, such a design may require some attention to application design.
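Sterling's rule of thumb is simple enough to express in a few lines of code. The sketch below (the function name is mine, not from any library) classifies a system by comparing cores per node against node count:

```python
def classify(nodes, cores_per_node):
    """Tom Sterling's taxonomy: a constellation has more cores per
    node than it has nodes; once the node count equals or exceeds
    the cores per node, the system is a true cluster."""
    if cores_per_node > nodes:
        return "constellation"
    return "cluster"

# Four 8-core motherboards (32 cores total) form a constellation
print(classify(nodes=4, cores_per_node=8))   # constellation
# Eight 8-core nodes (64 cores) crosses over into cluster territory
print(classify(nodes=8, cores_per_node=8))   # cluster
```

Run against the 32-core example above, the four-node, eight-cores-per-node box comes out a constellation, exactly as the taxonomy predicts.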
The Software Issue
One cannot in all good conscience leave this topic without addressing the software issue. Parallel codes, usually those written in MPI, can run on multi-core servers. They can also run across several multi-core servers. As mentioned above, however, many small departmental systems are actually constellations and may merit a different approach to software.
The biggest issue facing the programmer is whether to write code for a cluster, a constellation, or an SMP (a multi-core motherboard). The answer to this question depends, of course, on the application and the maximum scalability you hope to achieve. If Amdahl’s law limits your scalability and you know it will be very difficult to get more than an 8X performance increase, then perhaps targeting an SMP design with something like OpenMP is the way to go. If you need more cores, you may want to look at MPI. Or, you may want to use a hybrid approach that employs OpenMP on the SMP motherboards and MPI to send messages between motherboards. In any case, important decisions will need to be made as part of the software design process. Unfortunately, there is no single programming language for all situations.
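To see why Amdahl’s law can make a small SMP as attractive as a big cluster, a quick back-of-the-envelope calculation helps. The sketch below (an illustration, assuming a fixed serial fraction) applies the standard formula, speedup = 1 / ((1 - p) + p/n), where p is the parallel fraction of the code and n is the core count:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).
    The serial fraction (1 - p) caps the achievable speedup
    at 1 / (1 - p), no matter how many cores are added."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / cores)

# A code that is 90% parallel can never exceed a 10X speedup:
for n in (8, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

For a 90%-parallel code, 16 cores already deliver a 6.4X speedup, while even 1,024 cores stay under 10X. In that situation, an 8- or 16-core OpenMP run on one motherboard may serve nearly as well as a far more expensive cluster.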
A final software issue may be the choice of operating system. While the high end of HPC is dominated by Linux, the low end (personal or departmental) has one other option — Microsoft Windows HPC Server 2008. Almost all the systems mentioned will support a Microsoft HPC solution. As HPC moves closer to the desktop, which is still dominated by Windows, users may find that a Windows-based HPC solution integrates better with their current environment. Linux is always an option as well, and interoperability seems to be the mantra of desktop HPC these days.
Close to Your Desk
Up until this point, we have talked about the emergence of departmental HPC, where the advent of power-efficient bladed systems has allowed cycles to migrate away from the data center and into the department. The need for such a move may generate some controversy. Where the resource lives, however, is not all that relevant. If space and power are available, placing a small departmental blade-based cluster in the server room may make the most sense, as the hardware is still under the control of a department or individual. If, as is the case with many data centers, there is a shortage of resources, local placement of a small HPC resource may make the most sense.
There is also the issue of how many users and how many cores. Clearly, this requirement may push a large HPC resource back into the data center, but based on application scalability, this may not be an issue for many users, and a local resource may work just fine. There is also the prospect that a single 8-core workstation may address the computational needs of an individual or small office. Again, it all depends on the resource loads and needs. Perhaps the only sure thing is the ever-increasing core density from the CPU vendors.
In closing, the scale-out of cores has created some very attractive options for “local HPC”. Indeed, application scalability, efficiently designed blades, and small footprints have all combined with multi-core to provide some very interesting options for the HPC user. That would be you and your cluster — er, ah, constellation.