Once the hallmark of the data center, HPC hardware is beginning to find its way to the desktop and desk side. Multi-core processors, efficient design, and even application scalability have combined to clear the way for personal HPC.
When the first specifications for dual-cores were released, I recall talking with one of the major CPU vendors about the impact on HPC. My interest was more about the scale-up (more cores per motherboard) prospects than the scale-out (more nodes in a cluster) possibilities. My assumption was that dual-, or even quad-, socket motherboards would put 4 or 8 cores in one server or node. And when the inevitable quad-cores hit the market, this would bump the numbers to 8 or 16 cores per node. Back in the day, an 8 or 16 node x86 cluster (single-core CPU in a dual-socket motherboard) could easily take up to 32U of rack space using 2U servers (remember, this was “back in the day”). Throw in a switch, a KVM, maybe a UPS or a monitor drawer, and you had a full rack.
That was then; multi-core is now. One of the often-missed aspects of the multi-core revolution is the ability of HPC cluster applications to migrate from the data center to the department server closet or even to the desk side. There are several factors at work here. The first, and most obvious, is core density: more cores per server. The second, and less obvious, is power and cooling. The third factor is perhaps the biggest surprise: most people do not need that many cores. We will take a look at these and other issues as we examine the growth of departmental HPC systems.
The Push From Above
Cores, more cores. The quad-core processor has now become the standard choice for HPC. Both Intel and AMD have product offerings featuring fast and efficient quad-cores. Indeed, the much anticipated Intel Core i7 (Nehalem) has begun to ship and is expected to raise the bar for quad-core performance. Doing what I call socket math, a quad-socket motherboard can hold up to 16 cores. Octa-core designs are also in development and may arrive in the 2010 time frame. Cores, it seems, are plentiful.
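The socket math above is simple multiplication, but a throwaway sketch makes it concrete. The socket and core counts below are just the illustrative values from the text, not a product survey:

```python
# "Socket math": total cores per node = sockets x cores per socket.
# Counts are the illustrative values from the text, not a product list.
def cores_per_node(sockets, cores_per_socket):
    return sockets * cores_per_socket

# A quad-socket board populated with quad-cores:
print(cores_per_node(4, 4))   # 16 cores per node
# The same board with future octa-cores:
print(cores_per_node(4, 8))   # 32 cores per node
```

At those densities, a single 1U server holds what once took a sizable chunk of a rack.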
Another core push is due to stream or GPU processing (i.e. General Purpose Graphics Processing Units, or GP-GPUs). For many problems this type of hardware delivers large speed-ups over a conventional general-purpose processor. The major PC graphics players, NVIDIA and AMD/ATI, have products designed specifically for HPC use. These systems apply a large number of slower cores (e.g. 800 cores or more) to a single problem. Like the core-heavy 1U server, these systems require a minimal amount of space in a standard rack (as little as 1U). They can be placed in a desktop or desk-side unit as well.
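The appeal of many slower cores comes down to aggregate throughput on data-parallel work. A back-of-the-envelope model shows why; the core counts and per-core rates below are illustrative assumptions, not benchmarks of any product:

```python
# Toy throughput model for a perfectly data-parallel problem:
# aggregate rate = number of cores x per-core rate.
# All figures are illustrative assumptions, not measured numbers.
def aggregate_gflops(cores, gflops_per_core):
    return cores * gflops_per_core

cpu = aggregate_gflops(4, 10.0)    # quad-core CPU at 10 GFLOPS/core -> 40 GFLOPS
gpu = aggregate_gflops(800, 1.0)   # 800 slow stream cores at 1 GFLOPS -> 800 GFLOPS
print(gpu / cpu)                   # 20.0 -- the GPU wins, if the problem parallelizes
```

The catch, of course, is the "if": the speed-up only materializes for problems that can be expressed as the same operation over many independent data elements.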
Regardless of where they live, it is now possible to put a large amount of processing power into a small amount of space. The need for an HPC system to live in a rack in the data center may not be as pressing as in the past.
Real Power, Real Close
To their credit, Intel and AMD have managed to deliver multi-core without requiring an associated increase in power (i.e. a current quad-core chip draws about the same amount of power as a dual-core chip, though it may run at a slightly lower clock). This result has allowed core densities in both rack-mount and blade servers to increase dramatically.
For instance, consider the IBM BladeCenter S system. It is built for the server closet (it is rack mountable), but because it is designed for power efficiency it could easily sit on or next to a desk. It is powered by standard 110-220 volt electrical service and can hold up to six blade servers (up to 48 cores total). The system even has integrated storage built into the chassis (12TB SATA and 12TB NL SAS are available) and does not require chilled air. It also has an Office Enablement Kit that adds mobility (wheels) and protection from the office environment (dust, dirt, coffee!). Another nice feature of the BladeCenter S is its low acoustic signature, a must for office use. Similar systems with various feature sets can be had from HP, Dell, and Supermicro.
Another entry in the departmental arena is the Cray CX1™ “Supercomputer.” The CX1 can be thought of as a blade system on wheels. Mobility is a nice touch in a unit this size (which the other blade vendors should take as a hint). It can support up to 64 Intel cores and can be configured with InfiniBand® as well. Like other blade systems, it can use storage blades and works with standard office electrical service.
Be careful about blade server assumptions. Some may believe that blade-based systems were designed for “office use” and lack the features needed for true HPC performance. This is hardly the case. For example, the BladeCenter S QS22 blade provides two IBM PowerXCell 8i processors and dual-port DDR InfiniBand connections. (The QS22 blade can provide up to 6.4 TFLOPS single precision and up to 3.0 TFLOPS double precision performance.) Most blades offer InfiniBand options as well. The CX1 has an NVIDIA Tesla GP-GPU blade for stream computing applications.