Hybrid Clusters: Intel and Nvidia Chips in HPC Clusters
The use of hybrid cluster designs that combine standard CPUs with GPUs, accelerators and FPGAs is on the rise.
Analyst firm IDC says that standards-based cluster architectures have propelled the HPC market's skyrocketing growth in recent years and drove HPC revenue past $9 billion in 2008. HPC buyers are always looking for cost-effective ways to increase the performance of their codes, so hybrid designs that combine standard CPUs with GPUs, accelerators and FPGAs are being explored to speed up key codes.
The Tesla GPUs and their related CUDA tools went commercial last fall, and they've been picked up by a number of players, among them several vendors selling what they call personal supercomputers. The Appro HyperPower cluster is not a personal supercomputer but is targeted at small to medium-sized HPC deployments. It is based on a 1U chassis with two so-called "twin servers" inside. These are two-socket, half-width servers that support Intel's current "Nehalem EP" Xeon 5500 processors. Each Nehalem server is linked to a Tesla S1070 server appliance. The Tesla S1070 packs four GPUs, each with 240 cores running at between 1.3 GHz and 1.44 GHz, into a single server chassis that also has 16 GB of its own memory.
This appliance links to the servers through two PCI-Express 2.0 x16 slots (one to each two-way half server). Depending on the clock speed, the Tesla S1070 appliance, which eats up 1U of rack space as well, delivers from 3.73 to 4.14 teraflops of floating point performance at single precision, but only between 311 and 345 gigaflops at double precision. (You can see that future Tesla cards have to do a better job on the double-precision front.)
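To put a number on the double-precision gap, the quoted peak figures can simply be divided. The short Python sketch below uses only the figures given above; the function name is made up for illustration.

```python
# Peak figures for one Tesla S1070 appliance, as quoted in the article.
# Each entry: (single-precision teraflops, double-precision gigaflops).
QUOTED_PEAKS = {
    "low clock": (3.73, 311),
    "high clock": (4.14, 345),
}

def sp_to_dp_ratio(sp_tflops, dp_gflops):
    """Return how many times faster single precision is than double precision."""
    return (sp_tflops * 1000.0) / dp_gflops  # convert teraflops to gigaflops first

for label, (sp_tflops, dp_gflops) in QUOTED_PEAKS.items():
    ratio = sp_to_dp_ratio(sp_tflops, dp_gflops)
    print(f"{label}: single precision is about {ratio:.1f}x double precision")
```

At either clock speed the ratio comes out at roughly 12 to 1, which is why double-precision codes see far less benefit from the appliance than single-precision ones.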
The Appro HyperPower puts 19 of the twin Nehalem EP servers interleaved in a standard 42U rack with 19 of the Tesla appliances, which yields 304 x64 cores and 18,240 GPU cores. The peak performance of such a rack weighs in at just over 78 teraflops on single-precision codes and 6.56 teraflops at double-precision math. Here's the scary bit, and it's not surprising: one of these Tesla appliances burns 800 watts when it is working hard. In return, it is a scalable computing architecture that can execute thousands of concurrent parallel processing threads for mathematically intensive problems. By using fewer systems than standard CPU-only clusters, this cluster delivers more computing power in an ultra-dense architecture at a lower cost.
The cluster includes interconnect switches for node-to-node communication, a master node and clustering software in a 42U rack configuration. IT managers can also use the Nvidia CUDA toolkit, which enables users to take advantage of the massively parallel architecture. Customers also get a choice of configurations and open-source cluster management software. Appro is supporting Red Hat's Enterprise Linux 5 Update 2 and Update 3 on the HyperPower clusters and will eventually support Novell's SUSE Linux Enterprise Server 10 and 11 for its European customers.
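The rack-level totals quoted above follow from the per-chassis and per-appliance figures. The Python sketch below reproduces that arithmetic. It assumes four cores per Xeon 5500 socket (not stated in the article, but consistent with the 304-core total) and uses the high-clock peak figures for the Tesla S1070. The names are illustrative only.

```python
# Rack composition as described in the article.
TWIN_CHASSIS = 19          # 1U chassis, each holding two half-width servers
SERVERS_PER_CHASSIS = 2
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 4       # assumption: quad-core Xeon 5500 parts
APPLIANCES = 19            # 1U Tesla S1070 appliances
GPUS_PER_APPLIANCE = 4
CORES_PER_GPU = 240

# Per-appliance peak figures at the higher clock speed, as quoted.
SP_TFLOPS_PER_APPLIANCE = 4.14
DP_GFLOPS_PER_APPLIANCE = 345
WATTS_PER_APPLIANCE = 800  # when working hard

def rack_totals():
    """Return the aggregate core counts, peak flops and GPU power for one rack."""
    return {
        "x64_cores": TWIN_CHASSIS * SERVERS_PER_CHASSIS
                     * SOCKETS_PER_SERVER * CORES_PER_SOCKET,
        "gpu_cores": APPLIANCES * GPUS_PER_APPLIANCE * CORES_PER_GPU,
        "sp_tflops": APPLIANCES * SP_TFLOPS_PER_APPLIANCE,
        "dp_tflops": APPLIANCES * DP_GFLOPS_PER_APPLIANCE / 1000.0,
        "gpu_watts": APPLIANCES * WATTS_PER_APPLIANCE,
        "rack_units_used": TWIN_CHASSIS + APPLIANCES,  # each unit is 1U
    }

if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the Tesla appliances alone account for 15,200 watts per rack, before counting the servers, and the 38 compute units leave 4U of the 42U rack for other components.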