Until this point we’ve talked about how the barriers to 10GE adoption have been overcome for the many HPC clusters that use Gigabit Ethernet. Now let’s look at the possibility of bringing the benefits of 10GE to much larger clusters with more demanding requirements. Those implementations require an interconnect that provides sufficient application performance, and a system environment that can handle the hardware challenges of densely packed processors, such as heat dissipation and cost-effective power use.
Examining the performance question reveals that some HPC applications that are loosely coupled or don’t have an excessive demand for low latency can run perfectly well over 10GE. Many TCP/IP-based applications fall into this category, and many more can be supported by adapters that offload TCP/IP processing. In fact, some TCP/IP applications actually run faster and with lower latency over 10GE than over InfiniBand.
For more performance-hungry and latency-sensitive applications, the performance potential of 10GE is comparable to current developments in InfiniBand technology. InfiniBand vendors are starting to ship 40 Gig InfiniBand (QDR), but let’s look at what that really delivers. Since all InfiniBand uses 8b/10b encoding, take 20 percent off the advertised bandwidth right away: 40 Gig InfiniBand is really 32 Gig, and 20 Gig InfiniBand is really only capable of 16 Gig speeds. But the real limitation is the PCIe bus inside the server, typically capable of only 13 Gigs for most servers shipped in 2008. Newer servers may use “PCIe Gen 2” to get to 26 Gigs, but soon we will begin to see 40 Gigabit Ethernet NICs on faster internal buses, and then the volumes will increase and the prices will drop. We’ve seen this movie before: niche technologies are overtaken by the momentum and mass vendor adoption of Ethernet.
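The arithmetic above is easy to check. The short Python sketch below is illustrative only: the function names are made up for this example, and the 13 Gig and 26 Gig bus ceilings are simply the approximate figures quoted above, not measured values. It applies the 8b/10b overhead to the advertised link rate and then caps the result at whatever the server bus can carry.

```python
def effective_gbps(advertised_gbps, payload_bits=8, line_bits=10):
    """Usable data rate after line-encoding overhead.

    With 8b/10b encoding, every 10 bits on the wire carry 8 bits of data,
    so the usable rate is 80 percent of the advertised signaling rate.
    """
    return advertised_gbps * payload_bits / line_bits


def deliverable_gbps(advertised_gbps, host_bus_gbps):
    """Rate a server can actually use: the lower of the link and bus limits."""
    return min(effective_gbps(advertised_gbps), host_bus_gbps)


# Approximate bus ceilings taken from the article text (assumptions, not measurements).
PCIE_GEN1_GBPS = 13
PCIE_GEN2_GBPS = 26

for name, advertised in [("20 Gig InfiniBand", 20), ("40 Gig InfiniBand", 40)]:
    print(
        name,
        "| after encoding:", effective_gbps(advertised),
        "| behind a 13 Gig bus:", deliverable_gbps(advertised, PCIE_GEN1_GBPS),
        "| behind a 26 Gig bus:", deliverable_gbps(advertised, PCIE_GEN2_GBPS),
    )
```

Under these assumptions, a 40 Gig link behind a 13 Gig bus delivers no more than a 20 Gig link does, which is the point of the comparison: the bus, not the advertised link speed, sets the ceiling.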
In addition, just as Fast Ethernet switches have Gigabit uplinks, and Gigabit switches have 10GE uplinks, it won’t be long before 10 Gigabit switches have 40 and 100 Gigabit links to upstream switches and routers. And you won’t need a complex and performance-limiting gateway to connect to resources across the LAN or the wide area network. At some point, 10, 40, and 100 Gigabit Ethernet will be the right choice for even the largest clusters.
What’s Important: Application Performance
One Reuters Market Data System (RMDS) benchmark that compared InfiniBand with a BLADE Network Technologies 10GE solution showed that 10GE outperformed InfiniBand, with significantly higher updates per second and 31 percent lower latency (see Figure 1 and Figure 2). These numbers demonstrate the practical benefits of 10GE far more conclusively than micro-benchmarks of the individual components.
Figure 1: 10GE has lower latency than InfiniBand in the RMDS benchmark. Source: STAC Research
Figure 2: 10GE is faster than InfiniBand in the RMDS benchmark. Source: STAC Research
Servers can come in many sizes and shapes, and new, more efficient form factors are emerging. Blade servers can be used to create an efficient and powerful solution suitable for clusters of any size, with the switching and first level of interconnection entirely within the blade server chassis. Connecting server blades internally at either 1 or 10 Gigabits greatly reduces cabling requirements and generates corresponding improvements in reliability, cost, and power. Since blade servers appeared on the scene a few years ago, they have been used to create some of the world’s biggest clusters. Blade servers are also frequently used to create compact departmental clusters, often dedicated to performing a single critical application.
One solution designed specifically to support the power and cooling requirements for large clusters is the IBM® System x™ iDataPlex™. (See Doug Meets the iDataPlex.) This new system design is based on industry-standard components that support open source software such as Linux®. IBM developed this system to extend its proven modular and cluster systems product portfolio for the HPC and Web 2.0 community.
The system is designed specifically for power-dense computing applications where cooling is critical. An iDataPlex rack has the same footprint as a standard rack, but has much higher cooling efficiency because of its reduced fan air depth. An optional liquid-cooled wall on the back of the system eliminates the need for special air conditioning. 10GE switches from BLADE Network Technologies match the iDataPlex specialized airflow, which in turn matches data centers’ hot and cold aisles, creating an integrated solution that can support very large clusters.
Blade servers and scale-out solutions like iDataPlex are just two of the emerging trends in data center switching that can make cluster architectures more efficient.
A Clear Path
The last hurdles to 10GE for HPC have been cleared:
NIC technology is stable and prices are continuing to drop while latency and throughput continue to improve, thanks to improved silicon and LAN-on-Motherboard (LOM) technology.
10GE switches are now cost-effective at under $500 per port.
The combination of SFP+ Direct Attach cabling, SFP+ optics, and 10GBASE-CX4 provides a practical and cost-effective wiring solution.
New platforms are being introduced with power efficiency and cooling advances that can meet demanding HPC requirements, even for large clusters.
New benchmarks are proving that 10GE can provide real business benefits in faster job execution, while maintaining the ease-of-use of Ethernet.
Blade server technology can support 10GE while meeting the demanding physical requirements of large clusters.
With Gigabit Ethernet the de facto standard for all but the largest cluster applications and the last hurdles to 10GE for HPC cleared, it’s time to re-create the image of the HPC network: standards-based components, widely available expertise, compatibility, high reliability, and cost-effective technology.