10GE is Ready for Your Cluster

The wait is over. Next-generation Gigabit Ethernet has hit the ground running.

Say “cluster” and try to keep your mind from images of massive, government-funded scientific applications or herds of caffeine-fueled grad students. Pretty tough. But in fact, the vast majority of high performance computing (HPC) clusters are nowhere near large enough to qualify as massive, are used in commercial environments, and run on Gigabit Ethernet interconnects. Even within the TOP500® Supercomputer Sites, the number of clusters running Gigabit Ethernet is more than double the number of clusters running InfiniBand. Certainly, higher speed and lower latency would be nice for any installation. But the performance requirements for most applications just don’t merit the high cost and labor-intensive maintenance of InfiniBand.

What most Gigabit Ethernet HPC sites could really use is an upgrade to 10 Gigabit Ethernet (10GE), if it could be done cost-effectively and reliably. Until now, that idea would generate hesitation and skepticism among knowledgeable decision-makers. But with Gigabit Ethernet already entrenched in the HPC market and providing a slew of advantages, only a few obstacles have prevented the widespread growth of 10GE. Those obstacles are quickly evaporating. With recent technology advances, pricing improvements, and proven vendors entering the market, the choice of 10GE for HPC clusters has become quite attractive.

Understanding 10GE

Understanding the environment for 10GE merits a little history. Although Ethernet has been around for three decades, the technology remains viable because it has evolved over time to meet changing industry requirements. Widespread Ethernet adoption began when the IEEE established the 10 Mbps Ethernet standard in 1983. That standard evolved to Fast Ethernet (100 Mbps), Gigabit Ethernet (1000 Mbps), and 10 Gigabit Ethernet, with 40 and 100 Gigabit standards coming soon. In fact, discussions have started about Terabit Ethernet (a million Mbps), a speed that was hard to imagine just a few years ago.

Despite this evolution, the basic Ethernet frame format and principles of operation have remained virtually unchanged. As a result, networks of mixed speeds (10/100/1000 Mbps) operate uniformly without the need for expensive or complex gateways. When Ethernet was first deployed it could easily be confused with true plumbing: it was coaxial tubing that required special tools even to bend it. As Ethernet evolved it absorbed advancements in cabling and optics, changed from shared to switched media, introduced the concept of virtualization via VLANs, and incorporated Jumbo Frames and many other improvements. Today Ethernet continues to evolve with sweeping changes such as support for block-level storage (Fibre Channel over Ethernet).

Ratified in 2002 as IEEE 802.3ae, today’s 10GE supports 10 Gigabits per second transmission over distances up to 80 km. In almost every respect, 10GE is fully compatible with previous versions of Ethernet. It uses the same frame format, Media Access Control (MAC) protocol, and frame size, and network managers can use familiar management tools and operational procedures.
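Because the frame format is identical from 10 Mbps through 10 Gbps, tools that understand Ethernet headers need no changes at all. As a simple illustration, the Python sketch below parses the fields every Ethernet II frame carries regardless of link speed: destination and source MAC addresses, an optional 802.1Q VLAN tag, and the EtherType. The field layout follows the published frame format; the sample frame bytes are invented for the example.

```python
import struct

def _fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet_header(frame: bytes) -> dict:
    """Parse the Ethernet II header fields that are the same at any link speed:
    destination MAC, source MAC, optional 802.1Q VLAN tag, and EtherType."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    offset, vlan_id = 14, None
    if ethertype == 0x8100:                      # 802.1Q VLAN tag present
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF                   # low 12 bits carry the VLAN ID
        offset = 18
    return {
        "dst": _fmt_mac(dst),
        "src": _fmt_mac(src),
        "vlan": vlan_id,
        "ethertype": hex(ethertype),
        "payload_len": len(frame) - offset,
    }

# Hypothetical VLAN-tagged frame: broadcast destination, a made-up source MAC,
# VLAN 42, the IPv4 EtherType (0x0800), followed by a token payload of zeros.
sample = (bytes.fromhex("ffffffffffff" "001b21aabbcc" "8100" "002a" "0800")
          + b"\x00" * 46)
print(parse_ethernet_header(sample))
```

The same parser works whether the bytes arrived over Fast Ethernet or a 10GE link, which is exactly why mixed-speed networks need no gateways.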

Ethernet Advantages for HPC

The fact that more than half of the TOP500 Supercomputer Sites and almost all smaller clusters run Ethernet is no surprise when you look at the benefits this technology offers:

  • High Comfort Level: As a widely used standard, Ethernet is a known environment for IT executives, network administrators, server vendors, and managed service providers around the world. They have the tools to manage it and the knowledge to maintain it. Broad vendor support is another plus: virtually every networking vendor supports Ethernet.
  • Best Practices: High availability, failover, management, security, backup networks, and other best practices are well-established in Ethernet and their implementation is widely understood. This is another example of the wide acceptance and vendor support for Ethernet. (Good luck finding an InfiniBand firewall, for example!)
  • Single Infrastructure: Ethernet gives HPC administrators the advantage of a single infrastructure that supports the four major connectivity requirements: user access, server management, storage connectivity, and cluster interconnect. A single infrastructure is easier to manage and less expensive to purchase, power, and maintain than using a separate technology for storage or for the processor interconnect.
  • Lower Power Requirements: Power is one of the biggest expenses facing data center managers today. New environmental mandates combined with rising energy costs and demand are forcing administrators to focus on Green initiatives. Ethernet is an efficient option for power and cooling, especially when used in designs that reduce power consumption.
  • Lower Cost: With new servers shipping with 10G ports on the motherboard and 10GE switch ports now priced below $500, 10GE has a compelling price/performance advantage over niche technologies such as InfiniBand.
  • Growth Path: Higher-speed Ethernet will capitalize on the large installed base of Gigabit Ethernet. New 40GE and 100GE products will become available soon, and will be supported by many silicon and equipment vendors.

For applications that can benefit from higher speeds, 10GE offers additional advantages:

  • More Efficient Power Utilization: 10GE requires less power per gigabit than Gigabit Ethernet, so you get ten times the bandwidth without ten times the power.
  • Practical Performance: 10GE obviously moves data ten times faster than Gigabit Ethernet, but thanks to the new generation of 10GE NICs it also cuts server-to-server latency by roughly a factor of eight. That combination of bandwidth and latency translates into higher application performance than you might imagine. In a molecular dynamics benchmark (VASP running on a 64-core cluster), the application ran more than six times faster than on Gigabit Ethernet and was nearly identical to InfiniBand DDR. In a mechanical simulation benchmark (PAM-CRASH on a 64-core cluster), 10GE completed tasks in about 70 percent less time than Gigabit Ethernet and matched InfiniBand DDR. Similar gains have been observed on common HPC cluster applications such as FLUENT and RADIOSS, and further testing continues to show comparable results. (A minimal round-trip measurement sketch follows this list.)
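Numbers like these are easy to sanity-check on your own hardware. Below is a minimal TCP ping-pong sketch in Python: one node echoes small messages while the other times thousands of round trips and reports the median one-way latency. It measures application-level latency through the ordinary socket stack rather than raw NIC latency, and the port number, message size, and host names are placeholders, so treat it as a rough illustration rather than a tuned benchmark.

```python
import socket
import statistics
import sys
import time

PORT = 50007          # arbitrary placeholder port; pick any free port on your nodes
MSG = b"x" * 64       # small message, typical of latency-sensitive cluster traffic
ROUNDS = 10000

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def server() -> None:
    """Echo fixed-size messages back to the client until it disconnects."""
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            try:
                while True:
                    conn.sendall(recv_exact(conn, len(MSG)))
            except ConnectionError:
                pass  # client finished its run

def client(host: str) -> None:
    """Time ROUNDS round trips and report the median one-way latency."""
    with socket.create_connection((host, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        samples = []
        for _ in range(ROUNDS):
            start = time.perf_counter()
            sock.sendall(MSG)
            recv_exact(sock, len(MSG))
            samples.append(time.perf_counter() - start)
        median_us = statistics.median(samples) / 2 * 1e6
        print(f"median one-way latency: {median_us:.1f} microseconds")

if __name__ == "__main__":
    # On one node:      python pingpong.py server
    # On another node:  python pingpong.py client <server-hostname>
    client(sys.argv[2]) if sys.argv[1] == "client" else server()
```

Running the same test over your Gigabit and 10GE segments shows the latency difference in terms that matter to your own applications.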

Benchmark numbers like these are impressive, and vendors love talking about microseconds and gigabits per second. But the real advantage in commercial applications is the increase in user productivity, and that’s measured by the clock on the wall. If computations finish in 70 percent less time, users spend far less of their day waiting for results.

The advantages of 10GE have many cluster architects practically salivating at the prospect of upgrading, and experts have been predicting rapid growth in 10GE cluster interconnects for years. That hasn’t happened. Yet.

Obstacles Eradicated

Until recently, 10GE was stuck in the starting gate because of a few – but arguably significant – problems involving pricing, stability, and standards. Those problems have now been overcome, and 10GE has taken off. Here’s what happened.

  • Network Interface Cards (NICs): Some early adopters of 10GE were discouraged by problems with the NICs, starting with the price. Until recently, the only NICs available for 10GE applications cost about $800, and many users prefer two per server. Now server vendors are starting to put an Ethernet chip on the motherboard, known as LAN-on-Motherboard (LOM), instead of using a separate board. This advance drops the cost to well under $100 and removes the NIC price obstacle from 10GE. Standalone NIC prices are now as low as $500 and will continue to drop as LOM technology lets NIC vendors reach the high volumes they need to keep costs down.
  • Another NIC-related obstacle was the questionable reliability of some of the offerings. A few of these created a bad initial impression of 10GE, with immature software drivers that were prone to underperforming or even crashing. The industry has now grown past those problems, and strong players such as Chelsio, Intel and Broadcom are providing stable, reliable products.

  • Switch Prices: Like NICs, initial 10GE switch prices inhibited early adoption of the technology. The original 10GE switches cost as much as $20,000 per port, which was more than the price of a server. Now list prices for 10GE switches are below $500 per port, and street prices are even lower. That pricing applies to embedded blade switches as well as top-of-rack products.
  • Switch Scaling: For larger clusters, a market inhibitor was how to connect multiple switches into a nonblocking fabric. Most clusters are small enough that this is not an issue. For those that aren’t, Clos topologies for scaling Ethernet switches provide a solution and are starting to become established in the market (a rough sizing sketch appears after this list).
  • PHY Confusion: Rapid evolution of the fiber optic transceiver standards was a show-stopper for customers. Standards defining the plug-in transceiver moved quickly from XENPAK to X2 to XFP to SFP+, each bringing smaller size and lower cost. But because each type of transceiver has a different size and shape, a switch or NIC is compatible with only one option. Using multiple types of optics would increase data center complexity and add costs such as stockpiling additional spares. With visions of Blu-ray versus HD DVD, VHS versus Betamax, and MS-DOS versus CP/M, users were unwilling to bet on a survivor and shunned the technology while they waited to see which way the market would move.
  • Eventually, the evolution culminated in SFP+. This technology is specified by the ANSI T11 Group for 8.5- and 10-Gbps Fibre Channel, as well as 10GE. The SFP+ module is small enough to fit 48 in a single rack-unit switch, just like the RJ-45 connectors used in previous Ethernet generations. It also houses fewer electronics, thereby reducing the power and cost per port. SFP+ has been a boon to the 10GE industry, allowing switch vendors to pack more ports into smaller form factors and lowering system costs through better integration of IC functions at the host card level. As a result, fewer sparks are flying in the format wars, and the industry is converging rapidly on SFP+.

  • Cabling: Many users have been holding out for 10GBASE-T because it uses the familiar RJ-45 connector and promises what the market is waiting for: simple, inexpensive 10GE. But the physics are different at 10 Gbps. With current technology, the chips are expensive and power hungry, and they require new cabling (Cat 6A or Cat 7). 10GBASE-T components also add about 2.6 microseconds of latency across each link, exactly what you don’t want in a cluster interconnect. And as we wait for 10GBASE-T, less expensive and less power-hungry technologies are being developed. 10GBASE-CX4 offers reliability and low latency, and is a proven solution that has become a mainstay technology for 10GE.

    Making the wait easier are the new SFP+ copper (Twinax) Direct Attach cables, which are thin, passive cables with SFP+ ends. With support for distances up to 10 meters, they are ideal for wiring inside a rack or between servers and switches in close proximity. At an initial cost of $40 to $50, and with an outlook for much lower pricing, Twinax provides a simpler and less expensive alternative to optical cables. With advances such as these, clarity is overcoming confusion in the market. Between SFP+ Direct Attach cables for short distances, familiar optical transceivers for longer runs, and 10GBASE-CX4 for the lowest latency, there are great choices today for wiring clusters.
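To make the switch-scaling point above concrete, here is a rough sizing sketch for a two-tier (leaf-spine) folded Clos fabric built from fixed-configuration 10GE switches. It assumes identical switches, with half of each leaf’s ports facing servers and half facing spines for fully nonblocking operation; real designs vary with oversubscription ratios and switch models, so the 48-port switch and 512-node cluster below are purely illustrative.

```python
import math

def size_two_tier_clos(servers: int, ports_per_switch: int) -> dict:
    """Rough sizing for a nonblocking two-tier (leaf-spine) Clos fabric.

    Assumes identical switches and a 1:1 split on each leaf: half of its
    ports face servers, half face spine switches.
    """
    down = ports_per_switch // 2               # server-facing ports per leaf
    leaves = math.ceil(servers / down)         # enough leaves for all servers
    # Each leaf runs one uplink to every spine, so the spine count equals the
    # uplinks per leaf; each spine in turn needs one port per leaf.
    spines = down
    if leaves > ports_per_switch:
        raise ValueError("too many leaves per spine: a third tier is needed")
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "total_switches": leaves + spines,
        "max_nonblocking_servers": down * ports_per_switch,  # two-tier ceiling
    }

# Example: a 512-node cluster built from hypothetical 48-port 10GE switches.
print(size_two_tier_clos(servers=512, ports_per_switch=48))
```

Under those assumptions, 512 servers call for 22 leaf and 24 spine switches, and a two-tier fabric of 48-port switches tops out at 1,152 nonblocking server ports; beyond that, a third switching tier is required.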

Comments on "10GE is Ready for Your Cluster"

billtodd

If linux-mag aspires to becoming a megaphone for marketeers it’s off to a flying start with this one. At least you did note that the author is a VP of the company whose product came out appearing more attractive in this comparison.

It would be nice to believe that 10 GbE is practical, right now, but I’m afraid that I’ll wait for a less biased evaluation before accepting that. I mean, exactly what kind of latency is supposedly being measured in the chart? Both 10 GbE and Infiniband achieve latencies on the order of 1 microsecond, so whatever is taking a large fraction of a millisecond is certainly not the hardware or low-level drivers: rather, it’s application-level logic.

Which of course is likely true for the throughput figures as well. Without knowing how the application is using the transport (or for that matter even knowing which flavor of Infiniband is being used, how the other hardware compares, etc.), attempting to draw any conclusions about the underlying merits of the two options is ridiculous.

- bill
