Low Cost/Power HPC

The low power Intel Atom offers some interesting possibilities for HPC. Does it make sense to try this approach?

In my last column, I mentioned the idea of disposable HPC. The concept is based on building clusters using low cost nodes (less than $500 each). Because lower cost nodes will run slower and have fewer cores than the big server nodes, they would need to be smaller and use much less power. They will also need a lower Thermal Design Power (TDP) to accommodate dense packaging designs.

Is such a cluster possible? Before we take a look at some real options hitting the market, let’s take a look at some numbers. A recent Extremetech article discusses a paper by researchers at Harvard and Microsoft that seeks to prove that a small, power-efficient core like the Intel Atom chip can be better suited for search applications. The paper is due to be presented at the International Symposium on Computer Architecture in Saint-Malo, France, on June 19.

According to Extremetech, the paper states that with an average power consumption of 3.2 watts, the Atom (single-core 1.6 GHz Diamondville) is 19.5 times more power-efficient than a Xeon (L5420 quad-core, 2.5-GHz, Harpertown), which consumes 62.5 watts on average. The Xeon, meanwhile, outperformed the Atom by 3.9 times for search tasks.
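The quoted figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above. The variable names are mine, and "performance per watt" is a metric I derive here for illustration, not one taken from the paper.

```python
# Figures as quoted from the Extremetech summary of the paper.
atom_watts = 3.2     # average power, single-core 1.6 GHz Atom
xeon_watts = 62.5    # average power, quad-core 2.5 GHz Xeon L5420
xeon_speedup = 3.9   # Xeon search performance relative to the Atom

# Ratio of average power draw (Xeon / Atom).
power_ratio = xeon_watts / atom_watts

# Derived metric (an assumption of this sketch): Atom performance
# per watt relative to the Xeon.
atom_perf_per_watt_advantage = power_ratio / xeon_speedup

print(f"Power ratio (Xeon/Atom): {power_ratio:.1f}")
print(f"Atom perf-per-watt advantage: {atom_perf_per_watt_advantage:.1f}x")
```

Read this way, the 19.5 figure matches the ratio of average power draw, and dividing it by the 3.9 times performance gap gives roughly a 5 times performance-per-watt edge for the Atom on this workload.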

On a cost basis, the researchers estimated an Atom server would cost about $375, versus $1,210 for a Xeon server, with the only variables being the prices of the processor and motherboard. While these numbers looked interesting, I wanted to try my own comparison for HPC purposes. Keep in mind this is a back-of-the-envelope comparison and there are many other issues that may affect performance. My goal is to get a general idea of how the processors compare in terms of HPC.

I went to Intel’s website and got most of the information for both an Atom D510 (dual core, 1.66 GHz) and Xeon 5570 (quad-core, 2.93 GHz). Both of these represent currently available parts and I used Intel’s stated price for quantities of 1000. Next, I tried to find floating point results for both processors. There are no SPEC numbers for the Atom, but I did find some POV numbers for both the D510 and an i7 870 running at 2.93 GHz. (POV is a rendering program that uses heavy floating point processing.) Since the i7 and Xeon are very similar, it seems reasonable to use the i7 POV number as a stand-in for the Xeon's performance (remembering it is a single measurement). The results of my comparison are in the table below. Note that for the TDP/Perf and Price/Perf ratios, lower is better.

Processor         Clock (GHz)   Cores   TDP (Watts)   Price (qty 1000)   Perf (POV 3.7 SMP)   TDP/Perf   Price/Perf
Atom D510         1.66          dual    13            $63                470.6                .028       .13
Xeon 5570         2.93          quad    95            $1386              3603                 .026       .39
Xeon/Atom Ratio   1.8           2       7.3           22                 7.7                  .93        3

The results are rather interesting. The Nehalem Xeon is clocked 1.8 times higher, has 7.3 times the TDP, and costs 22 times as much as the D510 Atom. The Xeon delivers 7.7 times the performance, but when you factor in price-to-performance, the Atom is 3 times better than the Xeon solution. Interestingly, the TDP/performance ratios are almost identical for both processors. Thus, there is no real power advantage with either processor.
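For readers who want to redo the comparison with other parts, the ratios can be recomputed from the raw clock, TDP, price, and POV figures. This is a minimal sketch. The `Cpu` class and its field names are my own illustration, and the only data in it are the values from the table.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    clock_ghz: float
    cores: int
    tdp_watts: float
    price_usd: float   # Intel list price, quantity 1000
    pov_score: float   # POV 3.7 SMP result

    @property
    def tdp_per_perf(self) -> float:
        # Watts of TDP per unit of POV performance (lower is better).
        return self.tdp_watts / self.pov_score

    @property
    def price_per_perf(self) -> float:
        # Dollars per unit of POV performance (lower is better).
        return self.price_usd / self.pov_score


atom = Cpu("Atom D510", 1.66, 2, 13, 63, 470.6)
xeon = Cpu("Xeon 5570", 2.93, 4, 95, 1386, 3603)

# Xeon/Atom ratio for every column of the table.
ratios = {
    "clock": xeon.clock_ghz / atom.clock_ghz,
    "cores": xeon.cores / atom.cores,
    "tdp": xeon.tdp_watts / atom.tdp_watts,
    "price": xeon.price_usd / atom.price_usd,
    "perf": xeon.pov_score / atom.pov_score,
    "tdp_per_perf": xeon.tdp_per_perf / atom.tdp_per_perf,
    "price_per_perf": xeon.price_per_perf / atom.price_per_perf,
}

for key, value in ratios.items():
    print(f"{key:>14}: {value:.2f}")
```

Computed from unrounded inputs, the Price/Perf ratio comes out near 2.9 and the TDP/Perf ratio near 0.95. These differ slightly from the rounded table entries but lead to the same conclusion.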

The Atom has a clear price-to-performance advantage over the Xeon; however, it is not quite that simple. Since you need at least 8 Atom processors to do the work of one Xeon, you need 8 separate nodes. The cost per Atom node includes things like motherboards, memory, etc., so the 3X price-to-performance advantage may quickly evaporate. It may be possible to narrow the performance gap or there may be better performing floating point applications suited for the Atom, but for now we’ll use this data as a first approximation. Also note, I have not considered the fact that with 8 Atom nodes, there is 8 MB of total distributed cache available to a parallel application.
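To see how quickly per-node overhead can eat into the processor-level advantage, here is a rough sketch. The processor prices and POV scores come from the table. The per-node overhead (motherboard, memory, and so on) is a hypothetical input that you would replace with real quotes. The model also assumes perfect parallel scaling and ignores interconnect cost, both of which favor the Atom.

```python
import math

# Values from the comparison table.
ATOM_CPU_PRICE = 63.0
XEON_CPU_PRICE = 1386.0
ATOM_POV = 470.6
XEON_POV = 3603.0


def atom_nodes_needed() -> int:
    """Atom nodes needed to match one Xeon, assuming perfect scaling."""
    return math.ceil(XEON_POV / ATOM_POV)


def cluster_cost(cpu_price: float, nodes: int, overhead_per_node: float) -> float:
    """Total cost of `nodes` nodes: processor plus per-node overhead."""
    return nodes * (cpu_price + overhead_per_node)


def break_even_xeon_overhead(atom_overhead: float) -> float:
    """Xeon node overhead at which both options cost the same.

    If the real Xeon overhead is higher than this, the Atom nodes
    are cheaper for the same POV throughput, and vice versa.
    """
    atom_total = cluster_cost(ATOM_CPU_PRICE, atom_nodes_needed(), atom_overhead)
    return atom_total - XEON_CPU_PRICE


if __name__ == "__main__":
    hypothetical_atom_overhead = 150.0  # assumption, not a quoted price
    print("Atom nodes needed:", atom_nodes_needed())
    print("Break-even Xeon overhead: $%.0f"
          % break_even_xeon_overhead(hypothetical_atom_overhead))
```

With the hypothetical $150 of overhead per Atom node, the eight Atom nodes only come out cheaper if the rest of the Xeon node costs more than about $318.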

Intel has released six Pineview Atom chips (and announced two more). The following table shows the specifications for these processors. Note that Total TDP is for the processor and the supporting chip-set.

Model        Cores    Cache   Speed      Memory   CPU TDP      Total TDP
Atom N450    single   512K    1.66 GHz   DDR2     5.5 Watts    7 Watts
Atom N455    single   512K    1.66 GHz   DDR2/3   5.5 Watts    7 Watts
Atom N470    single   512K    1.83 GHz   DDR2     6.5 Watts    8 Watts
Atom N475    single   512K    1.83 GHz   DDR2/3   6.5 Watts    8 Watts
Atom D410    single   512K    1.66 GHz   DDR2     10 Watts     12 Watts
Atom D425*   single   512K    1.66 GHz   DDR3     ~10 Watts    ~12 Watts
Atom D510    dual     1MB     1.66 GHz   DDR2     13 Watts     15 Watts
Atom D525*   dual     1MB     1.66 GHz   DDR3     ~13 Watts    ~15 Watts

*The D425 and the D525 have been announced for a June 2010 release.

Because Atoms are destined to end up in small low power designs, there is no need for a huge motherboard. Indeed, many Atom motherboards use the Mini ITX form factor. Most users are familiar with the ATX form factor where the motherboard measures 12×9.6 inches (305×244 mm). Developed by VIA Technologies, a Mini ITX motherboard measures 6.7 inches square (170mm), which is roughly 40% the size of an ATX motherboard. Most Mini ITX systems are designed to be passively cooled and are intended for low power operation such as desktops, routers, and even servers.
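The size comparison is easy to verify from the board dimensions given above. The snippet below is just that arithmetic, with names of my own choosing.

```python
# Board dimensions in millimetres, as given in the text.
atx_mm = (305, 244)
mini_itx_mm = (170, 170)


def board_area(dims: tuple[int, int]) -> int:
    """Board area in square millimetres."""
    width, depth = dims
    return width * depth


area_ratio = board_area(mini_itx_mm) / board_area(atx_mm)
print(f"Mini ITX is {area_ratio:.0%} of the ATX board area")
```

The result is about 39%, in line with the "roughly 40%" figure.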

There are currently a large number of Mini ITX motherboards available to Linux users. The LinuxTECH.net site lists over 20 Mini ITX boards currently available. Almost all of these boards are in the $100 price range. Of particular interest is the Asus Hummingbird. The Hummingbird is designed as a server board and offers optional remote management (IPMI) capabilities, dual Intel GigE network interfaces, and a 10/100 management NIC.

There are also some other factors that may make Atom nodes attractive. Lower power and less heat mean fewer fans and even passive cooling for the processor. This removes cost and space from case designs and reduces the need to be concerned about broken fans. It is not too difficult to envision shared power supplies and blades, using Mini ITX motherboards, to help increase density and lower the packaging cost.

Of course, the choice of processor and interconnect depends on your application. Low power/high node count clusters are certainly possible right now. For those applications that require low latency communications, Xeon servers and InfiniBand are still the way to go. For other applications, low power solutions may provide better price-to-performance than one might expect, but the devil, as they say, is in the benchmarks.
