Burning Money: Cluster Power and Cooling

As vendors strive for faster processors and denser systems, power and cooling have become a major issue for the HPC market.

Many years ago, I set up a Web server using a spare 486DX2 tower machine and Linux. The experiment was to test Linux and to start serving some very basic HTML over a dial-up Internet connection. The system worked beautifully, too well in fact. It was supposed to be a demonstration, but it turned into a working mail and Web server for a small company. When the machine was finally taken out of service, I removed the case and found that the tiny CPU fan had fallen off and the processor was running “naked” (my dust forensics told me it had been like this for quite a while). When your processor puts out a maximum of 7 watts, you can live with such failures. As anyone can tell you, however, a fan failure is not tolerable in today’s servers, where the heat output is more than ten times what it was in my little Web server. Heat has become a major problem.

For system designers, the rule of thumb is that a 10 degree Celsius rise in temperature produces a 50% reduction in the long-term reliability of electronic hardware. Reliability in today’s servers therefore depends on how well you can move heat with air (I am going to ignore liquid cooling methods for now). Thermodynamics tells us that in a closed system we can only move heat, and that (re)moving heat requires power and generates heat as well (fan motors get hot). This produces a kind of spiral effect: removing more heat requires more power, which creates more heat, and on and on. In other words, there is a thermal limit to how much heat-generating hardware one can pack into a small space.
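The rule of thumb above can be turned into a quick back-of-the-envelope estimate. The sketch below assumes the 50%-per-10-degree figure compounds geometrically, which is one common reading of the rule, not a precise reliability model:

```python
def reliability_factor(delta_t_celsius: float) -> float:
    """Relative long-term reliability after a temperature rise,
    assuming the 50%-loss-per-10-degree-C rule of thumb compounds."""
    return 0.5 ** (delta_t_celsius / 10.0)

# Under this model, a node running 20 degrees C hotter keeps
# only a quarter of its expected long-term reliability.
print(reliability_factor(20))  # 0.25
```

Real failure rates depend on the component and on which failure mechanism dominates, but the exponential shape is why even modest cooling improvements matter.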

Idle heat is another issue that is often overlooked. In the past there was a significant difference between the power draw of a loaded system and that of an idle system. Today this is not so much the case, and an idle system can use almost the same amount of electricity as a fully loaded one. If a cluster is running at 75% utilization (i.e., 25% of the nodes are idle), then roughly 25% of the power and cooling costs are being wasted on idle cycles. In essence, the cluster is burning money (not to mention the ecological impact).
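To put a dollar figure on that waste, here is a rough estimate. The node count, per-node wattage, and electricity rate below are hypothetical example numbers, and cooling costs (which scale with the heat produced) are not included:

```python
def idle_cost_per_year(nodes: int, watts_per_node: float,
                       idle_fraction: float, dollars_per_kwh: float) -> float:
    """Rough yearly cost of electricity burned by idle nodes, assuming
    idle draw is close to loaded draw (as the article notes it can be)."""
    idle_watts = nodes * idle_fraction * watts_per_node
    kwh_per_year = idle_watts / 1000.0 * 24 * 365
    return kwh_per_year * dollars_per_kwh

# 128 nodes at 300 W each, 25% idle, $0.10/kWh (all assumed values)
print(round(idle_cost_per_year(128, 300, 0.25, 0.10), 2))  # 8409.6
```

Even for a modest cluster, the idle cycles add up to thousands of dollars a year before cooling is counted.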

Reducing the amount of heat generated by clusters would help reduce overall costs and result in less wasted money. Cluster nodes have two major sources of heat: the power supply and the processor. The power supply is often the forgotten component. A typical power supply is about 65% efficient, which means that 35% of the electricity entering the power supply leaves as heat. Not quite a space heater, but close. Yes, that is a lot of heat. Fortunately, new high-efficiency power supplies are available (look for the 80 Plus logo) that provide 80% or greater efficiency (efficiency is load dependent). Less heat, less cooling, less total power: we can begin to spiral down a bit and increase system density.
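The arithmetic is worth seeing. For a given DC load, the wall-socket draw is the load divided by the efficiency, and the difference is waste heat. The 260 W DC load below is an assumed example figure, and real efficiency varies with load:

```python
def wall_power_and_heat(dc_load_watts: float, efficiency: float):
    """Wall-socket draw and waste heat for a given DC load,
    assuming a fixed power-supply efficiency."""
    wall_watts = dc_load_watts / efficiency
    return wall_watts, wall_watts - dc_load_watts

# The same 260 W DC load through a 65%- and an 80%-efficient supply.
for eff in (0.65, 0.80):
    wall, heat = wall_power_and_heat(260, eff)
    print(f"{eff:.0%}: {wall:.0f} W from the wall, {heat:.0f} W of heat")
```

Going from 65% to 80% efficiency cuts the waste heat on this load from 140 W to 65 W per node, more than half, before any cooling savings are counted.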

The processors are the other significant heat source in a server. Recent Intel processors have maintained a predictable power signature of 80-120 Watts, allowing existing infrastructure to take newer processors without power and cooling upgrades. As mentioned, an idle server can generate almost as much heat as a fully loaded system. One solution would be to have the cluster scheduler power down nodes when they are not in use. While this is certainly an option, a better solution is in the works.

Newer Intel processors have what are known as power-saving C-states that allow the processor to significantly reduce power usage by idling certain parts of the system. The trick is to convince the Linux kernel (and drivers) to cut down on, or eliminate, the checks it does to see if there is work to be done. Newer Linux kernels can now operate in what is called a “tickless idle” mode so that deep power-down states can be maintained longer.
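On a running Linux system, the kernel exposes the available idle states through the cpuidle sysfs interface. The small sketch below reads them, assuming the standard `/sys/devices/system/cpu/cpuN/cpuidle/stateN/` layout; the directory path is parameterized so it can be pointed elsewhere:

```python
import os

def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """List (state directory, name, usage count) for each idle state
    the kernel's cpuidle driver exposes under the standard sysfs layout."""
    states = []
    for entry in sorted(os.listdir(cpuidle_dir)):
        if not entry.startswith("state"):
            continue
        with open(os.path.join(cpuidle_dir, entry, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(cpuidle_dir, entry, "usage")) as f:
            usage = int(f.read().strip())
        states.append((entry, name, usage))
    return states
```

High usage counts on the deeper states (C3, C6, and so on) are a sign the kernel is actually reaching them; frequent wake-ups, which PowerTOP can attribute to specific events, keep the processor stuck in shallow states.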

Intel even provides a tool called PowerTOP that shows which events wake your processor from its power-saving modes. With these mechanisms in place, idle nodes can draw only a few watts of power and ramp up quickly when the scheduler sends a task to the node. This behavior will produce a direct cost saving on clusters running at less than 100% utilization.

These new developments may produce large pay-offs in the cluster world. First, better power supplies will reduce overall heat generation and cooling costs. (Note: other solutions, such as running servers from a DC rail instead of AC, also offer better power efficiency.) Second, keeping processors within an acceptable and predictable power signature allows for better planning and fewer heat-related issues. And finally, power-saving idle modes may significantly reduce the power and cooling costs of many clusters.
