Data Centers and HPC: the Energy Challenge


If you were asked to list industries whose carbon footprint contributes dramatically to global warming and represents a threat to our planet’s delicate climate balance, you would probably not think of IT as one of the top “energy hogs”.

You would be wrong. IT is rapidly climbing this not-so-virtuous ranking, with a double-digit annual growth rate in energy consumption that dwarfs the transportation industry’s roughly 1% per year. According to the industry consortium GreenTouch, the IT industry today accounts for roughly 2% of the world’s total energy consumption, comparable to the airline industry.

The fantastic computing power that technology is making available to businesses and to strategic fields such as science, applied research and industrial R&D is enabling progress, ultimately contributing to the good of humankind in ways rarely seen before. Yet the desire for even faster progress is boosting the need for ever increasing computing power and for access to huge amounts of data, further accelerating this segment’s demand for energy.

Data centers are responsible for a large chunk of the energy consumed by IT. According to the U.S. Environmental Protection Agency (EPA), the amount of energy consumed by data centers doubled between 2000 and 2006 alone. The trend slowed in 2007 amid the economic crisis and better data center efficiency, only to start accelerating again in the last two years. A recent report from the Natural Resources Defense Council (NRDC) claims that waste and inefficiency in U.S. data centers – which consumed a massive 91 billion kWh of electricity in 2013 – will push consumption to 140 billion kWh by 2020, the equivalent of 50 large (500 megawatt) power plants.

This has become largely unsustainable.

In an ideal situation, IT equipment would use 100% of the energy consumed by a data center. Unfortunately, reality is different: the share of energy that actually reaches the IT equipment varies between 30% and 60% of the total consumed by the whole data center. The parameter that best measures data center energy efficiency is PUE (power usage effectiveness), the ratio of total facility energy to IT equipment energy. The closer PUE is to 1 the better: a PUE of 2.5 means that for every 2.5 watts consumed by the data center, 1 watt powers the IT equipment and 1.5 watts go to cooling and other non-essential activities. A PUE of 1 means that 100% of the energy used by the data center goes into IT equipment.
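As a quick illustration (not part of the original post), PUE is simply total facility power divided by IT power; a minimal Python sketch:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_kw <= 0 or total_facility_kw < it_kw:
        raise ValueError("IT power must be positive and no greater than total power")
    return total_facility_kw / it_kw

# A facility drawing 2500 kW in total while its IT equipment draws 1000 kW:
print(pue(2500, 1000))  # 2.5 -> only 1 kW of every 2.5 kW reaches the servers

# The ideal case: every watt goes to IT equipment
print(pue(1000, 1000))  # 1.0
```

The figures here are made up for illustration; real facilities report PUE averaged over a year, since cooling load varies with the seasons.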

A 2014 Uptime Institute annual data center survey reveals that power usage effectiveness (PUE) metrics have plateaued at around 1.7 after several years of steady improvement.

The main reasons for this low efficiency are the energy wasted in the electrical conversions needed to power equipment (transformers, rectifiers, UPS) and the energy used to cool IT equipment with chillers and CRAC units.

Many technologies are being used to improve energy efficiency in data centers: virtualization, hot and cold aisle containment, widening of the thermal envelope, air flow optimization, and DCIM (Data Center Infrastructure Management). These are all low-hanging fruit that have increased efficiency, but they are not enough to reach sustainable levels.

HPC (High Performance Computing) has long seen energy cost and availability as the biggest challenges for future development. It is no surprise that the HPC segment is currently adopting the most advanced solutions for energy efficiency, aiming to reduce the consumption of both IT equipment and datacenter infrastructure, as well as to reuse the thermal energy servers produce.

Designing more energy efficient systems means taking an approach where efficiency comes first. This implies making hardware and software design choices that maximize performance within a target power budget, leveraging heterogeneous architectures, accelerators, solid state disks, fanless liquid-cooled systems and, in general, always choosing the components that guarantee the most efficiency.

Wherever possible, the goal must be to achieve “free cooling”: cooling the IT equipment without spending additional energy, for instance by eliminating chillers, thus pushing the datacentre PUE down to around 1.05, very close to the ideal value of 1. Free cooling is feasible only when the coolant is warmer than the outside air. If the outside temperature is very low, for instance at high latitudes or elevations, the coolant may be air; in all other cases it has to be a liquid, typically water, warm enough to be cooled with outdoor air even in hot seasons. The only way to use warm water to cool IT equipment is to bring it as close as possible to where the heat is generated, in direct contact with the components (“direct liquid cooling”).
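To put rough numbers on the gap between the industry-average PUE of 1.7 and a free-cooled 1.05, here is a sketch assuming a hypothetical datacenter with a constant 1 MW IT load (the load figure is an assumption, not from the post):

```python
def annual_facility_kwh(it_load_kw: float, pue: float, hours_per_year: int = 8760) -> float:
    """Total facility energy over one year for a constant IT load at a given PUE."""
    return it_load_kw * pue * hours_per_year

IT_LOAD_KW = 1000  # hypothetical: 1 MW of IT equipment running flat out

conventional = annual_facility_kwh(IT_LOAD_KW, 1.7)   # industry-average PUE
free_cooled = annual_facility_kwh(IT_LOAD_KW, 1.05)   # free-cooling PUE

print(f"saved per year: {conventional - free_cooled:,.0f} kWh")  # 5,694,000 kWh
```

Under these assumptions, dropping PUE from 1.7 to 1.05 saves about 5.7 GWh per year for a single 1 MW facility, which gives a sense of why free cooling dominates the economics at scale.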

The development and optimization of the technologies that lead to better energy efficiency in IT, and in HPC, require non-trivial R&D investments by the manufacturers of the large-scale computers used in datacenters and HPC centers. While better energy efficiency contributes significantly to lowering the Total Cost of Ownership of IT equipment, the R&D costs manufacturers incur may lead to higher prices for equipment built according to energy efficiency criteria.

As for many other areas around carbon footprint reduction, “doing the right thing” may end up being economically less attractive than doing the “wrong” one.

While in other industrial and domestic segments (transportation, heating, renewable energy generation) policies, recommendations, stricter regulations and incentives are starting to yield tangible results, the IT industry and HPC have been only marginally touched by such initiatives.

As long as setting up an energy inefficient datacentre is an economically viable option for IT equipment owners, it is unlikely that substantial progress will be made towards reversing a dangerous trend.

While the issue is a planetary one, now is a good time for Europe to take it into its own hands and show the world the way towards a more responsible and energy conscious future for the IT industry and High Performance Computing.


5 thoughts on “Data Centers and HPC: the Energy Challenge”

  1. Fabio, thank you very much for your insightful post. Energy efficiency is one of the requirements addressed by the FET HPC call that closed last November and is currently being evaluated. I am sure that the projects funded under the FET HPC call will definitely contribute to addressing that planetary challenge (from Europe). Augusto

  2. Aniyan Varghese says:

    This is an area where we really need to rethink the whole strategy, even going back to the very basics. To quote one of the MIT researchers: ‘a computer in which all chips and circuits perform reversible functions with no transfer of heat to or from their surroundings’.

  3. Kimmo Koski says:

    Fabio naturally makes good points in this excellent blog post. Energy consciousness and sustainability should increasingly become selection criteria, for example in decisions about datacenter placement. Although there are some practical and psychological limits, country borders and physical distances are becoming less and less of a problem.

    For example, we have already been running a datacenter for two years in a former paper mill in Kajaani, about 560 km north of Helsinki, with a PUE of 1.06 (measured over 12 months, hot seasons included – believe it or not, even in Kajaani it sometimes gets above 30 Celsius :-)) using 100% renewable hydro power. The systems are air-cooled and the cooling is in practice done by free cooling, e.g. opening the windows. The nice thing is that in addition to being more environmentally friendly, we also save a substantial amount of money.

    However, in addition to making the baseline datacenters energy efficient, I think we could do even better, and I would like to extend the discussion to the total efficiency of the computational process. A few examples where we can also improve:
    – Servers are often run at a very low usage percentage. If we can use them more efficiently, for example through more efficient virtualization, we will save on hardware costs (fewer systems), possibly on software (fewer licenses) and on workload (less maintenance). In addition, consolidation can drive people to use larger datacenters, and it is easier to optimize one big center’s energy efficiency than that of many small ones.
    – We can often monitor the power usage and load of a system, but it is difficult to monitor the efficiency of the computing task itself. Let me give an example. A couple of years ago we had a project where a kind of optimization clinic was established. Users who were talented scientists but not always experienced programmers could bring their code there, and software specialists tried to optimize the code to run faster. It was usually worth the effort, with typical speedups of around 2-4x. But there were dramatic exceptions, too. For example, we lost one supercomputer customer when the speedup turned out to be 40000x and, instead of the Cray, he started running the code on his laptop.

    The latter example calls for more people in software development and for investments in training and education – both good ways to spend money. A European-level code optimization center (or a network of them) could be a good topic for an EC call. Some activities have naturally been done, for example by the PRACE implementation projects, but more is needed.

    So Fabio has started a very important discussion here about energy efficiency, and I would like to extend it to cover the efficiency of the whole chain, from the human being through the ICT process to power usage. A holistic view of efficiency in HPC could make a nice topic for a workshop or brainstorming session …

  4. Giovanni Mattiussi says:

    Kimmo has some very good points. The approach to energy efficiency has to be holistic: facilities, IT hardware, software and brainware (the way developers design an application). I work for Eurotech, so I deal with making IT hardware and facilities as efficient as possible. It is, however, paramount that these efforts are complemented by an energy-aware approach in every part of the system/solution. In this respect, concepts like “energy to solution” may soon become as important as “time to solution”.

  5. Dan Perry says:

    Good article, and good observations from Kimmo. Efficiency is one aspect; the energy source is another. As well as Norway, there are some DCs in Iceland that run entirely on renewable power; however, network connectivity is also critical, and performance and pricing must also work out if processing power is going to be based in a ‘green’ area.