Reducing bypass airflow is essential for eliminating computer room hot spots.
Engineers from Triton Technology Systems, Inc. recently completed a comprehensive survey of actual cooling conditions in nineteen computer rooms comprising 204,400 square feet of raised access flooring. Over 15,000 individual pieces of data were collected.
What follows are some valuable excerpts from the field data and an analysis of the performance consequences of current industry computer room cooling practices. We believe these conclusions are broadly representative of U.S. computer rooms. A search of industry literature indicates no other study of similar scope.
Hot spots are a problem. Ten percent of the racks in the computer rooms studied had ambient temperatures of 75 degrees Fahrenheit or higher at the computer equipment air intake at the top of the rack.
Vertical hot spots occur when the ambient temperature is acceptable most of the way up the face of a rack or cabinet and then rises abruptly (by 5 to 15 degrees Fahrenheit) over a short vertical distance (6 inches). They are not related to excessive watts per gross computer room square foot of power consumption or to insufficient cooling capacity.
Vertical hot spots occur because the available supply of cold air is totally consumed by computer equipment in the bottom of the rack or cabinet, leaving equipment at the top of the rack to draw re-circulated exhaust air, which is likely to be very hot.
High temperatures appear to be causing unreliability. Sites are reporting that intermittent "ghost" errors and outright hardware failures are three times more prevalent in the top third of racks than in the bottom two-thirds. But high temperature alone does not explain this failure rate. One hypothesis is that low relative humidity results in spontaneous electrostatic discharge (ESD) caused by the triboelectric effect.
This can happen even without anyone touching the computer equipment. For a room operating at 72 degrees Fahrenheit and 45 percent relative humidity at the return to the cooling units, the relative humidity at the top of a hot rack will be 29 percent if the corresponding equipment air intake temperature is 85 degrees Fahrenheit. Historically, hardware failure rates in rooms with low relative humidity have significantly exceeded field averages.
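The relative-humidity drop quoted above can be reproduced with standard psychrometrics. The sketch below uses the Magnus approximation for saturation vapor pressure (an assumption; the study does not state its method) and holds the air's absolute moisture content constant as it warms from 72 to 85 degrees Fahrenheit:

```python
import math

def sat_vapor_pressure_hpa(t_c):
    # Magnus approximation for saturation vapor pressure over water (hPa)
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def f_to_c(t_f):
    return (t_f - 32.0) * 5.0 / 9.0

# Return air to the cooling units: 72 F at 45% relative humidity
e_actual = 0.45 * sat_vapor_pressure_hpa(f_to_c(72.0))

# The same moisture content, heated to 85 F at the rack-top intake
rh_at_intake = e_actual / sat_vapor_pressure_hpa(f_to_c(85.0))
print(f"RH at the 85 F intake: {rh_at_intake:.0%}")
```

The calculation lands at roughly 29 percent, matching the figure in the text.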
On average, the thirteen rooms studied run 2.7 times more cooling equipment than required to cool the computer heat load. Cooling overcapacity is not a predictor of successful cooling: the two rooms with the most overcapacity were each running 16 times more cooling than required, yet one had 20 percent hot racks/cabinets and the other had 7 percent.
The reason so much cooling equipment must be running is that 60 percent of the cold air (best case 20 percent, worst case 86 percent) is bypassing the computer equipment air intakes. In other words, only 40 percent of the cold air (best case 80 percent, worst case 14 percent) is directly cooling computer equipment. This lost air, referred to as "bypass air," is conditioned air that never reaches the computer equipment; it escapes through cable cutouts, holes under cabinets, and misplaced perforated tiles.
In some sites, air may be escaping through holes in the computer room perimeter walls, although this was not a significant factor in the particular rooms studied.
Of the wasted bypass airflow, 39 percent is escaping through perforated tiles incorrectly placed in the hot aisle and 61 percent is escaping through unsealed cable cutout openings under racks and cabinets. At the static pressures required to successfully cool high-density heat loads using raised floors, each unsealed cable opening, on average, is wasting 4.5 kW of equivalent cooling. By comparison, fully configured blade or 1U high-end servers consume more than 8 kW per cabinet/rack.
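The scale of the loss can be illustrated with simple arithmetic built on the study's two figures: 4.5 kW of equivalent cooling wasted per unsealed opening and more than 8 kW drawn per fully configured blade rack. The opening count below is hypothetical:

```python
# Figures from the study
KW_WASTED_PER_OPENING = 4.5     # equivalent cooling lost per unsealed cutout
KW_PER_HIGH_DENSITY_RACK = 8.0  # load of a fully configured blade/1U rack

unsealed_openings = 10  # hypothetical count for one room
wasted_kw = unsealed_openings * KW_WASTED_PER_OPENING
racks_worth = wasted_kw / KW_PER_HIGH_DENSITY_RACK
print(f"{wasted_kw:.0f} kW wasted -- enough to cool {racks_worth:.1f} blade racks")
```

On these assumptions, sealing just two average openings recovers more cooling than one high-density rack consumes.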
Rather than cooling computer equipment directly, bypass airflow mixes with hot exhaust air and returns as mostly cold air to the cooling units, reducing their Delta T (the difference between a cooling unit's return and discharge temperatures). A Delta T of 12 to 15 degrees Fahrenheit is normally assumed; the only rooms close to this figure were the ones running a right-sized amount of cooling capacity. A low Delta T shifts cooling unit performance toward de-humidification and reduces thermal cooling capacity.
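The Delta T reduction can be sketched as a simple mixing calculation. The 55 degree supply and 75 degree equipment exhaust temperatures below are assumptions chosen for illustration, not figures from the study:

```python
def cooling_unit_delta_t(supply_f, exhaust_f, bypass_frac):
    # Return air is a blend of bypass air (still at supply temperature)
    # and equipment exhaust air, weighted by the bypass fraction.
    return_f = bypass_frac * supply_f + (1.0 - bypass_frac) * exhaust_f
    return return_f - supply_f

# Hypothetical 55 F supply and 75 F equipment exhaust
print(cooling_unit_delta_t(55, 75, 0.60))  # with the study's average 60% bypass
print(cooling_unit_delta_t(55, 75, 0.00))  # with no bypass at all
```

With 60 percent bypass the cooling unit sees only an 8 degree Delta T, well under the assumed 12 to 15 degrees, even though the equipment itself is producing a 20 degree rise.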
While virtually every room studied has a significant excess of thermal cooling capacity, all cooling units have to remain running to compensate for the low static pressure caused by airflow wasted through unmanaged cable openings. The lack of static pressure produced both zone hot spots, where large areas simply did not receive enough cold air, and localized vertical hot spots, where the supply of cold air was fully consumed by equipment in the lower part of the rack or cabinet. Eliminating bypass airflow is critical to getting cold air to the right places to eliminate both zone and vertical hot spots.
For the nineteen computer rooms studied (which ranged in size from 2,500 square feet to 26,000 square feet), the primary cooling problem was inadequate air distribution, not insufficient cooling capacity. To successfully cool heat loads greater than 2 kW per rack or cabinet deployed in groups of ten or more, bypass airflow must be reduced. To cool 1U or blade servers and other high-density equipment (8 kW or more per rack or cabinet), bypass airflow must be less than 10 percent. The three rooms closest to achieving this requirement were at 20 percent, 35 percent, and 38 percent bypass airflow. Clearly, reducing bypass airflow requires significant changes in computer room management and operating practices.
Solving the current air distribution problem in the computer rooms studied is deceptively simple:
(1) optimize the quantity and location of perforated tiles, and
(2) seal cable cutout openings, starting with the largest:
» Reduce the number of perforated tiles to match the airflow from the correct quantity of running cooling units. On average, 238 percent more perforated tiles were installed than required. One room had 41 percent fewer perforated tiles than the minimum needed and another had 17 percent fewer than needed.
» Relocate perforated tiles in the hot aisle to the cold aisle.
» As openings are closed, static pressure will rise and bypass losses through the remaining openings will increase significantly.
» Forty-eight percent of the cable cutout openings were identified as small (less than 40 square inches) and accounted for 14 percent of bypass air lost through cable cutouts.
» Fifty-two percent of cable cutout openings were larger than 40 square inches and accounted for 86 percent of the cable cutout bypass air. Sealing the largest openings will have the largest impact on raising static pressure and reducing bypass airflow.
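The seal-largest-first priority can be illustrated with a toy calculation. The opening areas below are hypothetical, and bypass loss is assumed to scale roughly with opening area at a fixed underfloor static pressure:

```python
# Hypothetical cable-cutout opening areas (square inches) for one room
openings = [120, 95, 64, 58, 47, 38, 30, 22, 15, 12]

# Split at the study's 40-square-inch threshold; assume bypass loss
# scales roughly with opening area at a given static pressure.
large = [a for a in openings if a > 40]
small = [a for a in openings if a <= 40]
total_area = sum(openings)

print(f"large openings: {len(large)} of {len(openings)} cutouts, "
      f"{sum(large) / total_area:.0%} of total open area")
```

Even in this toy room, half the cutouts account for roughly three-quarters of the open area, so sealing them first recovers most of the wasted airflow.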
Changing the quantity or location of perforated tiles, or closing openings, without shutting down computer equipment while the changes are made is a high-risk proposition, since 60 percent of available cooling comes from the ambient air (the 10 poorest rooms averaged 70 percent) and not from perforated tiles in the cold aisle. Changing opening locations or closing too many holes in the wrong sequence can cause very rapid ambient temperature buildup. Correcting existing air distribution problems should not be attempted without a baseline study to diagnose the problems. Corrective action requires rigorous training, rigorous controls, careful sequencing of what to do first, and careful risk management. With air turnover rates of almost one per minute, sites that ignore these warnings could suffer severe hardware damage before realizing temperatures had gone out of control.
-Article written by Dr. Robert F. Sullivan and used with Triton Technology Systems, Inc.’s permission