Data Center Monitoring

Mission critical facilities process invaluable information that must be protected reliably and efficiently. Best practice data centers incorporate a monitoring system, which gives them an extra layer of protection. That layer gives mission critical facility managers peace of mind that they have taken an important step and incorporated a best practice to ensure their facility’s uptime, prolong equipment life, and improve equipment performance. It also means that, should an event happen, you have the data and event timeline recorded, so you can see exactly what took place before, during, and after the event.

Best practice facilities don’t just monitor alarms. They also monitor for availability, which gives them in-depth information for making better decisions. All of this information is valuable for assessing current practices, making necessary changes, and substantiating insurance claims if needed.

Each facility’s unique risk tolerance, operating practices, and needs for historical data will determine what points are monitored. But keep in mind that it is always better to be proactive in maintaining and operating your facility than to react to trouble conditions.

Following is a breakdown of the pieces of support equipment that we recommend monitoring:

Uninterruptible power supply system

While UPS units come in several makes, models, and sizes, the majority contain a stand-alone control panel with on-board diagnostics. The panel indicates the unit’s status as well as any alarm conditions. Some more advanced units also log historical data. To access the information, however, an individual must be physically present at the UPS unit, navigating through the various screens on the panel. If a mission critical facility is not staffed 24 hours a day, a UPS failure after hours can go unnoticed, leaving the facility at risk.

This risk can be greatly reduced by monitoring each UPS unit. The monitoring system should not only allow access to the information from any remote location, but also send immediate alerts to the pagers or mobile phones of designated personnel when undesirable conditions occur.
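As a minimal sketch of the alerting idea above, the following Python compares each monitored reading against a low and high limit and hands any violation to a notification callback. The point names, the limit values, and the `send` callback are hypothetical placeholders, not part of any particular UPS or monitoring product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Limit:
    """Acceptable range for one monitored point (hypothetical values)."""
    low: float
    high: float


def check_readings(readings: Dict[str, float],
                   limits: Dict[str, Limit]) -> List[str]:
    """Return one message per reading that falls outside its limits."""
    alerts = []
    for point, value in readings.items():
        limit = limits.get(point)
        if limit is None:
            continue  # point has no configured limit, so it cannot alarm
        if value < limit.low or value > limit.high:
            alerts.append(
                f"{point}={value} outside [{limit.low}, {limit.high}]"
            )
    return alerts


def notify(alerts: List[str], send: Callable[[str], None]) -> None:
    """Pass each alert to a delivery function (pager, SMS, email, ...)."""
    for message in alerts:
        send(message)
```

In practice `send` would wrap whatever paging or messaging service the facility already uses; during testing it can simply be `print`.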

In addition to monitoring status and alarm conditions, we recommend monitoring the UPS unit’s output voltage on all three phases (A, B, and C), battery circuit (trouble alarm, charging conditions), cabinet temperature (high alarm), battery time remaining, battery cell temperature, and battery cell resistance. It is also important to install a UPS monitoring system that logs historical data. This data can be analyzed to identify potential problems before they occur. Plotting historical data establishes trends, which make it possible to predict future conditions.
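To illustrate how logged data can support prediction, here is a small sketch that fits a straight line to historical readings (for example, battery cell resistance) and estimates how many days remain until a limit is reached. The linear model, the function names, and the threshold are assumptions chosen for illustration; real battery degradation is not necessarily linear.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days: Sequence[float],
                         values: Sequence[float],
                         threshold: float) -> Optional[float]:
    """Estimate days from the last sample until the trend reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already crossed the threshold.
    """
    slope, intercept = linear_fit(days, values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

For instance, resistance readings of 3.0, 3.1, 3.2, and 3.3 milliohms taken 30 days apart, with a hypothetical limit of 3.6 milliohms, give an estimate of about 90 days from the last reading.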

Emergency power generator, automatic transfer switch, and utility power

Having a backup generator alone does not guarantee uninterrupted power in the event of a utility power outage. Will the ATS, in fact, transfer the load in the event of a power outage? Will the generator start?  Is there enough fuel in the generator? While there is no substitute for proper and regular maintenance of your generator and ATS, monitoring both pieces of equipment will provide an extra layer of protection and peace of mind.

For the generator, we recommend monitoring the voltage outputs. These readings will confirm that your generator is producing the design voltage outputs and will therefore not compromise your uninterruptible power supply (UPS) and battery strings.  Of equal importance is the generator’s engine. We recommend monitoring fuel tank levels and the battery and charging system.

For the ATS, we recommend monitoring ground faults and transient voltages. Monitoring ground fault currents will ensure that your facility is properly protected and safe from electrical hazards. Electrical service over 800 amps must have monitoring protection that provides historical data that can support the re-creation of any event. Transient voltages are spikes above the normal voltage supplied to your facility and are detrimental to both equipment and data integrity. Transient voltage suppressors clip these voltages to prevent problems.

For the utility power, we recommend monitoring kilowatt usage. This will allow you to monitor, trend, and log the electrical power that your utility is providing. If the utility is not providing what you have contracted for, you will have a record of any abnormalities, which is useful for event re-creation or service disputes.

Power distribution unit

Because the power being supplied to the equipment racks must be reliable, and to allow for servicing dual corded loads, facility managers often install multiple PDUs that feed the same piece of equipment.  In the event that one PDU fails, there are other units that will continue to support the equipment.  However, it is important to monitor each PDU to ensure that all circuits are loaded below 80 percent capacity in order to prevent breaker tripping during a loss of one power path. With or without redundant power supply, it is important to include each PDU in your facility’s monitoring program to ensure the highest level of reliability.
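The failover check described above can be sketched as follows. The example assumes a dual-corded load whose two cords share the current, and that the surviving circuit picks up the entire load when one power path is lost. The function names are hypothetical, and the 80 percent limit is taken from the text.

```python
def failover_load_fraction(load_a_amps: float,
                           load_b_amps: float,
                           breaker_rating_amps: float) -> float:
    """Fraction of breaker rating the surviving circuit would carry.

    Assumes the full load of the failed path transfers to the other path.
    """
    if breaker_rating_amps <= 0:
        raise ValueError("breaker rating must be positive")
    return (load_a_amps + load_b_amps) / breaker_rating_amps


def survives_failover(load_a_amps: float,
                      load_b_amps: float,
                      breaker_rating_amps: float,
                      limit: float = 0.80) -> bool:
    """True if the combined load stays at or below the limit after failover."""
    fraction = failover_load_fraction(load_a_amps, load_b_amps,
                                      breaker_rating_amps)
    return fraction <= limit
```

Note that two 20 A circuits each carrying 9 A look comfortable in normal operation (45 percent each), yet the combined 18 A would put the surviving breaker at 90 percent. That is why the check uses the sum of both paths rather than each circuit's present load.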

Your facility’s equipment racks are subject to constant load changes due to the installation and/or removal of equipment. By carefully monitoring the power flowing in branch circuits, you can prevent failures or hazards due to overloading.

When including each PDU in your facility’s monitoring program, we recommend that you monitor the unit’s power usage (in kW), voltage, phase currents, and percent of load variations. In addition, we recommend that you document the number of panels on each PDU, the number of poles per panel, and the actual load. We also recommend that you regularly measure voltage and current per phase, scan each breaker with an infrared camera for possible overheating, and check all monitor and control voltages.

Computer room air conditioning units

Computer room air conditioning (CRAC) units are a vital component of your mission critical facility’s support equipment because they control the unique environment that your electronic equipment requires, specifically temperature and humidity. Without a monitoring system in place, small problems can go unnoticed and escalate into larger problems that may have catastrophic consequences for your business and its operations. With proper monitoring of CRAC units, small problems can be identified and resolved quickly and easily, avoiding costly repairs and damage.

When deciding what points of the CRAC unit to monitor, it’s important to first decide what you are going to do with the information and what your level of involvement is. You may ask yourself, “Do I have the time and expertise to decipher the information?” “Do I have a preventative maintenance provider who will look after the equipment for me?” Based on the answers to these questions, you can then define what points your system should monitor. At a minimum, we recommend monitoring supply air temperature, space temperature and humidity, moisture/leak detection, and common alarms on each CRAC unit.

Other points that are often monitored include fan status, bearing temperature, unit vibration, compressor status, and head pressure.  If you are self-performing maintenance, you may want to know if you have high compressor head pressure. This alerts you that the coils may be dirty.  If you have a routine preventative maintenance program in place, you may not need this additional information.

To monitor temperature and humidity, we recommend placing sensors under the raised access flooring, in the cold aisle, in the hot aisle (preferably at the top of your racks), and at the air return of the CRAC unit.

If you are monitoring at these locations, you know that you have cool discharge air under the raised floor and that the cold aisle temperature is in range. You are also provided with the rack discharge air temperature (hot aisle) and the return air temperature at the CRAC unit. From this information, you will be able to diagnose problems and correct potential complications.  For example, if you have a temperature difference between the hot aisle and the return air, you know that you are bypassing cool air back to the CRAC unit return, which means you are not getting the desired cooling capacity and efficiency from the CRAC units.
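The diagnosis in the example above can be expressed as a simple comparison of sensor readings. The sketch below flags suspected bypass when the return air is noticeably cooler than the hot aisle, and roughly estimates the share of supply air short-circuiting to the return. The 2 degree tolerance and the ideal two-stream mixing model are illustrative assumptions, not values from any standard.

```python
from typing import Optional


def bypass_suspected(hot_aisle_c: float,
                     return_air_c: float,
                     tolerance_c: float = 2.0) -> bool:
    """True when return air is cooler than the hot aisle by more than tolerance."""
    return (hot_aisle_c - return_air_c) > tolerance_c


def estimated_bypass_fraction(supply_c: float,
                              hot_aisle_c: float,
                              return_air_c: float) -> Optional[float]:
    """Rough share of return air that is bypassed supply air.

    Assumes return air is an ideal mix of under-floor supply air and
    hot-aisle air: return = f * supply + (1 - f) * hot_aisle.
    Returns None when the hot aisle is not warmer than the supply air.
    """
    spread = hot_aisle_c - supply_c
    if spread <= 0:
        return None
    fraction = (hot_aisle_c - return_air_c) / spread
    return min(1.0, max(0.0, fraction))  # clamp sensor noise into [0, 1]
```

With under-floor air at 15 °C, a hot aisle at 35 °C, and return air at 29 °C, the estimate is about 0.3, meaning roughly 30 percent of the air reaching the return never passed through the racks.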

Comparing multiple data points from the CRAC units allows mission critical facility managers to determine future capacity requirements and when additional cooling will be required.

Fire suppression system

Mission critical facility managers spend a great deal of money installing the most reliable and effective fire suppression and detection equipment available. The goals of every manager are to detect fire early, suppress fire quickly, minimize damage, minimize downtime, have a clear indication of system activity, and protect against false alarms. To achieve these goals, a data center manager cannot simply install fire detection and suppression equipment. That equipment must also be constantly monitored and maintained.

When establishing a monitoring program for your fire detection and suppression equipment, you should include trouble conditions and alarm conditions.  A trouble condition informs you when a piece of equipment is not functioning properly.  For example, it will alert you if a smoke detector is malfunctioning.  An alarm condition informs you when a piece of equipment is activated.  For example, it will alert you instantly when a detector senses smoke or, if you have a sprinkler system, when the valve has been opened due to a sprinkler head being released. The monitoring program should also incorporate a tracking log. Having historical information on all trouble and alarm conditions will come in handy when creating a log of events after a fire has occurred.
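A tracking log of this kind can be sketched as a list of timestamped events, each tagged as a trouble or an alarm condition, that can later be replayed in time order. The class and field names here are hypothetical and do not reflect any particular fire panel's interface.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class Condition(Enum):
    TROUBLE = "trouble"  # equipment is not functioning properly
    ALARM = "alarm"      # equipment has been activated


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    device: str
    condition: Condition
    detail: str


class EventLog:
    """Append-only record of trouble and alarm conditions."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        self._events.append(event)

    def timeline(self, start: datetime, end: datetime) -> List[Event]:
        """Events within [start, end], oldest first, for event re-creation."""
        selected = [e for e in self._events if start <= e.timestamp <= end]
        return sorted(selected, key=lambda e: e.timestamp)
```

Sorting on retrieval rather than on insertion keeps the timeline correct even when devices report out of order.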

Bick Group has subject matter experts in this and many other topics. Talk to a Bick Installation Services expert by emailing sdavis@bickgroup.com.