Top 6 data center downtime prevention tactics
Emerson Network Power lists tactics for addressing the top causes of data center outages.
Columbus, Ohio -- Emerson Network Power, a business of Emerson (NYSE: EMR), has put together a list of the top six tactics businesses can use to address the most common causes of data center downtime, as identified by the Ponemon Institute survey on downtime frequency and root causes. Increased adoption of these tactics can increase availability and reduce costs associated with unplanned downtime as businesses head into 2011, contends Emerson.
“Data center downtime has become unacceptable to almost every business, and yet most downtime is preventable,” says Peter Panfil, vice president and general manager, Emerson Network Power’s AC Power business in North America. “By implementing simple and cost-effective best practices, data center managers can reduce or eliminate the risk of these root causes while simultaneously reducing stranded capacity and improving energy efficiency, flexibility, total cost of ownership and end-user satisfaction.”
The top causes of data center outages, as identified by the Ponemon Institute survey, involved data center power systems, thermal issues and human error. According to Emerson, the following best practices can help organizations avoid outages resulting from these common root causes:
1. Implement battery monitoring and maintenance. According to the Ponemon Institute report, battery failure is the leading cause of unplanned downtime events. Comprehensive monitoring evaluates battery health and allows data center professionals to anticipate – and prevent – problems such as batteries reaching the end of their service life. Monthly preventive maintenance, including visual inspections (internal and external), acceptance testing and load testing, can help ensure components are serviced or replaced before they pose a risk to continuity.
2. Ensure appropriate UPS capacity. More than half of the data center professionals surveyed said their data centers had experienced downtime events as a result of exceeding UPS capacity. Measuring output multiple times per day via an integrated monitoring and management solution can help gauge the typical power draw of IT equipment over time. Establishing an appropriate UPS architecture can enable data center professionals to increase the capacity of their backup power system and eliminate single points of failure.
3. Choose the correct UPS. Forty-nine percent of data center professionals reported a UPS equipment failure within the past two years. Implementing an online double conversion UPS system, as opposed to a line-interactive system, enables the battery to be dedicated to the load and eliminates the need for power transfer if the primary utility fails. Additionally, deploying integrated UPS systems – including fans, power supplies and communications cards – enhances reliability, enabling the UPS to maintain availability between service visits even in the event of an internal component failure.
4. Invest in the right components. Downstream from the UPS, circuit breaker and power distribution unit (PDU) failures can also affect IT equipment availability. Rack-based PDUs or PDUs with integrated branch circuit monitoring capabilities allow data center professionals to make precise capacity management decisions based on holistic data across interdependent systems, reducing the likelihood of equipment overload failure downstream. Installing a static transfer switch upstream from the UPS ensures that critical IT equipment remains powered in the event of a bus failure.
5. Weigh cooling options carefully. Cooling-related failures were cited as a root cause of at least one outage by more than a third of data center operators, with water incursions and heat-related computer room air conditioner (CRAC) failures cited as the leading causes of cooling-related downtime. Adopting a cold-aisle containment strategy increases the effectiveness of the CRAC system and helps ensure that cooling capacity is used as efficiently as possible. Using a row-based cooling solution that relies on refrigerant, instead of a water-based system, minimizes the risk of catastrophic system failures in the event of a cooling fluid leak.
6. Make the data center accident-proof. More than half of all data center professionals responding to the Ponemon survey reported at least one outage as a direct result of accidental shutdown or user errors within the past 24 months. Shielding emergency OFF buttons, accurately labeling components and implementing secure access rules can all minimize the potential for catastrophic errors and accidents.
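As an illustration of the capacity measurement described in tactic 2, the following Python sketch summarizes a series of UPS output readings against the unit's rated capacity. It is a minimal example under stated assumptions, not Emerson's method: the LoadReading type, the ups_utilization function and the 80 percent warning threshold are all hypothetical, and real readings would come from whatever monitoring system a facility already has in place.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass
class LoadReading:
    """One UPS output measurement (hypothetical structure for illustration)."""
    timestamp: str    # e.g. "2011-01-03T09:00"
    output_kw: float  # measured UPS output in kilowatts


def ups_utilization(
    readings: Iterable[LoadReading],
    rated_kw: float,
    warn_fraction: float = 0.8,  # assumed threshold, not taken from the article
) -> Dict[str, Union[float, bool]]:
    """Summarize measured load against the rated capacity of the UPS."""
    loads = [r.output_kw for r in readings]
    if not loads:
        raise ValueError("at least one reading is required")
    if rated_kw <= 0:
        raise ValueError("rated_kw must be positive")

    peak_kw = max(loads)
    average_kw = sum(loads) / len(loads)
    peak_fraction = peak_kw / rated_kw

    return {
        "peak_kw": peak_kw,
        "average_kw": average_kw,
        "peak_fraction": peak_fraction,
        # True when the highest observed load crosses the warning threshold
        "over_threshold": peak_fraction >= warn_fraction,
    }


if __name__ == "__main__":
    # Made-up sample values, several readings taken over one day
    day = [
        LoadReading("2011-01-03T03:00", 40.0),
        LoadReading("2011-01-03T09:00", 55.0),
        LoadReading("2011-01-03T15:00", 62.0),
    ]
    print(ups_utilization(day, rated_kw=80.0))
```

Tracking the peak rather than only the average matters here because a UPS is overloaded by its highest draw, not by its typical one. The threshold should be set according to the manufacturer's guidance and the redundancy design of the facility.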