The right plan, the right partner, and a six-step process are essential to recovering from a data center outage.
by Wally Vahlstrom, Emerson Network Power
Mother Nature has the potential to inflict massive damage on electrical systems in data centers through floods, lightning strikes, tornados, earthquakes and more. As destructive as disasters may be to costly electrical equipment, the trouble can be greatly compounded if the hazards associated with damage recovery are not appropriately managed.
Preparedness for swift business recovery hinges on planning ahead to secure emergency assistance. Facility and data center personnel will face far fewer delays in securing help if a service contract is in place before it is needed. From damage assessment and inspection to equipment repair, product refurbishment and replacement, a service provider can dispatch NETA-trained resources (NETA is the InterNational Electrical Testing Association; www.netaworld.org) to get you back online quickly and return you to "high-nines" reliability, with a focus on protecting your equipment and personnel. Here are the steps that facility and data center managers can and should delegate to a service partner in the aftermath of a disaster.
Electrical maintenance testing, shown here, is just one of many functions that a service provider offers. Such a provider can also get a data center back online quickly after an outage, through the six-step process of damage assessment, inspection and testing, repair or reconditioning, equipment replacement, spare-parts provision, and acceptance testing/startup.
Step One: Damage assessment. This is job one of disaster recovery. Assess the damage to your uninterruptible power supply (UPS) units and the entire electrical distribution system, including circuit breakers, transformers, switchgear, cables, busway, relays, generators and batteries. This effort yields a detailed inventory of the equipment and a preliminary condition assessment that recommends the testing and other actions needed to bring the system back online efficiently and safely.
Step Two: Inspection and testing. In this step, each piece of electrical distribution equipment must be physically inspected for damage, then cleaned and dried. Electrical testing is performed as required to determine the serviceability of each piece of equipment. Recommendations for repair or replacement follow from the results.
Step Three: Repair or reconditioning. Electrical equipment exposed to water can be extremely hazardous if re-energized without proper reconditioning or replacement. Reduced integrity of electrical insulation due to moisture, debris lodged in equipment components, and other factors can affect the ability of the equipment to perform as intended. Flood waters contaminated with chemicals, sewage or oil will also affect the integrity and performance of the equipment.
The ability to recondition the equipment will vary with the nature of the electrical function, the degree of flooding, the type and age of the equipment and the length of time the equipment was exposed to water. For equipment that is determined to be serviceable, the right service provider can deliver complete repair and reconditioning services for virtually any manufacturer's equipment. That work should be done according to National Electrical Manufacturers Association (NEMA; www.nema.org) standards, which promote safety in the design, manufacture and use of electrical products. This may also be an opportunity to retrofit certain equipment in order to update the infrastructure with the latest technologies.
Step Four: Replacement equipment. When equipment is determined to be unserviceable, it is crucial to be able to quickly locate new, surplus and remanufactured electrical distribution equipment of all types and from all manufacturers. That sourcing capability, along with technical support to perform or manage complete installation and startup services for new equipment, is an important consideration when selecting an emergency response partner. You'll also want a partner that is local and familiar with the type of construction and equipment used in your area.
Step Five: Spare parts support. Finding the right parts to get your equipment back online can be challenging during a widespread recovery effort. A service provider should have an extensive network of electrical suppliers through which replacement, obsolete, and potentially hard-to-find parts can be located quickly in order to get systems up and running. Additionally, access to original equipment manufacturer (OEM) parts is ideal. Using OEM parts for the UPS, which is the foundation of your emergency power system, is the best way to ensure that the quality of the new parts is equivalent to the parts being replaced.
Step Six: Acceptance testing and startup. Once everything required to resume operations at the optimum level has been identified, a service provider can perform or oversee the equipment installation, acceptance testing and startup process. Verification of proper installation through acceptance testing should always be in accordance with NETA/ANSI specifications. Data obtained during this acceptance testing provides a reliable baseline for trending and comparison during future maintenance tests. Your service provider should also fully understand manufacturers' recommendations, industry standards and unique safety issues, especially if there are additional training needs for your personnel.
Acceptance testing following a disaster has safety, reliability and efficiency benefits that could offset its costs. It can also help fortify electrical and emergency backup power systems against future storms. For example, studies have shown that nearly 70 percent of early equipment failures can be traced to design, installation or startup deficiencies. These failures may not even occur until months after getting back online. With proper acceptance testing following installation, issues can be remedied up front, before equipment is damaged or a failure occurs.
Acceptance testing can also maximize efficiency by ensuring the system is fully integrated. It can help critical facility managers avoid a whole range of problems, from nuisance tripping to failure to trip under fault conditions, any of which can lead to major equipment damage, disruption to service, and potential hazards to personnel.
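The idea of using acceptance-test data as a baseline for later maintenance tests can be illustrated with a short sketch. It is not drawn from any NETA or ANSI document; the `Reading` record, the equipment tags, the test name and the 50 percent drop threshold are all hypothetical, chosen only to show how a later reading might be compared against its baseline. Actual acceptance criteria should come from the applicable specification and the manufacturer's recommendations.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One test result. All field names here are illustrative assumptions."""
    equipment_id: str   # hypothetical equipment tag, e.g. "XFMR-2"
    test_name: str      # hypothetical test label
    value: float        # measured value; higher is assumed to be better


def flag_deviations(baseline, current, max_drop_fraction=0.5):
    """Return readings that fell by more than max_drop_fraction from baseline.

    The 0.5 default is an arbitrary illustration, not a value taken from
    any standard.
    """
    # Index baseline readings by (equipment, test) for direct lookup.
    base = {(r.equipment_id, r.test_name): r.value for r in baseline}
    flagged = []
    for r in current:
        ref = base.get((r.equipment_id, r.test_name))
        if ref is None or ref <= 0:
            continue  # no usable baseline for this reading
        if (ref - r.value) / ref > max_drop_fraction:
            flagged.append((r.equipment_id, r.test_name, ref, r.value))
    return flagged
```

Given a baseline list recorded at startup and a list from a later maintenance visit, `flag_deviations(baseline, current)` returns only those items whose readings have dropped sharply, which is the kind of trend that baseline data makes visible.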
No facility or data center manager wants to face catastrophic equipment losses following a natural disaster, but disaster recovery does not have to be done alone. At the very least, you should always work with highly trained technicians equipped with personal protective equipment for arc flash and electrical safety.
Having a recovery plan is important for getting back to business quickly and cost effectively without taking on additional hazards. Having the right recovery partner to help execute the plan, one that relies on best practices for restoration and maintenance, is ideal for protection now and into the future.
Wally Vahlstrom brings more than 40 years of electrical engineering experience to his position as director of technical services for Emerson Network Power's Electrical Reliability Services (www.emersonnetworkpower.com). In that role he is responsible for failure-investigation work, conformity-assessment services, power-system studies and reliability analysis.