Avoiding network speed bumps after disaster strikes

An up-to-the-minute picture of the physical network is instrumental in recovering from an unforeseen event.

Sep 1st, 2015

Network disasters are bad enough, as staff struggle to maintain critical communications after a natural disaster, service outage, equipment failure, or act of sabotage. But these events can be made much worse if the network isn't properly documented. Without accurate network documentation, bringing up backup "failover" circuits, re-establishing connections, or even accounting for damage for insurance companies can be an extremely challenging and time-consuming task. Physical layer management (PLM) automates network documentation and makes it easier to characterize the network in the event of an equipment failure, service outage, fire, flood, tornado, or act of sabotage. Given that network downtime can disrupt business, frustrate customers, and cost a service provider anywhere from $10,000 to $1,000,000 a minute, getting the network back up and running quickly is a key priority.

What is PLM?

PLM systems use intelligent fiber frames, patch panels, and patch cords in a central office or data center to collect and report information about the physical state of the network. Identification chips at the ends of patch cords and in the ports of fiber frames and patch panels provide detailed information on what is connected where, including port number, connector ID number, cable type, length, color, and performance rating. This information is reported to a database that can be viewed through a web interface, or through a standard network management or infrastructure management system via middleware.
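To make the data flow concrete, here is a minimal sketch of the kind of per-port record such a system might keep, stored in an in-memory SQLite database. The field, table, and function names (`PortReading`, `port_readings`, `report`, `lookup`) are hypothetical illustrations chosen for this example, not the schema of Quareo or any other vendor's product.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional


# Hypothetical record of what the port and cord chips report. Field names are
# illustrative only; they are not taken from any vendor's PLM schema.
@dataclass(frozen=True)
class PortReading:
    panel_id: str            # fiber frame or patch panel identifier
    port_number: int         # port on that frame or panel
    connector_id: str        # ID read from the chip on the patch cord end
    cable_type: str          # e.g. "OS2 single-mode"
    length_m: float          # patch cord length in metres
    color: str
    performance_rating: str


def open_plm_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a table holding the latest reading for each port."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS port_readings (
               panel_id TEXT, port_number INTEGER, connector_id TEXT,
               cable_type TEXT, length_m REAL, color TEXT,
               performance_rating TEXT,
               PRIMARY KEY (panel_id, port_number))"""
    )
    return db


def report(db: sqlite3.Connection, reading: PortReading) -> None:
    """Store a reading, replacing any earlier reading for the same port."""
    db.execute(
        "INSERT OR REPLACE INTO port_readings VALUES (?, ?, ?, ?, ?, ?, ?)",
        astuple(reading),
    )
    db.commit()


def lookup(db: sqlite3.Connection, panel_id: str,
           port_number: int) -> Optional[PortReading]:
    """Return what is documented as plugged into one port, if anything."""
    row = db.execute(
        "SELECT * FROM port_readings WHERE panel_id = ? AND port_number = ?",
        (panel_id, port_number),
    ).fetchone()
    return PortReading(*row) if row else None
```

Keying the table on (panel, port) means a new reading simply overwrites the old one, so the table always reflects the current physical state; a web front end or middleware layer would query it rather than a spreadsheet.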

With a physical layer management system such as TE Connectivity's Quareo, work order management can include a flashing green LED on a port to notify the technician to connect a circuit.

When a cable is plugged in or unplugged, the system automatically detects the event and reports it in real time, so the network documentation is always up to date. This saves the technician labor that would otherwise be spent manually recording changes to the network, and accurate data on what is connected is always available. Because no manual data entry is involved, transcription errors are avoided.
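The event-driven updating described above can be sketched as a small handler that applies each sensed plug or unplug event to a connection map and keeps the event as history. The event names ("plug", "unplug") and the classes `PatchEvent` and `ConnectionMap` are assumptions made for illustration; a real PLM system defines its own event model and transport.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class PatchEvent:
    kind: str        # "plug" or "unplug" (hypothetical event names)
    port: str        # e.g. "FDF-01/12"
    cord_id: str     # ID read from the patch cord's chip
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ConnectionMap:
    """Current cord-to-port picture, kept up to date only from sensed events."""

    def __init__(self) -> None:
        self.port_to_cord: Dict[str, str] = {}
        self.log: List[PatchEvent] = []   # full change history, no manual entry

    def handle(self, event: PatchEvent) -> None:
        if event.kind == "plug":
            self.port_to_cord[event.port] = event.cord_id
        elif event.kind == "unplug":
            self.port_to_cord.pop(event.port, None)
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")
        self.log.append(event)            # only valid events enter the history

    def cord_ends(self, cord_id: str) -> List[str]:
        """Ports that a given patch cord is currently plugged into."""
        return sorted(p for p, c in self.port_to_cord.items() if c == cord_id)
```

Because a patch cord's chip is read at both ends, looking up the cord ID yields both ports it joins, which is exactly the "what is connected where" record a technician would otherwise write down by hand.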

Most PLM systems also include a work order management system that schedules tasks for network technicians and even guides them through a particular task. For example, an order to connect a circuit might trigger flashing green LEDs on the ports that are to be connected. In addition, the work order management system may include a mobile phone app that directs technicians to specific tasks, walks them through the tasks, and asks them to confirm that each task was completed.
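A guided work order of this kind can be modeled as a list of connection steps: the current step's ports flash green, and a step is accepted only when the technician confirms it and the PLM has actually sensed the connection. This is a minimal sketch under stated assumptions; `set_led` is a hypothetical callback standing in for whatever LED-control interface a real system exposes, and cross-checking the technician's confirmation against sensed connections is one plausible design, not a documented vendor feature.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Set


@dataclass
class Step:
    port_a: str
    port_b: str
    done: bool = False


class WorkOrder:
    """Guides a technician through connections one step at a time (sketch)."""

    def __init__(self, order_id: str, steps: Iterable[Step],
                 set_led: Callable[[str, str], None]) -> None:
        self.order_id = order_id
        self.steps = list(steps)
        self.set_led = set_led     # stand-in for the real LED control interface
        self._light_current()

    def current(self) -> Optional[Step]:
        return next((s for s in self.steps if not s.done), None)

    def _light_current(self) -> None:
        step = self.current()
        if step is not None:
            self.set_led(step.port_a, "flash_green")
            self.set_led(step.port_b, "flash_green")

    def confirm(self, sensed_pairs: Set[FrozenSet[str]]) -> bool:
        """Technician reports the step done; accept it only if the PLM sensed it."""
        step = self.current()
        if step is None or frozenset((step.port_a, step.port_b)) not in sensed_pairs:
            return False
        step.done = True
        self.set_led(step.port_a, "off")
        self.set_led(step.port_b, "off")
        self._light_current()      # move the flashing LEDs to the next step
        return True

    @property
    def complete(self) -> bool:
        return all(s.done for s in self.steps)
```

Connections are compared as unordered pairs (`frozenset`), so it does not matter which end of the cord the technician plugs in first.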

Proactively plan for network outages

Some disturbances that cause network outages can be addressed at the physical layer by establishing divergent routing: redundant circuits connected via separate cables and pathways, so that localized damage or an equipment failure affecting the primary circuit does not also take down the backup. Although it adds cost, this practice can allow near-seamless recovery if configured properly. However, it fails if changes to the "A" and "B" circuits are not coordinated and mirrored as connectivity changes; a backup whose connections no longer match the primary's is of little use when it is needed. PLM systems allow modeling of circuit routings and pathways, as well as tracking of configuration changes, helping to ensure that backup circuits are divergently routed and connected to the appropriate devices and services for quick restoration.
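The two failure modes named above, shared pathways and unmirrored changes, can be expressed as a simple check over modeled routes. The `Route` structure below (endpoints plus a list of pathway IDs such as cables, ducts, trays, and building entrances) is a hypothetical representation chosen for this sketch; real PLM systems model routing in their own way.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Route:
    endpoints: Tuple[str, str]   # devices/services the circuit connects
    pathways: List[str]          # cables, ducts, trays, building entrances used


def check_divergent(primary: Route, backup: Route) -> List[str]:
    """List the problems that would stop `backup` from covering `primary`."""
    problems = []
    # Any shared pathway is a single point of failure for both circuits.
    shared = sorted(set(primary.pathways) & set(backup.pathways))
    if shared:
        problems.append("shared pathways: " + ", ".join(shared))
    # A change applied to "A" but not mirrored on "B" leaves B connected
    # to the wrong device or service.
    if set(primary.endpoints) != set(backup.endpoints):
        problems.append("backup endpoints do not mirror primary")
    return problems
```

For example, if the primary circuit is re-patched from `core-sw-1` to `core-sw-2` while the backup is left alone, the check reports that the backup no longer mirrors the primary, which is the situation in which the "backup" would fail.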

Preparing for disasters

There are two key reasons to prepare for data center disasters: 1) to recover the network quickly afterwards, and 2) to prove what was in place for insurance purposes. Typically, network documentation is kept manually, in spreadsheets or other tools, and is often not updated accurately or promptly when changes are made. Because the network is constantly changing, manual documentation may not accurately show what was connected where, which services were being provided to whom, or over which circuits.

Without proper documentation in place, it can take days or even weeks to retrace every circuit in a data center or central office. A fire can destroy fiber cabling so that it is no longer possible to tell what was connected where. Floods don't necessarily destroy the connections in a data center, but it is still necessary to know which services were being provided over which circuits so they can be restored properly.

Take an IP phone system, for example. If a disaster takes out the phone system, network technicians will need to know which circuit went to each desk in the organization and which phone number was assigned to each desk. Without proper documentation, this would mean contacting each employee and collecting all the phone numbers.
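With documentation in place, the IP phone question becomes a join between two records: the physical connectivity (which desk outlet is patched to which switch port) and the service assignment (which phone number is served on which port). The sketch below assumes both are available as simple mappings; the names and the outlet and port formats are illustrative only.

```python
from typing import Dict, List, Tuple


def phone_restore_plan(
    outlet_to_port: Dict[str, str],      # from PLM: desk outlet -> switch port
    port_to_extension: Dict[str, str],   # from service records: port -> number
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Join physical and service data into desk -> (port, number)."""
    plan: Dict[str, Tuple[str, str]] = {}
    unresolved: List[str] = []
    for outlet, port in sorted(outlet_to_port.items()):
        extension = port_to_extension.get(port)
        if extension is None:
            unresolved.append(outlet)    # still needs a call to the employee
        else:
            plan[outlet] = (port, extension)
    return plan, unresolved
```

Only the desks left in `unresolved` would require the employee-by-employee canvass described above; everything else can be restored directly from the records.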

When disaster strikes

Network disasters come in all shapes and sizes. A fire might erupt in a central office or data center, or a leaky sprinkler head might cause flooding that shorts out equipment. Sabotage could also be the cause, or a backhoe operator may sever a key access cable to the data center or central office.

Prudent service providers and enterprises will have a disaster recovery plan in place that starts with the availability of a recovery site where the network can be rebuilt. The IT team then begins replicating the network that existed before. With proper documentation, this is a fairly rapid process; without it, the network may have to be redesigned almost from scratch.

Disaster recovery

One important strategy for disaster recovery is to have a PLM system in place, and to have the PLM database backed up in a protected facility or in the cloud. This way, the IT administrator doesn't have to try to recover the database from servers and storage systems affected by the disaster.
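As one way to implement the off-site copy, the sketch below assumes the PLM data lives in a SQLite database (an assumption for illustration; commercial PLM products use their own databases and backup tools). It uses Python's standard `sqlite3.Connection.backup` method to copy the live database, writes a SHA-256 checksum beside the copy, and can later verify that the copy is intact. The destination directory stands in for the protected facility or cloud storage; uploading it is out of scope here.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Tuple


def snapshot_plm_db(live: sqlite3.Connection, dest_dir: Path,
                    name: str) -> Tuple[Path, str]:
    """Copy the live database to dest_dir and record its SHA-256 checksum."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{name}.sqlite"
    target = sqlite3.connect(dest)
    try:
        live.backup(target)          # copies the database while it stays open
    finally:
        target.close()
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    (dest_dir / f"{name}.sha256").write_text(digest + "\n")
    return dest, digest


def verify_snapshot(path: Path) -> bool:
    """Check a snapshot against the checksum written next to it."""
    expected = path.with_suffix(".sha256").read_text().strip()
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected
```

Verifying the checksum before a rebuild starts guards against restoring from a copy that was damaged in transit or in storage.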

The work order management system in a PLM system can speed reconnection of needed circuits. Through the work order management system, the IT manager can make green LEDs flash on the ports to be connected, one circuit at a time. Using a smartphone app, network technicians can be instructed exactly where to go and which ports to connect. As each task is completed, the technician reports completion through the smartphone app and the work order documentation is automatically updated. Using this process, network managers can rebuild a network in days instead of weeks, thereby reducing business losses and customer complaints.
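The article does not say how the reconnection tasks are generated; one plausible approach, sketched here as an assumption, is to compare the backed-up snapshot with what the PLM currently senses at the recovery site. Connections in the snapshot but not yet sensed become ordered "connect" tasks, and sensed connections absent from the snapshot are flagged for review. The optional `priority` ranking of services is hypothetical and lets critical services come back first.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Pair = FrozenSet[str]   # an unordered pair of port IDs


def restore_tasks(
    snapshot: Dict[Pair, str],                   # backed-up connection -> service
    sensed: Set[Pair],                           # connections detected right now
    priority: Optional[Dict[str, int]] = None,   # service -> rank, lower first
) -> Tuple[List[Tuple[str, List[str], str]], List[List[str]]]:
    """Return (connections to make, in order) and (unexpected connections)."""
    rank = priority or {}
    missing = [(pair, svc) for pair, svc in snapshot.items() if pair not in sensed]
    # Ranked services first; unranked ones last; ties broken by port order.
    missing.sort(key=lambda item: (rank.get(item[1], float("inf")), sorted(item[0])))
    to_connect = [("connect", sorted(pair), svc) for pair, svc in missing]
    unexpected = sorted(sorted(pair) for pair in sensed if pair not in snapshot)
    return to_connect, unexpected
```

Each entry in `to_connect` could then be handed to the work order system as one step, so the flashing LEDs walk technicians through the rebuild in priority order rather than in whatever order users happen to complain.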

Without proper network documentation, network administrators usually start by restoring default services to users, and then begin delivering more-specific services to specific people. Those who complain the loudest usually get service restored the soonest.

As for insurance reporting, the PLM system can easily produce a complete report on the state of the network as it existed before the disaster, so a proper claim can be made.
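Because every plug and unplug event is logged with a timestamp, the network's state at any moment can be rebuilt by replaying the log up to a cutoff, for example just before the disaster. The sketch below assumes events are stored as `(timestamp, kind, port, cord_id)` tuples; this format and the report layout are illustrative, not a vendor's report format.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

Event = Tuple[datetime, str, str, str]   # (timestamp, "plug"/"unplug", port, cord_id)


def state_as_of(events: Iterable[Event], cutoff: datetime) -> Dict[str, str]:
    """Replay logged events up to and including cutoff; return port -> cord."""
    state: Dict[str, str] = {}
    for ts, kind, port, cord in sorted(events, key=lambda e: e[0]):
        if ts > cutoff:
            break                        # later events happened after the cutoff
        if kind == "plug":
            state[port] = cord
        elif kind == "unplug":
            state.pop(port, None)
    return state


def insurance_report(events: Iterable[Event], cutoff: datetime) -> str:
    """Plain-text listing of every connection in place at the cutoff."""
    lines = [f"Physical connectivity as of {cutoff.isoformat()}"]
    for port, cord in sorted(state_as_of(events, cutoff).items()):
        lines.append(f"{port}\t{cord}")
    return "\n".join(lines)
```

Replaying the log, rather than reading only the latest state, matters here because the latest state may reflect the damage itself or recovery work already under way.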

By providing a complete, up-to-the-minute picture of the physical network, a PLM system offers peace of mind, rapid disaster recovery, and streamlined day-to-day operations in the data center or central office. It gives technicians an accurate roadmap to follow in configuring, changing, or rebuilding a network, and thus reduces downtime and its associated costs.

Rudy Musschebroeck is business development manager for TE Connectivity's (www.te.com) physical layer managed connectivity products in EMEA (Europe, Middle East, Africa).
