Effectively Operating in Recovery Mode

Avalution Team | Oct 22, 2007

The goal of business continuity is to re-establish critical business processes in a timeframe and at a level that will sustain the business after a disaster. To establish a program that is able to operate effectively in recovery mode, an organization must develop recovery strategies and plans that satisfy the requirements determined during a Business Impact Analysis (BIA), take into account the human and technological constraints inherent to its business model, rigorously challenge all assumptions made during the planning process, and validate the recovery process through ongoing exercises.

The data collected during a BIA provide the basis and requirements for the development of an effective business continuity strategy. Beyond the basics of recovery time and recovery point objectives, an effective BIA will identify key process flows, interdependencies between the departments that contribute to the flows, human resource needs, communications with third parties, and the tool sets required to perform necessary business functions for an extended period. To operate effectively in recovery mode, strategies need to ensure that all elements of critical processes are coordinated in terms of recovery times, methods, and duration. Process disconnects are a common problem in organizations that decentralize business continuity planning and allow each business unit or location to develop its strategy independently, without coordinated oversight.
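
One way to picture a process disconnect is a critical process that is planned to recover sooner than something it depends on. The short Python sketch below illustrates that check under simple assumptions: the process names, recovery time objectives, and dependencies are all hypothetical, and a real BIA would capture far more detail (recovery methods, duration, staffing) than this example does.

```python
# Hypothetical BIA data: each process has a recovery time objective (RTO) in
# hours and a list of the processes it depends on. All names and numbers are
# invented for illustration.
processes = {
    "order_entry": {"rto_hours": 24, "depends_on": ["customer_database", "call_center"]},
    "customer_database": {"rto_hours": 48, "depends_on": []},
    "call_center": {"rto_hours": 12, "depends_on": []},
    "invoicing": {"rto_hours": 72, "depends_on": ["order_entry"]},
}


def find_rto_disconnects(processes):
    """Return (process, dependency) pairs where the dependency is planned to
    recover later than the process that needs it."""
    disconnects = []
    for name, info in processes.items():
        for dep in info["depends_on"]:
            if processes[dep]["rto_hours"] > info["rto_hours"]:
                disconnects.append((name, dep))
    return disconnects


for name, dep in find_rto_disconnects(processes):
    print(
        f"{name} (RTO {processes[name]['rto_hours']}h) depends on "
        f"{dep} (RTO {processes[dep]['rto_hours']}h)"
    )
```

In this made-up data set, the two business units that own order entry and the customer database each chose a reasonable objective in isolation; the mismatch only becomes visible when the plans are reviewed together.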

Another common problem occurs when organizations attempt to replicate all aspects of normal day-to-day operations, adding unnecessary complexity and resource requirements to the recovery process. Every effort should be made to eliminate less critical activities and focus on as small a set of activities as possible. Plans should also have a mechanism to identify processes, or parts of processes, that are critical at the time of disaster so that limited resources can be applied most effectively.

Other common problems stem mainly from assumptions made during the planning process. Several of the most common assumptions, each of which can have serious negative consequences for recovery operations, are:

  • Assuming that staff assigned to recovery teams will be willing and able to relocate to a recovery site that requires overnight travel. Many companies sign contracts for work area recovery sites several hours away from their primary sites because those are the nearest locations offered by a commercial provider. Unfortunately, experience has shown that many employees are unable or unwilling to relocate. Planners should work with their Human Resources departments to understand the demographics of the workforce and the potential impacts on employees’ families before committing to distant recovery locations.
  • Assuming that remote access technology will allow staff to work from home. Many companies designed their remote access capacity to accommodate only a relatively small percentage of users. Also, in the case of companies utilizing VPN technology, access tokens and specialized software may be required to allow access. Before adopting remote access as a strategy option, a company must understand the limits of its production technology and the costs of equipping staff for recovery mode access.
  • Assuming effective manual work-around procedures exist. As automation has increased, many companies have lost the ability to operate manually. Also, productivity increases achieved through automation make operation with a reduced workforce in recovery mode highly problematic. Before assuming manual procedures are acceptable, a company must ensure that the knowledge, tools, and capacity exist to make them a viable option for the short to medium term.
  • Assuming the “best case scenario” with regard to the delivery of equipment, restoration of systems, and technical support requirements at the recovery site. Many plans underestimate the technical requirements of establishing the recovery site, assume that a quick-ship agreement that guarantees equipment on site in forty-eight hours means that the business will meet a two-day recovery time objective, and underestimate the workload and stress placed on technical resources.
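
The quick-ship assumption in the last item is easy to test with simple arithmetic. The sketch below adds up a hypothetical recovery timeline in Python; every step and duration is an invented figure used only to show the calculation, and the steps are assumed to run one after another with no overlap.

```python
# Hypothetical recovery steps and durations in hours. Every figure here is an
# assumption chosen for illustration, not data from a real plan.
RTO_HOURS = 48  # the two-day recovery time objective

steps = [
    ("declare disaster and invoke quick-ship agreement", 4),
    ("equipment delivered to recovery site", 48),
    ("unpack, rack, and cable equipment", 8),
    ("restore systems and data from backup", 16),
    ("validate applications and release to users", 6),
]


def elapsed_hours(steps):
    """Total elapsed time if the steps run strictly in sequence."""
    return sum(hours for _, hours in steps)


total = elapsed_hours(steps)
shortfall = total - RTO_HOURS

print(f"Estimated time to recover: {total} hours")
print(f"Recovery time objective:   {RTO_HOURS} hours")
if shortfall > 0:
    print(f"Plan misses the objective by {shortfall} hours")
```

Even with these modest illustrative durations, the delivery guarantee alone consumes the entire objective, so any time spent on setup, restoration, and validation pushes recovery past the two-day target.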

The final element in ensuring that your organization can operate in recovery mode is exercising the plan. In general, there are two types of exercises an organization can conduct to test business continuity capabilities: tabletop exercises and simulated activation exercises. Tabletop exercises serve best to validate process and organizational plan elements, while simulated activations serve best to validate logistics plans, timeline assumptions, and operational support requirements. Simulated activation exercises are particularly useful in highlighting the human element in recovery, an aspect often overlooked during plan development.

Overall, strategies and plans need to be developed that identify and coordinate all aspects of critical processes, accommodate the restrictions of both the workforce and available technology, and accept the reality of some level of technology dependence.  Ongoing exercises need to be designed to verify the process elements and highlight changes, as well as validate the mechanics of the recovery strategy and the limits of the human element in recovery mode.