In my other blog post (Disaster Recovery - Risk Assessment and Value of Information), I discussed the following:
- The value of information as a form of asset used to ensure continuity and how to establish this value.
What happens when the preventative measures have failed? How do we ensure continuity? This is the time when we are faced with a worst case scenario. If we are unprepared for such an eventuality, the risks of failure are greatly increased. To be prepared, we need a PROVEN contingency plan. Many will argue that:
- The chances (risks) of worst-case scenarios (disastrous events) are small, and such events are few and far between.
- The costs of establishing and maintaining contingency plans are high and unproductive.
The above arguments are valid; disasters don't occur every day. But should we take a chance? Based on pure statistics, maybe we should. On the other hand, why do we insure our houses against disasters such as fires, when the risks seem to be fairly low? The bottom line is that it is up to Executive Management to decide, after asking the right questions and carefully assessing the answers obtained. The answers can sometimes be frightening.
A strategy addressing business continuity must take into account all aspects of a business, all threats to business survival and the resulting contingency plans. Threats can originate in different areas such as finance, the marketplace, the competition and physical disasters. Here, we are only going to address contingency plans for computerised information systems, which are an important, if not crucial, part of a business continuity strategic plan. The rest of this article discusses a methodical approach to Disaster Recovery Planning (DRP).
We should stop here and try to define what is meant by a disaster recovery plan. The best definition I have come across is the following:
"A disaster recovery plan (DRP) is a comprehensive and consistent statement of actions, tasks, dependencies and milestones, along with resources required to accomplish a required level of recovery for given functions at given locations, within given time frames".
The above definition should be the goal of any effective DRP. From the definition, we can surmise that the process of establishing and maintaining an effective plan can be complex. We therefore have to get organized and use the correct method(s) to achieve the goal.
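To make the definition more concrete, here is a minimal sketch of how its elements (tasks, dependencies, milestones, resources, functions, locations and time frames) might be held in a simple data structure. The class and field names are my own illustrative assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryTask:
    """One action in the plan. All field names are illustrative assumptions."""
    name: str
    owner_team: str                      # team responsible for the task
    duration_hours: float                # estimated time to complete
    depends_on: list[str] = field(default_factory=list)  # prerequisite task names
    resources: list[str] = field(default_factory=list)   # people, hardware, data, ...
    is_milestone: bool = False           # marks a checkpoint in the recovery


@dataclass
class RecoveryObjective:
    """A given function, at a given location, within a given time frame."""
    function: str
    location: str
    required_level: str                  # e.g. "full service" or "read-only"
    time_frame_hours: float
    tasks: list[RecoveryTask] = field(default_factory=list)

    def missing_dependencies(self) -> set[str]:
        """Return dependency names that do not match any task in this objective."""
        known = {t.name for t in self.tasks}
        return {d for t in self.tasks for d in t.depends_on} - known


# Hypothetical example: a task that depends on something nobody has planned yet.
payroll = RecoveryObjective(
    function="payroll",
    location="alternate site",
    required_level="full service",
    time_frame_hours=48,
    tasks=[RecoveryTask("restore database", "DBA team", 6,
                        depends_on=["install server"])],
)
print(payroll.missing_dependencies())  # {'install server'}
```

Even a structure this small supports the "comprehensive and consistent" part of the definition: a dependency that points at a task nobody has written down shows up immediately.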
The adoption of methods is only the beginning of the process. We still have to prove that the methods will achieve the goal. We can only do that by adopting practical solutions which can be proven to work. In this context, I think we should recognize the following:
- A formal DRP can be produced quickly based on theoretical assumptions and expert consultations. While presenting a logical/methodical solution and giving a warm feeling ("WE HAVE A PLAN"), such a plan is only worth the paper it is written on.
- An untested recovery plan is a prescription for failure. Such a plan is complex and must be thoroughly tested.
- A formal documented plan that is effective is the END RESULT of a process that adopts practical and tested (proven) solutions.
- A DRP is a "LIVING" system which must always be up to date and react to changes in circumstances and environment.
- The DRP involves multiple resources and disciplines and as such must be developed and maintained by a number of people organised into teams according to their expertise.
- The teams should participate in the development of plans as well as the recovery process to ensure effectiveness. The teams should be responsible for developing the required tasks, assigning responsibilities, ensuring proper timing and availability of resources in the plan.
- The process of producing the plan is highly interactive and is managed, coordinated and integrated by the DRP Coordinator.
- The plan is tested as far as possible to guarantee that it will work. The plan is also reviewed and tested regularly or as a result of changes to ensure effectiveness.
- The teams are always aware of the DRP and maintain it on an ongoing basis. The cycle of keeping an effective DRP never stops.
The overall method of developing and proving a DRP must be logical and practical. The method must answer the needs, be cost effective and provide the vehicle for success. As opposed to other systems geared to supporting the business functions, a DRP is not going to improve the profit margin or improve productivity. It involves added costs and human resources from which direct and tangible benefits might never be realised. It is, however, a key component of the overall strategy for protecting assets and ensuring business continuity and survival. An overall practical method that works follows:
- Conduct a risk analysis. This process:
  - Identifies threats and vulnerabilities.
  - Identifies what is at risk and potential impact.
  - Recommends alternative solutions to minimise risks.
  - Identifies what functions must be recovered and when.
  - Removes the uncertainty of theoretical assumptions and focuses on the problem at hand.
  - Leads to more effective management decisions.
- Implement the following recovery strategy:
  - Identify main tasks to be carried out in the recovery process.
  - Identify the MINIMUM needs (resources) to recover functions.
  - Establish how much of the needs and tasks can be satisfied with existing resources and what extra resources are required.
  - Establish costs of acquiring extra resources to satisfy needs.
  - Obtain approval for the costs. Acquire extra resources.
  - Develop test scenarios and test tasks using defined resources.
  - Identify problems, resolve them and test again. Document proven solutions to incorporate in total plan.
- At this stage, working and proven solutions exist to allow recovery.
- Document the formal recovery plan in detail.
- Walk through the plan and refine.
- Test the total plan by simulating disasters.
- Ensure that the formal review procedures, testing policy and schedules are documented in the plan and are regularly followed.
- At this stage, a formally documented and fully tested plan exists which can be effectively maintained.
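The recovery strategy steps that compare minimum needs with existing resources, and then cost the difference, can be sketched as a short calculation. This is only an illustration: the resource names, quantities and unit costs are invented, and a real assessment would involve far more than multiplying quantities by prices.

```python
def resource_gap(minimum_needs, existing):
    """Return {resource: shortfall} for resources that fall short of the minimum."""
    gap = {}
    for resource, needed in minimum_needs.items():
        shortfall = needed - existing.get(resource, 0)
        if shortfall > 0:
            gap[resource] = shortfall
    return gap


def acquisition_cost(gap, unit_costs):
    """Total cost of closing the gap; raises KeyError if a unit cost is unknown."""
    return sum(quantity * unit_costs[resource] for resource, quantity in gap.items())


# Hypothetical figures for a single function to be recovered.
minimum_needs = {"servers": 2, "workstations": 10, "backup tape drives": 1}
existing = {"servers": 1, "workstations": 12}
unit_costs = {"servers": 8000, "workstations": 900, "backup tape drives": 1500}

gap = resource_gap(minimum_needs, existing)
print(gap)                                # {'servers': 1, 'backup tape drives': 1}
print(acquisition_cost(gap, unit_costs))  # 9500
```

The resulting figure is what would be taken forward for cost approval before any extra resources are acquired.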
The outlined method is logical and should not be too difficult to follow. However, there are complexities which must be taken into account:
- The persons involved in developing and maintaining an effective DRP are primarily involved in attending to the daily business of the organization. This is their primary function and the DRP will always take a lower priority.
- An established DRP is essentially a dormant system and tends to be forgotten until tests and reviews are carried out or when an incident requires recovery of functions.
- The time span between major reviews and their associated tests is fairly long (e.g. three months). During these periods, changes in personnel, environments and functions often take place. Different persons are involved and have to be informed and trained. Changes in systems, procedures and physical resources often threaten the effectiveness of the plan, which can quickly become out of date.
- An effective DRP requires the integration of tasks, timing and resources (e.g. human resources, hardware, software, supplies, vendors, communications, off-site data, technical and user manuals, business procedures and technical procedures). This integration can be fairly complex to manage and maintain.
- The human expertise might not be available when an incident requires that the DRP be invoked. Tasks and procedures might be allocated to less knowledgeable persons.
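To illustrate why integrating tasks and timing becomes complex, here is a minimal sketch that orders recovery tasks by their dependencies and works out the earliest time each could finish. The task names and durations are invented, and the sketch assumes every task can start as soon as its prerequisites are done, which ignores the limits on people and equipment that a real plan has to respect. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def earliest_finish(durations, depends_on):
    """Earliest finish time (hours) for each task if started as soon as possible.

    durations:  {task: hours}
    depends_on: {task: [prerequisite tasks]}
    Raises graphlib.CycleError if the dependencies are circular.
    """
    unknown = {p for prereqs in depends_on.values() for p in prereqs} - set(durations)
    if unknown:
        raise ValueError(f"prerequisites without a duration: {sorted(unknown)}")

    graph = {task: depends_on.get(task, []) for task in durations}
    finish = {}
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + durations[task]
    return finish


# Hypothetical recovery tasks for one function.
durations = {
    "declare disaster": 1,
    "ship off-site backups": 4,
    "install hardware": 8,
    "restore data": 6,
    "user acceptance": 2,
}
depends_on = {
    "ship off-site backups": ["declare disaster"],
    "install hardware": ["declare disaster"],
    "restore data": ["ship off-site backups", "install hardware"],
    "user acceptance": ["restore data"],
}

finish = earliest_finish(durations, depends_on)
print(max(finish.values()))  # 17
```

Comparing the final figure with the required time frame for the function gives an early warning when a change to one task, such as a slower hardware delivery, pushes the whole recovery past its deadline.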
These complexities can be addressed by doing the following:
- Responsibilities must be clearly defined and documented. The necessary commitment must be obtained from Management and individuals. The persons involved must be convinced that their involvement is valuable and contributes to a successful DRP.
- Effective change control must be established. Whenever changes are planned or occur (e.g. changes in personnel, hardware, procedures, systems and functions) potential impact on the DRP must be established and the necessary actions taken to ensure continued effectiveness. Preplanning is very important.
- The necessary tools must be evaluated, acquired and used to integrate the tasks, timing and resources. The correct tools will ensure the proper standards, facilitate the updating of the plan, ensure effective review, structure any required training and enforce a common approach among different disciplines.
- The documentation of procedures and tasks must be detailed. As the expertise might not be available in a disaster, people with less expertise must be able to carry out the functions. Tests should not be carried out by the experts in their field but left to others with less expertise to prove that the procedures and documentation are as foolproof as possible.
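The change control point above can be sketched in a few lines: if the plan records which resources each task relies on, then any planned change can be checked against that record to see which tasks need to be reviewed and retested. The task and resource names below are invented for illustration, and this is not a description of any particular tool.

```python
def tasks_needing_review(task_resources, changed_resources):
    """Return, sorted by name, the tasks that use at least one changed resource."""
    changed = set(changed_resources)
    return sorted(
        task
        for task, resources in task_resources.items()
        if changed & set(resources)
    )


# Hypothetical mapping of plan tasks to the resources they rely on.
task_resources = {
    "restore data": ["backup software", "DBA on call"],
    "install hardware": ["hardware vendor contract"],
    "notify customers": ["contact list"],
}

# Changes raised through change control since the last review.
changes = ["backup software", "contact list"]

print(tasks_needing_review(task_resources, changes))
# ['notify customers', 'restore data']
```

The value lies less in the code than in keeping the mapping current; without it, the impact of a change on the DRP has to be worked out from memory.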
The implementation of a DRP and methods used will differ between organizations. While the same basic principles will generally apply, some problems and issues will be very specific to an organization. There are no complete sets of packaged methods and tools available that will resolve all problems. Implementing a DRP is a learning process for any organization.
One must not make the mistake of throwing pre-packaged solutions at the DRP and thinking that this will guarantee success. The essential ingredients for adopting the correct methods in any organization are:
- Careful study of the risk factors and requirements.
- Assessment of the needs and adoption of practical and cost effective solutions to address the needs.
- Effective project definition and management.
- Commitment from Management and individuals.
- Proof of all assumptions and solutions by testing and review.
- Good, detailed documentation.
- Effective change control and formal review procedures.
There are some good tools incorporating documented and manual or automated methods on the market. Such tools can be very valuable in the integration, standardisation, documentation and review process of the DRP. They must be carefully assessed against the problems and complexities to be addressed before being purchased and used.
To conclude, we should recognise that disaster recovery planning is not an exact science or discipline. The success of any DRP will depend on: commitment from the organization, careful planning, effective project management, adoption of practical solutions, effective documentation, accumulated experience and good continuity management.
My next article (http://disaster-recovery-developplan.blogspot.com/) deals with the development of the plan and its components.
Copyright José Masson. All rights reserved. Copying and publishing the content without prior written permission is prohibited.
