Nearly all organizations today rely on information technology and the data it manages to operate. Keeping computers and networks running, and data accessible, is imperative. Without this information technology, customers cannot be serviced, orders taken, transactions completed, patients treated, and on and on.
Disasters that create IT downtime are numerous and common, spanning the physical and logical, the man-made and natural. Organizations must be resilient to these disasters and able to operate through a disruption of any type, whether it is a security incident, human error, device failure, or power failure.
State of Preparedness
Most organizations know the importance of disaster recovery, and firms of all sizes are investing to drive greater uptime. An IDC study on business continuity and disaster recovery (DR) showed that unplanned events of most concern were power, telecom, and data center failures (physical infrastructure) – more so than natural events such as fire or weather. Security was considered the second most critical and extreme threat to business resiliency.
Seventy-one percent of those surveyed had as many as 10 hours of unplanned downtime over a 12-month period. This underscores the importance of greater uptime and DR, which is driving firms to conduct DR tests more frequently. Approximately one in four firms are conducting DR testing quarterly or monthly, while another 45% are testing semi-annually or annually.
This is a marked increase from previous research, which IDC conducted three years ago, where firms were testing annually at best. However, 25% of firms are still not doing any DR testing.
DR planning is complex and spans three key areas: technology, people, and process. From an IT perspective, planning starts with a business impact analysis (BIA) by application/workload. Natural tiers or stages of DR begin at phase 1 – infrastructure (networking, AD, DHCP, etc.) – then extend to recovery by application tiers. Each application tier should have an established recovery time objective (RTO) and recovery point objective (RPO) based on business risk.
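As a rough illustration of how per-tier objectives might be recorded and checked after a DR test, here is a minimal Python sketch. The workload names, tier numbers, and RTO/RPO values are hypothetical and not drawn from the IDC study. The helper names `RecoveryObjective`, `recovery_order`, and `meets_objectives` are likewise invented for this example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    workload: str
    tier: int        # 1 = infrastructure; higher tiers are recovered later
    rto: timedelta   # recovery time objective: maximum tolerable time to restore
    rpo: timedelta   # recovery point objective: maximum tolerable data-loss window


# Hypothetical objectives; real values would come from the business impact analysis.
OBJECTIVES = [
    RecoveryObjective("reporting", 3, timedelta(hours=24), timedelta(hours=24)),
    RecoveryObjective("order-processing", 2, timedelta(hours=4), timedelta(minutes=15)),
    RecoveryObjective("networking/AD/DHCP", 1, timedelta(hours=1), timedelta(hours=24)),
]


def recovery_order(objectives):
    """Sort workloads by tier first, then by the tightest RTO within a tier."""
    return sorted(objectives, key=lambda o: (o.tier, o.rto))


def meets_objectives(objective, measured_recovery, measured_data_loss):
    """Return True if a DR test result stayed within both the RTO and the RPO."""
    return measured_recovery <= objective.rto and measured_data_loss <= objective.rpo
```

In a DR test, the measured recovery time and data-loss window for each workload could be passed to `meets_objectives` to flag tiers that missed their targets.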
DR testing is essential not only to the adequate recovery of systems and data, but also to uncovering events or conditions encountered during real disaster scenarios that were not previously accounted for. Examples include change management issues, such as the need to reconfigure applications or systems. Recovering systems in the right sequence is also important. To ensure that DR testing, planning, and recovery are organized and effective, many organizations use a disaster recovery “run book.”
A DR run book is a working document, unique to every organization, which outlines the necessary steps to recover from a disaster or service interruption. It provides an instruction set for personnel in the event of a disaster, including both infrastructure and process information. Run books, or updates to run books, are the outputs of every DR test.
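One way a run book's ordered recovery steps might be derived is from a map of which systems depend on which. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The system names and dependencies are hypothetical, and `recovery_sequence` is a helper invented for this example rather than part of any DR product.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running before it.
DEPENDS_ON = {
    "network": set(),
    "active-directory": {"network"},
    "dhcp": {"network"},
    "database": {"active-directory"},
    "order-app": {"database", "dhcp"},
}


def recovery_sequence(depends_on):
    """Return an order in which every system comes after its prerequisites.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_sequence(DEPENDS_ON), start=1):
        print(f"{step}. recover {system}")
```

A circular dependency raises an error instead of producing an order, which is the kind of unaccounted-for condition a DR test is meant to surface.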
However, a run book is only useful if it is up to date. If documented properly, it can take the confusion and uncertainty out of the recovery environment, which during an actual disaster is often in a state of panic. Using the run book can make the difference between two extremes: being prepared for an unexpected event and recovering efficiently, or never recovering at all.