Deciding whether to fail over to a secondary site or to wait it out and fix the problem in-house remains one of the toughest decisions businesses face during an outage. This is according to Oscar Arean, technical operations manager at disaster recovery service provider Databarracks, who explains more below:
Recently, the New York Stock Exchange (NYSE) was forced to suspend trading for three hours following a major technical glitch. The decision to cease trading rather than fail over to its Chicago recovery centre has created much debate. It is a situation many organizations still struggle with when they suffer an outage.
Business continuity and disaster recovery plans will specify the exact length of outage an organization should tolerate before it invokes its failover, but as we all know, during a real-life disaster these timings can slip as you try to firefight.
The point at which to fail over is individual to each organization, and it will differ depending on the type of disaster being dealt with. You may have a set response for storage issues but something completely different for network-related issues or a natural disaster, all of which is fine. But this doesn’t mean that you should be making these decisions at the time of an incident: your point of failover should be defined well before then.
Your crisis management team should identify the most likely disaster scenarios, and there should be plans in place for each of them. If the organization has decided that the maximum outage it will allow is four hours, and it takes one hour to recover its systems, then it is crucial to begin the recovery process no later than the three-hour mark. Failure to do so could hurt the organization in terms of both cost and reputation.
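The arithmetic behind that three-hour mark can be sketched in a few lines of Python. This is only an illustration: the function name, the parameter names and the outage start time are hypothetical, and the four-hour and one-hour figures are the ones used in the example above.

```python
from datetime import datetime, timedelta


def latest_failover_start(outage_start, max_tolerable_outage, recovery_time):
    """Return the latest moment recovery can begin and still finish in time.

    All names here are illustrative assumptions, not part of any DR product.
    """
    if recovery_time > max_tolerable_outage:
        # The plan cannot be met at all if recovery outlasts the allowed outage.
        raise ValueError("recovery takes longer than the maximum tolerable outage")
    return outage_start + (max_tolerable_outage - recovery_time)


# Hypothetical outage starting at 09:00, with the figures from the text:
# four hours of tolerable outage and one hour to recover.
outage_start = datetime(2024, 1, 1, 9, 0)
deadline = latest_failover_start(
    outage_start,
    max_tolerable_outage=timedelta(hours=4),
    recovery_time=timedelta(hours=1),
)
print(deadline)  # 2024-01-01 12:00:00
```

With these figures the decision point falls three hours into the outage. A real plan would probably also subtract the time needed to make and communicate the decision itself.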
Once you have these plans in place, it’s imperative that they are adhered to. An organization will have worked out how long it can be out of action before it makes more financial sense to invoke DR and move to its recovery site, so when that point is reached, action must be taken. It’s tempting to extend the time by an extra hour because your team is close to fixing the issue, but this can easily escalate. Rehearsing practice scenarios with the crisis management team will make the plans more familiar and the decision easier to make on the day.
It is also worth identifying the types of scenarios in which it might be deemed unnecessary to fail over, as was the case with the NYSE.
Many organizations have comprehensive traditional disaster recovery plans in place but are likely to invoke them only for very significant outages lasting several days. For those organizations, invoking disaster recovery is such a significant task, consuming so much time and resource, that dealing with the IT incident is considered the lesser of two evils, even if it takes days to resolve the issue. Those are the organizations that are investigating the more flexible alternatives made available to them through cloud computing.
Disaster recovery as a service (DRaaS) helps to bridge the gap by providing a more flexible and cost-effective alternative to traditional, cumbersome DR solutions. Organizations that adopt DRaaS find that their DR plans are equipped to deal with far more incidents, and that they are less reluctant to invoke DR.