Designing your online infrastructure to survive major outages: where to start

Published: Thursday, 10 March 2022 10:02

Many businesses rely on a continuous online presence but may not yet have fully considered how to design their platform to cope with downtime, whether it is caused by an outage, a cyberattack, or a natural disaster. Terry Storrar provides some advice for those starting out on their disaster recovery journey.

The most effective way to protect against the danger of downtime is to design a platform that enables the business to react to failures caused by external factors that it has no control over. These external factors could range from the loss of a virtual server or an underlying hardware failure to a total data centre outage.

Given the risks, it is important that organizations put in place an effective (and tested) disaster recovery (DR) strategy as part of their business continuity plan. A robust DR strategy is focused on addressing two primary concerns: restoring IT infrastructure as fast as possible and preventing as much data loss as possible from the point at which disaster strikes.

An effective strategy should identify possible failures, design mitigations for them, and plan for remediation. It is important to ensure that the strategy is not too complex, because complexity increases the risk of unforeseen failures when failing over from the standard infrastructure to the DR infrastructure.

Setting up high-availability failover for IT infrastructure within the data centre, and replicating application servers and data in real time to an alternative data centre location with available hosting infrastructure, can deliver the needed workload protection and enable a faster relaunch of mission-critical functions and services.

It is vital not to put all your eggs in one basket and rely on a single data centre to handle all critical workloads. One solution is to replicate servers and databases to separate server infrastructure, ideally to a secondary data centre site entirely independent of the primary data centre site. But to work effectively, the server or data replication should be near real-time so that all changes inside the server are replicated, ensuring there is no danger of losing valuable changes in case of an incident.
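As a rough illustration of what "near real-time" means in practice, the sketch below measures how far a replica trails its primary and checks that lag against a tolerance. The function names, the timestamps and the five-second threshold are all assumptions made for this example; a real deployment would read these values from its own replication tooling.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(primary_last_write, replica_last_applied):
    """Return how far the replica trails the primary (never negative)."""
    return max(primary_last_write - replica_last_applied, timedelta(0))


def within_tolerance(lag, max_lag=timedelta(seconds=5)):
    """True if the lag is within the acceptable window (assumed 5 s here)."""
    return lag <= max_lag


# Hypothetical timestamps: the replica applied its last change 2 s behind.
last_write = datetime(2022, 3, 10, 10, 2, 0, tzinfo=timezone.utc)
last_applied = last_write - timedelta(seconds=2)

lag = replication_lag(last_write, last_applied)
print(lag.total_seconds(), within_tolerance(lag))
```

The lag at the moment of an incident is, in effect, the window of changes that could be lost, which is why it is worth monitoring continuously rather than only during DR tests.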

There are various tools available to make this possible, from local server replication software to SAN storage replication. It’s also worth noting that the best way to keep replicated databases consistent is not to rely on server or storage replication but to use the provided database tools to keep databases consistent and in sync.

Avoid single points of failure

No matter how well you design your platform to be fully redundant by using different data centres or availability zones, it can still be vulnerable to other single points of failure, such as the network. If the network across and between both data centres is the same and that network faces a major issue, your environment is likely to still be affected in both data centres. This could also affect the failover of your public IP addresses between the primary and secondary sites.

To prevent this, it makes sense to use a secondary data centre whose network is independent of your primary hosting provider's network. If your current hosting provider cannot provide an independent secondary data centre, find a separate hosting provider for the secondary DR site.
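One simple way to look for remaining single points of failure is to list what each site depends on and see what the two lists have in common. The sketch below does this with plain dictionaries; the categories and component names ("carrier-a", "host-1" and so on) are hypothetical placeholders, not a real inventory format.

```python
def shared_dependencies(primary, secondary):
    """Return, per category, the components that both sites rely on."""
    shared = {}
    for category in sorted(primary.keys() & secondary.keys()):
        common = sorted(set(primary[category]) & set(secondary[category]))
        if common:
            shared[category] = common
    return shared


# Hypothetical inventories for a primary site and a DR site.
primary_site = {
    "network": ["carrier-a"],
    "power": ["grid-north"],
    "provider": ["host-1"],
}
dr_site = {
    "network": ["carrier-a"],
    "power": ["grid-south"],
    "provider": ["host-1"],
}

print(shared_dependencies(primary_site, dr_site))
# prints {'network': ['carrier-a'], 'provider': ['host-1']}
```

Anything that appears in the result is a component whose failure could take down both sites at once, so in this made-up example the shared carrier and shared provider would be the items to address.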

While this ensures the network is no longer a single point of failure, it presents a new challenge. If you set up a network-independent DR platform across separate providers, public IPs received from the hosting provider cannot be used in the DR environment hosted by a different provider.

Thankfully, there is a simple way to overcome this challenge: change the relevant DNS entries (A records) to point to the DR IP addresses. Once this is done, the changed DNS records will propagate across the Internet while the switch to the DR environment takes place, and access to the platform will be restored.
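The DNS change described above can be sketched as a simple mapping swap. The example below is illustrative only: the hostnames and addresses come from ranges reserved for documentation, the function names are invented for this sketch, and the actual update would be made through whatever interface your DNS provider offers. It also notes an assumption worth planning around: a resolver that honours the record's time-to-live (TTL) may keep serving the old address until that TTL expires.

```python
def failover_records(a_records, dr_addresses):
    """Return a copy of the A records with DR addresses swapped in.

    Hostnames without a DR address keep their current value.
    """
    return {host: dr_addresses.get(host, ip) for host, ip in a_records.items()}


def worst_case_stale_seconds(ttl_seconds):
    """Upper bound on how long a TTL-honouring resolver may serve the old record.

    A resolver that cached the record just before the change keeps it until
    the TTL expires, so the bound is the TTL itself.
    """
    return ttl_seconds


# Hypothetical zone data using documentation-only names and address ranges.
primary = {"www.example.com": "192.0.2.10", "api.example.com": "192.0.2.11"}
dr = {"www.example.com": "198.51.100.10", "api.example.com": "198.51.100.11"}

print(failover_records(primary, dr))
print(worst_case_stale_seconds(300))
```

Under these assumptions, keeping the TTL on these records short shortens the period during which some visitors are still directed to the failed site.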

For those businesses without the resources, time or inclination to design and implement a DR platform on their own, infrastructure-as-a-service (IaaS) providers are available with disaster-recovery-as-a-service (DRaaS) capabilities that can deliver the reliability and recovery needed to prevent a disruption to operations.

IaaS providers can help minimise the potential for data loss with multiple data centres across different geographies that maintain continuity of service even if one region experiences downtime, an outage, a cyberattack or a disaster. IaaS-hosted data centres can also include multiple layers of security to limit access to data, protect against physical attacks and keep servers safe from intruders.

Designing and implementing a DR platform doesn’t need to be complex, nor does it require extensive network knowledge. By frequently testing your DR platform and ensuring you are not solely reliant on a single provider, you can be more confident that your organization’s online presence will have greater resilience against downtime, outages, disasters and cyberattacks.

The author

Terry Storrar, Managing Director, Leaseweb UK