Cloud resilience: a collaboration
- Published: Monday, 28 September 2020 07:45
This article arose out of conversations during the BCI’s Education Month, where a number of business continuity professionals decided to work together on a document exploring various aspects of cloud resilience. Continuity Central was approached to host the article so that it could reach a wider audience; and we are happy to do so.
Note: This article has been published unedited to maintain the crowd sourcing approach.
Public, private, hybrid, multi-cloud? Does anyone else feel like they need to know just a bit more about this, how it’s affecting their organisation and how it might influence their role in resilience now or in the future?
Following a really successful BCI Education Month webinar session, where Luke Bird hosted a coffee catch-up with IBM's Flick March, Resilience Ninja Ken Simpson and The Failover Podcast host Shane Mathew, it was decided that we would put out an easy-to-understand, informal article on this topic for the professional community.
We will each add our thoughts on this topic to help level-set the resilience community’s understanding of this new and emerging landscape.
Further contributions were added by those who attended the session or contributed to the discussion beforehand via LinkedIn. Everyone is credited as far as possible, of course. This is a team thing!
- A collaborative list of questions about cloud and cloud resilience
- A collective list of useful sites to visit and things to read
Dear reader, please go and contribute your knowledge to this shared appendix and let’s make it even bigger and better! It’s only going to be open and public for a short period of time!
What is Cloud?
By Luke Bird
Someone once recommended Eli the Computer Guy on YouTube to me. He’s really good at explaining things at a high level, and I highly recommend watching him before you go any further. He has a session on cloud, and he also provides some useful notes.
About 12 minutes into his session, he says that if you haven’t got your head around the stack (the infrastructure and technologies involved in the cloud service), you will run into problems. He’s actually talking about the more technical elements of products and services, but the point works just as well for resilience practitioners: we need to be aware of the stack to a certain degree.
HighQ provides a decent basic overview of the three terms, which I’ve summarised below:
Private Cloud = cloud services used by a single organisation and not exposed to the public. A private cloud typically resides inside the organisation’s network, behind a firewall, so only the organisation has access to it and can manage it.
Public Cloud = cloud services offered to the public, which anyone can use. An example of a public cloud is Amazon Web Services (AWS).
Hybrid Cloud = cloud services distributed across public and private clouds, where sensitive applications are kept inside the organisation’s network (in a private cloud) while other services are hosted outside it (in a public cloud).
2019 Gartner Report – What Leaders Need to Know About Cloud DR
Shared by Flick March
The report What I&O Leaders Need to Know About Disaster Recovery to the Cloud is a really useful overview of current thinking across IT organisations. Overall, the paper suggests that DR to the cloud is cheaper but requires change. It will change the way you perform testing, and some of your applications are likely to have too many limitations (e.g. licensing costs) to even embark on the journey to cloud in the first place.
451 Research / IBM Webinar - Hybrid Cloud Resilience
Shared by Chris Green
By Luke Bird
Whilst the shift towards "cloud first" has existed for some time, there now appears to be a post-pandemic acceleration of transformation to support the new working environment and customer offerings.
Organisations are increasingly moving towards hybrid-cloud solutions, which enable their business to use a range of different platforms that get them closer to end consumers, e.g. customers, employees etc.
Other attractive prospects appear to be access to other useful data, system relationships with partners and ease of access to existing infrastructure.
But aside from all that technical jargon, all you really need to know is that there is a dramatic and accelerated change to how an organisation’s data and IT services are set up. The change is driving a rapid dispersion of infrastructure (IT stuff) across different platforms.
451 Research reports an absence of up-to-date DR plans that would cover this type of ongoing transformation (and, for that matter, its end state).
The Lay of the DR Land
The new landscape appears to be a complex multi-cloud environment relying on technology from different suppliers, with some organisations heading into it at such a pace that their disaster recovery plans will simply not keep up.
Areas of Risk (To a BC Practitioner)
From Shane Mathew
As more and more business operations move to cloud technologies, you must also weigh the potential impacts this brings through the lens of resilience. Some issues that may require mitigation or planning strategies include:
- How you approach the idea of failover
- Your IT team’s role in a disruption response (as the applications and/or data you depend upon are housed elsewhere)
- Service Level Agreements for critical applications versus others
You must also consider the impact of losing the network connection between your facility and the cloud provider. One scenario is that your cloud provider is up, but your local network provider is not. How would your team then assess the impacts of, and respond to, a situation in which your entire connection has been lost?
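To make the network-loss scenario concrete, here is a minimal Python sketch that maps each application to the components it depends on and lists which applications become unavailable when a given component fails. The application and component names are hypothetical; a real dependency map would come from your own business impact analysis and IT inventory.

```python
from __future__ import annotations

# Hypothetical dependency map: each application lists every component that
# must be up for staff at the office to use it. Names are illustrative only.
DEPENDENCIES: dict[str, list[str]] = {
    "CRM (public cloud)": ["office LAN", "network provider link", "cloud provider"],
    "Payroll (public cloud)": ["office LAN", "network provider link", "cloud provider"],
    "File server (on-prem)": ["office LAN", "on-prem data centre"],
}


def impacted(dependencies: dict[str, list[str]], down: set[str]) -> dict[str, list[str]]:
    """Map each impacted application to the failed components it depends on."""
    result: dict[str, list[str]] = {}
    for app, needs in dependencies.items():
        failed = [component for component in needs if component in down]
        if failed:
            result[app] = failed
    return result


# Scenario from the text: the cloud provider is up, but the network link is not.
outage = impacted(DEPENDENCIES, down={"network provider link"})
for app, causes in outage.items():
    print(f"{app}: unavailable because of {', '.join(causes)}")
```

With only the network provider link down, both cloud-hosted applications are lost even though the cloud provider itself is healthy, while the on-prem file server is unaffected.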
Where to Start?
As with all things in life, it’s probably best to just start somewhere, regardless of where you are in the journey. So, what are the very first things you need to think about before you go any further?
A simple suggested plan of attack from FBCI Paul Kirvan
Paul Kirvan provides five high-level suggestions on what to consider first:
- consider having a different cloud vendor as the failover resource;
- determine that the resources needed from the failover service are available;
- execute SLAs for all vendors;
- ensure that the failover resource can perform a failback if needed; and
- periodically test the failover (failback) resources to ensure they work as designed.
Luke Bird Comment - Really simple and effective advice there; I like it! However, to enforce those measures you would probably need to be involved, almost as a paralegal, at the ground floor of supplier contract negotiations. In practice, a number of organisations will already be in agreements that limit their ability to achieve these five steps in a meaningful way.
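Paul Kirvan’s five steps can be read as a checklist. The Python sketch below records one hypothetical failover arrangement and reports which steps it does not yet satisfy. The field names, vendor names and the one-year test interval (`max_test_age_days`) are assumptions for illustration, not part of his advice.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FailoverArrangement:
    """Hypothetical record of one service's failover arrangement."""
    primary_vendor: str
    failover_vendor: str
    resources_confirmed: bool        # step 2: failover capacity confirmed as available
    failback_supported: bool         # step 4: failover resource can fail back
    last_test: date | None = None    # step 5: date of the last failover/failback test
    slas_executed: set[str] = field(default_factory=set)  # step 3: vendors with signed SLAs


def gaps(a: FailoverArrangement, today: date, max_test_age_days: int = 365) -> list[str]:
    """Return the checklist items this arrangement does not yet satisfy."""
    issues = []
    if a.failover_vendor == a.primary_vendor:  # step 1: different vendor for failover
        issues.append("failover uses the same cloud vendor as the primary")
    if not a.resources_confirmed:
        issues.append("failover resources not confirmed as available")
    missing = {a.primary_vendor, a.failover_vendor} - a.slas_executed
    if missing:
        issues.append("no executed SLA with: " + ", ".join(sorted(missing)))
    if not a.failback_supported:
        issues.append("failback not supported")
    if a.last_test is None or today - a.last_test > timedelta(days=max_test_age_days):
        issues.append(f"not tested in the last {max_test_age_days} days")
    return issues


arrangement = FailoverArrangement(
    primary_vendor="Vendor A",
    failover_vendor="Vendor B",
    resources_confirmed=True,
    failback_supported=True,
    last_test=date(2019, 6, 1),
    slas_executed={"Vendor A"},
)
for issue in gaps(arrangement, today=date(2020, 9, 28)):
    print("-", issue)
```

Here the arrangement falls short on the missing SLA with the failover vendor and on an out-of-date test. Whether existing contracts let you fix either is, as Luke notes, another matter.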
From Cloud Risk Specialist Tayyab Choudhry
The way you probably want to break down your thought process on this is:
A: Exit strategy/plans for the cloud components in a hybrid model. The challenges here would be identifying an alternative Cloud Service Provider and having all technical requirements listed and in place to cover deployment differences; finally, the biggest challenge would be testing this, which could take months.
B: DR testing plans and failover requirements for the on-prem components would be slightly different from those for the cloud components. In principle both should be similar for planning and testing; just the documentation, and the operational and logistical side of the actual testing, will be different. The challenge in this case would be the completeness of data in your inventories and systems of record. For cloud components you can also rely on the testing that the provider performs (some do it quarterly).
Luke Bird Comment - The takeaway from this insight comes down to two words: complexity and visibility. Where are your data points to assure you that the providers are testing? Are they centralised for you in a single, cross-provider view? Does the requirement (and level of data) change per provider depending on their service agreement, risk appetite or data quality? Good luck unifying that lot!
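One way to get the single, cross-provider view Luke describes is to hold on-prem and cloud components in one inventory and measure how complete the DR data is per location. The Python sketch below is a minimal illustration; the `Component` fields, the `REQUIRED` list and the sample entries are hypothetical and would need to reflect your own systems of record.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Fields a (hypothetical) unified DR inventory expects for every component.
REQUIRED = ("owner", "last_dr_test", "test_evidence")


@dataclass
class Component:
    name: str
    location: str                     # "on-prem" or the cloud provider's name
    owner: str | None = None
    last_dr_test: date | None = None  # from internal tests or provider-supplied evidence
    test_evidence: str | None = None  # e.g. "internal report", "provider attestation"


def missing_fields(c: Component) -> list[str]:
    """List the required fields that are empty for one component."""
    return [f for f in REQUIRED if getattr(c, f) is None]


def completeness_by_location(inventory: list[Component]) -> dict[str, float]:
    """Share of components at each location with every required field filled in."""
    totals: dict[str, int] = {}
    complete: dict[str, int] = {}
    for c in inventory:
        totals[c.location] = totals.get(c.location, 0) + 1
        if not missing_fields(c):
            complete[c.location] = complete.get(c.location, 0) + 1
    return {loc: complete.get(loc, 0) / n for loc, n in totals.items()}


inventory = [
    Component("Core ledger", "on-prem", "Finance IT", date(2020, 3, 1), "internal report"),
    Component("Customer portal", "Provider A", "Digital", date(2020, 6, 30), "provider attestation"),
    Component("HR system", "Provider A", None, date(2020, 1, 15), "provider attestation"),
    Component("Data warehouse", "Provider B", "Analytics"),
]
print(completeness_by_location(inventory))
for c in inventory:
    if missing_fields(c):
        print(c.name, "is missing", missing_fields(c))
```

The per-location scores make gaps visible at a glance: in this made-up inventory, Provider B has supplied no test evidence at all.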
So what’s out there to help us?
Suggested Assurance Questions from Ian Spence when considering a cloud service
- Here is a list of key risks. How are you mitigating them?
- What are your RTO and RPO? Does that apply to a cyber incident as well?
- How is the system being backed up?
- If the most recent backup is corrupt, when was the last good one taken?
- Do you integrity check your backups?
- Where is your data stored?
- Where is your log data stored?
- Are you contracting for a service or infrastructure or what?
- What are your service levels?
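Several of these questions can be checked with simple arithmetic: the worst-case data loss is the time between an incident and the newest backup that passed its integrity check, and that figure should be compared with the RPO. The Python sketch below assumes backups are recorded as (timestamp, passed-integrity-check) pairs and uses a hypothetical 24-hour RPO.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def data_loss_window(backups: list[tuple[datetime, bool]], incident_at: datetime) -> timedelta | None:
    """Time between the incident and the newest backup taken before it that
    passed its integrity check; None if no verified backup exists."""
    verified = [taken for taken, passed in backups if passed and taken <= incident_at]
    if not verified:
        return None
    return incident_at - max(verified)


RPO = timedelta(hours=24)  # hypothetical RPO agreed with the provider

backups = [
    (datetime(2020, 9, 25, 2, 0), True),
    (datetime(2020, 9, 26, 2, 0), True),
    (datetime(2020, 9, 27, 2, 0), False),  # most recent backup failed its integrity check
]
window = data_loss_window(backups, incident_at=datetime(2020, 9, 27, 15, 0))
if window is None:
    print("No verified backup before the incident")
else:
    print(window, "meets the RPO" if window <= RPO else "exceeds the RPO")
# prints: 1 day, 13:00:00 exceeds the RPO
```

Had the latest backup passed its check, the window would have been 13 hours and within the RPO; the corrupt backup alone pushes it over, which is why the question about the last good backup matters.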
From Ken Simpson
The US National Institute of Standards and Technology (NIST) offers a range of freely available papers, reference models and tools that can be used to define, categorise and evaluate cloud offerings. NIST have also published a framework for managing risk in the cloud.
Working with a widely recognised model as a base, rather than models created by a single vendor, will make it easier for the practitioner to compare and contrast different vendors and cloud offerings.
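As a small illustration of Ken’s point, the Python sketch below classifies offerings against the service models (SaaS, PaaS, IaaS) and deployment models (private, community, public, hybrid) defined in NIST SP 800-145, then groups offerings that share both, so that like is compared with like. The vendor offerings listed are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class ServiceModel(Enum):
    """Service models defined in NIST SP 800-145."""
    SAAS = "Software as a Service"
    PAAS = "Platform as a Service"
    IAAS = "Infrastructure as a Service"


class DeploymentModel(Enum):
    """Deployment models defined in NIST SP 800-145."""
    PRIVATE = "private cloud"
    COMMUNITY = "community cloud"
    PUBLIC = "public cloud"
    HYBRID = "hybrid cloud"


# Hypothetical offerings under evaluation, each classified against the NIST model.
offerings = [
    ("Vendor A email", ServiceModel.SAAS, DeploymentModel.PUBLIC),
    ("Vendor B email", ServiceModel.SAAS, DeploymentModel.PUBLIC),
    ("Vendor C hosting", ServiceModel.IAAS, DeploymentModel.PRIVATE),
]


def comparable_groups(items):
    """Group offerings that share a service model and a deployment model."""
    groups = defaultdict(list)
    for name, service, deployment in items:
        groups[(service, deployment)].append(name)
    return dict(groups)


for (service, deployment), names in comparable_groups(offerings).items():
    print(f"{service.value} / {deployment.value}: {', '.join(names)}")
```

Classifying every offering against the same published model also gives a consistent answer to Ian’s question about whether you are contracting for a service or for infrastructure.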
By Shane Mathew
The use of cloud technologies in business has grown and will continue to do so as companies realize the benefits and economies of scale these tools provide. As we’ve discussed, these same technologies do not come without concerns or new risks to business continuity efforts.
Luckily, the role of the Business Continuity practitioner is to find ways to reduce risk, even when the business adopts these newer capabilities.
The key to this, however, is to become an educated and informed professional who understands how cloud technology works, how it has been integrated into the critical functions and dependencies of your business, and how your risk identification and planning efforts must also reflect this new era.