Will chaos engineering shake up business continuity?
- Published: Tuesday, 04 September 2018 19:37
Michael Herrera has written an interesting article looking at the emerging discipline of ‘chaos engineering’ (CE) and suggesting that it has the potential to ‘bring big changes to business continuity’.
Key points from the article include:
- Chaos Engineering originated at Netflix in 2011 and has since spread to other tech companies such as Google and Amazon. It now looks poised to make an impact at non-technology firms.
- The core idea is that, as part of your testing strategy, you deliberately cause problems in the production environment rather than in a test environment, because that is the most reliable way to measure and improve resilience.
- According to Principles of Chaos, the Chaos Engineering community’s home website, the ‘advanced principles’ of the discipline include: build a hypothesis around steady-state behavior; run experiments in production; automate experiments to run continuously; and minimize the blast radius, that is, ensure that the fallout from any experiment is kept small and contained.
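To make these principles concrete, here is a minimal Python sketch of a single experiment run against a simulated fleet rather than a real production system. Everything in it is hypothetical and chosen for illustration: the `Instance` class, the `run_experiment` function, the client that retries once on another replica, and the 95% success threshold. It is not taken from Netflix’s tooling or from the Principles of Chaos site. The sketch states a steady-state hypothesis (“the success rate stays at or above the threshold”), checks a baseline first, injects a fault into a capped fraction of replicas (the blast radius), always restores them, and reports whether the hypothesis held. Running it continuously would simply mean scheduling it, which is left out here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """One replica of a hypothetical service."""
    name: str
    healthy: bool = True


def success_rate(instances, n_requests, rng, retries):
    """Fraction of requests served, assuming a hypothetical client that
    sends each attempt to a randomly chosen replica and retries on failure."""
    served = 0
    for _ in range(n_requests):
        for _attempt in range(retries + 1):
            if rng.choice(instances).healthy:
                served += 1
                break
    return served / n_requests


def run_experiment(instances, threshold, max_fraction=0.1,
                   n_requests=2000, retries=1, seed=0):
    """Test the steady-state hypothesis 'success rate >= threshold'
    while a limited number of replicas are deliberately failed."""
    rng = random.Random(seed)
    baseline = success_rate(instances, n_requests, rng, retries)
    if baseline < threshold:
        # The system is not in steady state to begin with: inject nothing.
        return {"ran": False, "baseline": baseline}

    # Minimize the blast radius: fail at most max_fraction of replicas
    # (but at least one, so the experiment tests something).
    k = max(1, math.floor(len(instances) * max_fraction))
    victims = rng.sample(instances, k)
    try:
        for inst in victims:
            inst.healthy = False
        during = success_rate(instances, n_requests, rng, retries)
    finally:
        # Always undo the injected fault, even if measurement raises.
        for inst in victims:
            inst.healthy = True

    return {
        "ran": True,
        "baseline": baseline,
        "during": during,
        "failed": [v.name for v in victims],
        "hypothesis_held": during >= threshold,
    }


if __name__ == "__main__":
    fleet = [Instance(f"web-{i}") for i in range(10)]
    print(run_experiment(fleet, threshold=0.95))
```

With one retry, losing one replica in ten should barely dent the success rate, so the hypothesis is expected to hold; with `retries=0` and a stricter threshold it would be expected to fail, which is exactly the kind of weakness such an experiment is meant to surface before a real outage does.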
“Obviously,” says Mr. Herrera, “you would need to ask management for permission to purposely insert chaos into a healthy production environment. I suspect that for the time being, most managers will say, ‘No, thanks.’ But trends from Silicon Valley have a way of spreading throughout the country. I think that eventually, mainstream organizations will be taking a closer look at chaos engineering as a way of validating their recovery plans. In my opinion, chaos engineering is the future of business resilience. It’s the one true way of finding out if you can recover for real.”