The recent technology crisis at TSB bank highlights a significant risk that many organizations face sooner or later: project risk. Steve Dance looks at the ‘seven deadly sins’ of major IT investments and the role of the business continuity function in mitigating critical project risk.
The recent long-running systems crisis at TSB bank has been well and truly in the public domain, and it reached such a pitch that the CEO was hauled in front of the UK Government’s Treasury Select Committee to answer questions about it. When these very public IT issues occur, it’s usual for the familiar refrain of ‘they didn’t handle the crisis well’ to appear in specialist media. However, it’s difficult to see how a fiasco of this magnitude could be handled by a series of platitudes and promises – there was a real technical problem which needed to be sorted out. The bottom line here is that there has been an almighty screw-up and, although it could be partially mitigated by adroit PR, it was still a very public fiasco. Good PR or not, there have been significant impacts to both TSB and its customers that can’t be mitigated by words alone.
The whole TSB situation reminded me of something that does not receive a huge amount of coverage in the business continuity ‘space’ – project risk management. Large IT projects (whether new developments, upgrades or migrations to new platforms) can, as we have seen, carry enormous operational and reputational risks. New systems, upgrades and migrations are among the riskiest of IT initiatives, because fallback options are often overlooked or seen as an unnecessary conservatism that adds to the project timeline. Any project carries risk and it isn’t possible to remove risk completely. Nevertheless, it is possible to significantly reduce the level of uncertainty in critical IT projects by considering the ‘seven deadly sins’ of major IT investments and ensuring that adequate assurance is obtained that these risks have been assessed and mitigated as far as reasonably possible:
The seven deadly sins of major IT investments
1. Optimistic scheduling
Due diligence: on what basis are we accepting this project timeline as a reasonable estimate of the resources required and the skills needed? What historical evidence do we have that assures us that the timeline is based on experience?
Contingency planning considerations: what will be the impact of a project over-run? How will we handle this publicly and/or internally?
2. Capacity and performance assumptions
Due diligence: what evidence do we have that the proposed technical architecture can process the expected volume of data and can absorb periodic spikes in utilization?
Contingency planning considerations: what would be the impact of a major shortfall in performance and capacity? What would be our fallback position if the situation became unacceptable?
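One way to turn the capacity question into evidence is to compare what a load test actually proved against the expected peak plus an allowance for spikes. The following is a minimal sketch only: the function names, the spike factor of 1.5 and the example figures are all illustrative assumptions, not values from any real project.

```python
def capacity_headroom(measured_peak_tps, expected_peak_tps, spike_factor=1.5):
    """Ratio of capacity proven in load testing to capacity required.

    spike_factor is an assumed allowance for periodic spikes above the
    expected peak; a ratio below 1.0 means the evidence falls short.
    """
    required_tps = expected_peak_tps * spike_factor
    return measured_peak_tps / required_tps


def assess(measured_peak_tps, expected_peak_tps, spike_factor=1.5):
    """Translate the ratio into a plain-language finding for the project board."""
    ratio = capacity_headroom(measured_peak_tps, expected_peak_tps, spike_factor)
    if ratio >= 1.0:
        return "evidence supports expected volume and spikes"
    return "shortfall: review fallback position before go-live"


# Hypothetical figures: load test sustained 1200 transactions per second,
# expected peak is 1000, with a 50% allowance for spikes.
finding = assess(measured_peak_tps=1200, expected_peak_tps=1000)
print(finding)
```

The point of the calculation is not the arithmetic but the discipline: the ratio can only be computed if a load test has actually been run, which is the evidence the due diligence question asks for.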
3. Supplier assertions
Due diligence: what assurances do we have that our suppliers have the skills, experience and capabilities that they claim to have? (Remember, suppliers want your business and in their eagerness to get the deal they may be inadvertently optimistic.)
Contingency planning considerations: what would be the impact of a supplier under-delivering on their obligations? Is a fallback position possible? If not, ensure due diligence is as thorough as possible.
4. Conversion integrity
Due diligence: has the data conversion been proven? This is particularly important when the source and target data structures differ radically. Has an assurance process been put in place that will indicate whether data has been migrated across correctly and will highlight potential problems? HAS IT BEEN TESTED?
Contingency planning considerations: what would be the impact of a failed or compromised data conversion? What would our fallback position be?
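A conversion assurance process usually includes some form of reconciliation between the source and the migrated data. The sketch below shows one simple approach, assuming records can be matched on a unique key and compared by a hash of selected fields. The field names, sample records and the reconcile function are hypothetical, and a real migration would need to account for legitimate transformations between the old and new structures.

```python
import hashlib


def row_fingerprint(row, fields):
    """Hash the selected fields of one record in a fixed order."""
    joined = "|".join(str(row[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, fields):
    """Compare migrated records against the source by key and content hash."""
    source = {r[key]: row_fingerprint(r, fields) for r in source_rows}
    target = {r[key]: row_fingerprint(r, fields) for r in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing": sorted(source.keys() - target.keys()),     # in source, not migrated
        "unexpected": sorted(target.keys() - source.keys()),  # in target, no source record
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical sample data: account 3 was dropped and account 2 was altered.
source_rows = [
    {"account_id": 1, "balance": "100.00"},
    {"account_id": 2, "balance": "250.50"},
    {"account_id": 3, "balance": "0.00"},
]
target_rows = [
    {"account_id": 1, "balance": "100.00"},
    {"account_id": 2, "balance": "205.50"},
]

report = reconcile(source_rows, target_rows, key="account_id",
                   fields=["account_id", "balance"])
print(report)
```

An empty report on a full rehearsal run is the kind of evidence that answers "has it been tested?"; a non-empty one identifies exactly which records need investigation before a go-live decision.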
5. Integration capability
Due diligence: what evidence do we have that the new technology will integrate with existing platforms and architectures?
Contingency planning considerations: what are the most significant interfaces? What is the impact if they fail or perform sub-optimally? What workarounds might be available?
6. Resources and skills
Due diligence: do we have access to the required expertise to deliver this project? What assurances do we have of continuity of access to these skills?
Contingency planning considerations: if available resources prove insufficient, do we have an approach to ‘back-fill’ at short notice?
7. Design integrity and quality
Due diligence: what assurances do we have that there are no design flaws in applications and other developments supporting this project? Have unit testing, integration testing and operational proving focused on providing assurance that the design is fit for purpose and free from defects?
Contingency planning considerations: if a major design flaw became apparent during implementation or roll-out, how would we isolate it? What workarounds might be available?
Role of the business continuity function in mitigating critical project risk
Given the commercial ramifications of a failure during the roll-out of a major IT project, there’s a lot to be said for involving the business continuity function in high-impact IT projects. In a mature organization the business continuity function will have access to impact assessments for the areas of the organization affected by the IT project and will be in a position to advise which activities need plans in place in case any of the above risks materialise and cause major disruption to the business. Major IT projects have enormous potential both to disrupt the operational capabilities of an organization and to damage its reputation and that of its senior management.
TSB: were the risks overlooked – or ignored?
The risks and mitigation approaches discussed above are not rocket science that only a few high priests of risk management would understand – they are basic and fundamental. It’s inconceivable that the need for these due diligence and testing activities was not considered at some level at TSB. It’s possible that the real root cause was not that risks were overlooked but that risk management activities were prematurely curtailed or compromised. The decision to ‘water down’ risk management activities occurs when other factors have taken precedence over the original risk management strategy for the project. These decisions are often taken when:
- Previously agreed deadlines are put under pressure. The insistence on originally planned roll-out dates being adhered to regardless of issues arising during developmental stages often leads to risky short-cuts being taken;
- Budgetary pressures result in cutting out or prematurely curtailing assurance and testing activities to meet budgets that were based on early stage assumptions;
- Egos take priority over reality, creating an unwillingness to backtrack on assertions made at early stages. This can make individuals wilfully blind to bad news and, if they have sufficient authority, the bad news can be ignored or trivialised.
In essence, all of the above risk management activities are necessary, but not sufficient. All of this work can be undermined by a dogged determination to press on, regardless of risk.