Making the first minutes of a major IT incident count
- Published: Thursday, 03 September 2015 08:17
The way that organizations communicate in the first few moments of a service outage is crucial, no matter the size of the problem…
By Teon Rosandic.
“Software is eating the world,” said Marc Andreessen, cofounder of Netscape, entrepreneur and software engineer. This observation is even more true today than when it was written in 2011. For instance: the temperature sensors on a supermarket’s freezers malfunction and it has to throw away thousands of pounds’ worth of food; or the tills stop working at Starbucks and the queue backs up into the street.
The way that organizations communicate in the first few moments of a service outage is crucial, no matter the size of the problem. How does an enterprise make sure each and every customer interaction is successful or even adds customer value? First, you really have to watch and measure everything and anything. Having sensors all around enables you to gather information intelligently and communicate even more effectively. Employees and customers have their own unique needs and requirements, and they are connected to your service, product and business processes from anywhere at any time. Software, specifically cloud software, makes this a reality.
The ‘Internet of Everything’ helps businesses move at an extreme pace. However, manual processes, diversified IT infrastructures and dispersed workforces can complicate these communications, increasing downtime and impacting the business. Businesses can become incredibly helpless when one of the connected devices, relied-upon processes or sensors stops working.
However, failures are bound to happen, and the reason we automate processes is so that we don’t have to monitor them constantly. Automation is one less thing to worry about, until it becomes one more thing to worry about!
The first five minutes…
How an IT company communicates with its employees, customers or partners during the first few minutes of a service outage is critical: even an IT outage that lasts just a few minutes can hurt the business.
A recent survey of more than 300 IT professionals by Dimensional Research reveals that finding the right person to restore service takes at least 15 minutes. Whilst IT looks for the right individual, the business is often suffering. However, it doesn’t have to be that way.
Here are four ways to reduce business downtime and improve the customer interaction significantly:
Implement a major incident plan: Finding a major incident manager to rectify critical issues can take 20 minutes, but it really shouldn’t take more than one or two. Build the contact information for incident resolvers into your automated processes and, for the steps you can’t automate, implement a full process that everyone knows and follows.
Let an incident manager handle it: Events change very quickly during an incident and every minute counts. Without a trained, experienced and level-headed professional making the key decisions, the incident resolution team can act like headless chickens. The leader is able to cherry-pick the right people to resolve any issues quickly.
Assemble a resolution team using the right tools: Assembling a team with a spreadsheet or instant messenger can take at least an hour. This isn’t the best use of time, and the task can easily be automated through intelligent communication systems. These systems can automatically target and alert the right person needed to resolve the issue or disruption. If that person doesn’t answer their phone or message, the system automatically escalates to another person with the required skills.
Be transparent: If everyone is clear with their communications, the major incident manager can designate someone other than the resolvers to proactively communicate what has happened and outline the next steps to customers, partners, marketing and public relations teams. The distraction of having to provide updates to customers whilst working to restore service can often lead to longer delays and errors. Communication transparency allows resolvers to focus on the task of getting the business up and running as soon as possible.
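The automated escalation described under the third point can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the Responder type, the escalate function, the roster and the notify callback are all hypothetical names, and a real system would page people by phone or message and wait for an acknowledgement within a time limit.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Responder:
    """Hypothetical on-call record: a name and the skills that person has."""
    name: str
    skills: FrozenSet[str]


def escalate(
    required_skill: str,
    on_call: Iterable[Responder],
    notify: Callable[[Responder], bool],
) -> Tuple[Optional[Responder], List[str]]:
    """Alert qualified responders in roster order until one acknowledges.

    notify is an assumed callback that returns True when the person
    answers the alert. Returns the responder who acknowledged (or None)
    and the names of everyone who was alerted along the way.
    """
    alerted: List[str] = []
    for person in on_call:
        if required_skill not in person.skills:
            continue  # skip people who lack the skill this incident needs
        alerted.append(person.name)
        if notify(person):
            return person, alerted  # someone has taken ownership
    return None, alerted  # nobody qualified answered


# Illustrative roster; the names and skills are made up.
roster = [
    Responder("alice", frozenset({"network"})),
    Responder("bob", frozenset({"database"})),
    Responder("carol", frozenset({"database", "network"})),
]

# Stand-in for a real paging call: in this example only carol answers.
owner, alerted = escalate("database", roster, lambda p: p.name == "carol")
print(owner.name, alerted)  # carol ['bob', 'carol']
```

In this run alice is never disturbed because she lacks the database skill, bob is alerted but does not answer, and the alert moves on to carol. The list of people alerted also gives the incident manager a record to share when communicating progress.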
Keeping a major IT incident quiet isn’t possible anymore: everyone finds out. And missing service-level agreements (SLAs) sucks: no one wins. Be intelligent with your communication software and your communication processes so that when a technology-related business issue happens, your business is able to stay ahead of the game.
Teon Rosandic is VP, EMEA, xMatters, inc.