Downtime prevention: the crucial role of monitoring in IT resilience

Published: Friday, 29 April 2016 09:31

Networks and Internet connections are the lifeblood of modern businesses, and network downtime has effects that go well beyond the financial. Demotivated staff and unresponsive customer-facing systems can do serious damage to customer service and reputation. Two reports, from Veeam and DevOps, put the financial cost of unplanned downtime at $1.25m to $2.5m (£875,000 to £1.75m) and the cost of infrastructure failure at $100,000 (£70,000) per hour for enterprises across a range of vertical sectors. These are considerable sums, which makes reducing downtime an obvious priority for IT professionals.

In a survey conducted in February 2016, Infonetics identified the most common causes of ICT downtime as failures of equipment, software and third-party services, power outages, and human error.

Based on these findings, businesses can take some obvious steps to avoid the crisis of downtime: backup generators and UPS units to keep the power on, more redundancy, offsite backup solutions, better staff training and greater use of cloud-based software. However, none of these stops failures from happening. The most effective way to prevent failures is to understand when and where they are likely to occur and act before they do. Simple network monitoring shows you your network and the status of its devices, so you know when they are online or offline, but modern monitoring solutions can extract and track far more than that.
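To make the basic online/offline check concrete, here is a minimal sketch in Python. It assumes each device can be tested by opening a TCP connection to a known port; the `Device` type, the port choice and the two-second timeout are illustrative assumptions, not how any particular monitoring product works. Real tools typically also use ICMP ping or SNMP.

```python
import socket
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    host: str
    port: int  # assumed: a TCP port the device is expected to answer on


def is_up(device: Device, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the device succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((device.host, device.port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out or unresolvable: treat the device as down.
        return False


def poll(devices: list[Device]) -> dict[str, str]:
    """Map each device name to 'up' or 'down'."""
    return {d.name: ("up" if is_up(d) else "down") for d in devices}
```

A scheduler would call `poll()` at a regular interval and keep the results, which is the raw material for the trend and alerting logic that more capable monitoring adds on top.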

Catch your issues before they become critical: proactive v reactive

Network monitoring solutions can proactively detect issues before they escalate into outages. Even across multiple sites, a centrally managed monitoring solution lets IT teams understand how their network is performing: where the bottlenecks are, device load levels, software status and resource availability.
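Proactive detection often comes down to spotting a trend before it crosses a limit. The sketch below is an illustration under stated assumptions rather than any vendor's method: it fits a least-squares straight line to timestamped samples of a metric such as disk usage (samples assumed ordered oldest first, timestamps in hours) and estimates how many hours remain before the metric reaches a threshold.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate hours after the last (hour, value) sample until `threshold` is reached.

    Returns 0.0 if the latest value is already at or above the threshold,
    and None if the fitted trend is flat or falling (no crossing predicted).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    # Ordinary least-squares line: slope = cov(t, v) / var(t).
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t, last_v = samples[-1]
    if last_v >= threshold:
        return 0.0
    if slope <= 0:
        return None
    # Solve intercept + slope * t = threshold, then measure from the last sample.
    # Clamp at 0 in case noisy data puts the fitted crossing in the past.
    crossing_t = (threshold - intercept) / slope
    return max(0.0, crossing_t - last_t)
```

For example, disk usage sampled hourly at 50, 52, 54 and 56 percent fits a line rising 2 points per hour, so `hours_until_threshold([(0, 50), (1, 52), (2, 54), (3, 56)], 90)` returns 17.0, leaving most of a day to act. A straight line is the simplest possible model and will mislead for metrics that grow in bursts or follow daily cycles.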

Monitoring provides a 24/7 view of network resources. Centralized dashboards give IT teams a single view of their infrastructure and applications, so they can see issues as they occur. Automated email and SMS alerts warn your IT team when systems are approaching a critical state, allowing them to react immediately and, most importantly, before those systems fail.
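The warning-before-failure behaviour described above can be modelled with two thresholds per metric. This is a minimal sketch under stated assumptions: the `notify` callback stands in for whatever email or SMS gateway you use (a hypothetical hook here), and alerts are sent only when a metric changes level, so the team is not flooded with repeats while a problem persists.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class Level(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class ThresholdAlert:
    """Classify a metric against warning/critical thresholds and notify only on changes."""
    warning: float
    critical: float
    notify: Callable[[str], None]  # hypothetical hook, e.g. wrapping an email or SMS gateway
    state: dict[str, Level] = field(default_factory=dict)

    def classify(self, value: float) -> Level:
        if value >= self.critical:
            return Level.CRITICAL
        if value >= self.warning:
            return Level.WARNING
        return Level.OK

    def check(self, name: str, value: float) -> Level:
        level = self.classify(value)
        previous = self.state.get(name, Level.OK)
        if level != previous:
            # Notify on every transition, including recovery back to OK,
            # but stay quiet while the level is unchanged.
            self.notify(f"{name}: {previous.name} -> {level.name} (value={value})")
        self.state[name] = level
        return level
```

With `warning=80` and `critical=95`, a CPU reading of 85 triggers a single WARNING message, giving the team time to act before the metric reaches the critical level.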

Monitoring can also cover the environment surrounding your devices, not just the infrastructure itself. A variety of sensors can be configured to work alongside your monitoring solution: temperature sensors help ensure that servers stay cool; humidity sensors check that there is not too much moisture in the air; static electricity and other sensors warn when devices are under threat from outside influences. You can even monitor who is in the server room.
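Environmental sensors are usually checked against an acceptable range rather than a single ceiling, because a reading can drift out of bounds in either direction. The sketch below assumes sensors report plain numeric readings keyed by name; the sensor names and the limits in `LIMITS` are illustrative placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float


# Illustrative limits only (assumptions): use the figures your hardware
# vendor or facilities team specifies for your own server room.
LIMITS = {
    "temperature_c": Range(18.0, 27.0),
    "humidity_pct": Range(20.0, 60.0),
}


def out_of_range(readings: dict[str, float], limits: dict[str, Range] = LIMITS) -> list[str]:
    """Return a message for every reading that falls outside its permitted range."""
    problems = []
    for sensor, value in readings.items():
        limit = limits.get(sensor)
        if limit is None:
            continue  # no limits configured for this sensor; ignore it
        if value < limit.low:
            problems.append(f"{sensor} too low: {value} < {limit.low}")
        elif value > limit.high:
            problems.append(f"{sensor} too high: {value} > {limit.high}")
    return problems
```

Any non-empty result could be passed to the same alerting path used for device metrics, so environmental problems reach the team through one channel.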

Unfortunately, downtime can never be avoided entirely, but businesses can take steps to minimize both how often it occurs and its impact on the business. Building in contingency and redundancy means the loss of a single device has less impact on staff and clients. Awareness is also key. A proactive monitoring system that alerts your teams is very cost-effective (in some cases even free) and allows them to react quickly to issues and resolve them before they have any serious impact.

The author

Lawrence Freeman is operations director at Mutiny.