
Barry Shteiman looks at the role of machine learning in cyber security and explains that, while it is not a ‘silver bullet’, it has a distinct role when augmented with human intelligence.

Cyber criminals are launching attacks that are far more complex than those security teams faced a decade ago. Long gone are the days when straightforward attacks such as SQL injections were the only worries for security analysts. Cyber criminals now deploy advanced attacks that may unfold over months, span the entire business network and utilise multiple employees’ account credentials. For security analysts today, it’s just as important to understand the normal behaviour of all business employees as it is to identify suspicious activity.

Piecing user behaviour together

Despite what you might read, companies have long attempted to build a picture of normal user behaviour to help uncover threats. Most companies have thrown people at the problem – employing more incident response (IR) analysts to sift through data and make judgements about events.

These incident responders normally start by looking at domain controller records to see who was using a particular IP address at the time of an incident. Next, they might search through logs to see what the IP did before and after. If they’re experienced, they might notice a connection between the user’s workstation and a remote server, using unrelated account details. After considerable time and effort, the IR analyst should have created something resembling a timeline of all the events around the time of the incident.
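The log-sifting step described above can be sketched as a simple filter-and-sort. This is a hypothetical Python illustration only, not any real IR tool; the `ip` and `time` field names and the two-hour window are assumptions made for the example:

```python
from datetime import timedelta

def build_timeline(logs, ip, incident_time, window=timedelta(hours=2)):
    """Collect log entries for one IP address around an incident time,
    sorted chronologically into a rough event timeline."""
    start, end = incident_time - window, incident_time + window
    return sorted(
        (entry for entry in logs if entry["ip"] == ip and start <= entry["time"] <= end),
        key=lambda entry: entry["time"],
    )
```

In practice an analyst repeats this kind of query across many log sources (domain controllers, proxies, VPN concentrators) and merges the results by hand, which is where the time and effort go.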

Understanding whether user behaviours are normal from this timeline is another challenge altogether. Routine activities for a database administrator might be very unusual for an employee in the HR team. For the IR analyst to determine if the user’s activities are normal, they will likely run many more searches and queries on historical data before putting the findings in a reporting system to identify any trends that indicate potential risk. It is safe to say that this process can take days or weeks.

Automation helps analysts speed up this process. It takes various forms, but the typical use cases are scripts that automate data collection and signatures that detect certain types of attacks. In more recent years, we’ve started to see event correlation being used to help uncover well-defined, network-based attacks. A common example is an employee who logs on from home over the VPN having just badged into the building. Event correlation can notify an analyst that those two events shouldn’t be happening at once.
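The VPN-versus-badge example can be expressed as a small correlation rule: flag any user with both event types inside a short window. A minimal sketch, assuming each event is a dict with hypothetical `user`, `type` and `time` fields (all names and the ten-minute window are invented for illustration):

```python
from datetime import timedelta

def correlate_conflicts(events, window=timedelta(minutes=10)):
    """Flag users who have a VPN login and a physical badge-in
    within the same time window - two events that shouldn't coincide."""
    by_user = {}
    for e in events:
        by_user.setdefault(e["user"], []).append(e)
    alerts = []
    for user, evs in by_user.items():
        vpn_logins = [e for e in evs if e["type"] == "vpn_login"]
        badge_ins = [e for e in evs if e["type"] == "badge_in"]
        for v in vpn_logins:
            for b in badge_ins:
                if abs(v["time"] - b["time"]) <= window:
                    alerts.append((user, v["time"], b["time"]))
    return alerts
```

Rules like this work well precisely because the conflict is well defined in advance; the article’s point is that most modern attacks are not.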

How do you analyse all the data?

The existing model for security analytics and intelligence has not kept pace with the threat landscape. Its effectiveness has been questionable. Why? The biggest factor without doubt is data.

The volume of data that could be used in a security investigation has been growing so quickly that it is now not uncommon for a large business to collect as much as 300 terabytes of data every day. This data deluge comes from systems generating more data and SIEMs collecting data sets that simply didn’t exist before (e.g. IoT devices). This means it’s often too expensive to store enough historical data to effectively support a security investigation. In many cases, only 30 days’ worth is kept at any time. The thinking behind this is that if any more is kept, the sheer volume could overwhelm the reporting system.

The IR analysts facing this sea of data are also likely to be overwhelmed and miss important trends. One answer to the challenge might once again be to throw more bodies at the problem, but the simple truth is most companies, even the big ones, don’t have the funds to hire enough expert threat hunters. And that’s presuming that this threat hunting expertise even exists at such scale – it doesn’t.

At the same time, businesses are in a much greater state of flux, with employees being replaced by outsourced capabilities or temporary workers who turn over on a more regular basis. This makes it harder to identify who is an actual employee or user, let alone build a clear picture of that user’s normal activity. In short, IR teams are simply unable to process enough useful security data to understand whether or not there is an imminent threat.

Automation can help, not fix

When security tasks were simple and static, IR analysts could rely on automated machines to help streamline tasks. This worked well when there wasn’t too much data, when the data was a common format, when the threat techniques didn’t change too often and when attacks were solely network-focused. Needless to say, those days are firmly behind us. But, while the threat landscape has become more challenging, thankfully, the machines have become a lot smarter.

Recent developments in AI and machine learning have been met with jubilation and a fair amount of hype within the industry. The problem is that vendors have been very lax in how they describe these add-on technologies, creating confusion in the market. When customers hear a vendor urging them to ‘pour data’ into a machine learning-based analytics engine, they expect wonderful things to simply pop out the other end. In reality it doesn’t work like that. Too many organizations believe that machine learning and AI are a silver bullet.

That’s not to deny the usefulness of these technologies. Understanding normal behaviour is one area where developments in artificial intelligence and machine learning can be applied with great success. There are now algorithms that can create context by connecting events into coherent user sessions. The combination of algorithms and statistical analysis can answer a huge range of questions incredibly quickly: is this a real user or a service account? Is this person an admin? Does this activity deviate from this user’s peer group’s activity? Is the user of account A also logged in under account B?
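One simple way to stitch events into coherent user sessions is an idle-gap rule: consecutive events closer together than some threshold belong to the same session. This is a hypothetical sketch of the idea, not Exabeam’s algorithm; the 30-minute gap and the `time` field name are assumptions:

```python
from datetime import timedelta

def sessionize(events, gap=timedelta(minutes=30)):
    """Stitch one user's time-ordered events into sessions,
    starting a new session whenever the idle gap exceeds the threshold."""
    sessions, current, last = [], [], None
    for e in sorted(events, key=lambda e: e["time"]):
        if last is not None and e["time"] - last > gap:
            sessions.append(current)
            current = []
        current.append(e)
        last = e["time"]
    if current:
        sessions.append(current)
    return sessions
```

Once activities are grouped into sessions, the statistical questions above (peer-group deviation, admin versus service account, account switching) can be asked of each session rather than of millions of raw log lines.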

Help analysts with automated priorities

Putting the pieces together, the best way to cope with the huge data volumes and more complex threats is to augment, not replace, human intelligence with machine intelligence. A good machine-based analytics system should continually ingest new data, understand any alterations in a user’s normal behaviour, stitch individual activities into timelines and then analyse the timelines to see if there are any risky behaviours. These tasks could take an IR analyst a week or more per user.

The analyst could review a machine-created user session to spot deviations from that user’s normal behaviour more quickly. Taking it one step further, a machine could automatically score anomalies and assign a cumulative risk score to each user. This helps reduce false positives and alert fatigue.
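One way such a scoring scheme could work is to weight each anomaly type and sum the weights into a per-user score that ranks the triage queue. The anomaly types and weights below are invented purely for illustration; a real system would learn or tune them:

```python
# Hypothetical anomaly weights - illustrative values, not a real product's model.
WEIGHTS = {
    "new_country_login": 40,
    "second_account_use": 30,
    "peer_group_deviation": 25,
    "unusual_hours": 10,
}

def score_users(anomalies_by_user, weights=WEIGHTS):
    """Sum weighted anomaly points per user and rank users by risk,
    highest score first, so analysts review the riskiest users first."""
    scores = {
        user: sum(weights.get(a, 5) for a in anomalies)  # default 5 for unknown types
        for user, anomalies in anomalies_by_user.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Ranking by accumulated score, rather than alerting on every single anomaly, is what reduces false positives and alert fatigue: a lone off-hours login scores low, while a cluster of anomalies on one user rises to the top of the queue.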

Organizations can’t replace security teams with machine automation, nor can they afford to reduce the amount of time spent on incident analysis. Cyber criminals are not going to stop attacking anytime soon. What machine learning tools can do is make it easier to respond to actual cyber threats, providing security analysts with more focused information and helping them make better decisions in less time. It’s a balance, but one that every organization needs to strike.

The author

Barry Shteiman is Director of Threat Research at Exabeam.
