If we are going to measure security, what exactly are we measuring?
If you can’t measure it, you can’t manage it
– W. Edwards Deming
That is a great quote, and one I have heard a lot over the years. An interesting point, however, is that it is usually repeated without context. What I mean by this is that if you look at page 35 of The New Economics, you will see that the quote is flanked by some further advice, namely:
It is wrong to suppose that if you can’t measure it, you can’t manage it — a costly myth.
Deming did believe in the value of measuring things to improve their management, but he was also smart enough to know that many things simply cannot be measured and yet must still be managed.
Managing things that cannot be measured, and the absence of context, is as good a segue as any to the subject of metrics in information security.
A recurring question that arises surrounding the use and application of metrics in security is “What metrics should I use?”
I have spent enough time in security to have seen (and, truth be told, probably generated) an awful lot of rather pointless reports in and around security. I think I'm ready to attempt to explain what is going on here, and why "What metrics should I use?" might be the wrong question. Instead, the focus should be on the context provided with metrics, in order to create useful and meaningful information about an organisation's security.
A Typical (& Faulty) Approach
Here is a typical methodology that often leads to the acquisition of a security appliance or product of some sort by an organisation (e.g. a firewall, or an IDS/IPS).
- A security risk is identified
- The security risk is assessed
- A series of controls are prescribed to mitigate or reduce the risk, and one of those controls is some sort of security appliance / product
- Some sort of product selection process takes place
- Eventually a solution is implemented
Now we know (or we thought we knew), thanks to Dr Deming, that in order to manage something we first need to measure it, so we instinctively look at the inbuilt reporting on the chosen device (a more mature organisation might have thought about this at the product-selection stage, where it may even have influenced the choice). We select some of the available metrics, and in less than ten minutes, somehow electro-magically, we have configured the device to email a large number of pie charts in a PDF to a manager at the end of every month.
However, the problem with the above chart is that it doesn't actually mean anything: it has no context. The best way to demonstrate what I mean by context is to add a little.
Ok, that’s better, but in truth it’s still pretty meaningless, so let’s add some more.
Now we are beginning to end up with something meaningful. Immediately we can see that something appears to be going on in the world of cryptomalware, and we can choose to react: the information provided now demonstrates a distinct trend and is clearly actionable.
But we can bring even more context into this. The points below provide some more suggestions for adding context, and will give you a feel for the importance of having as much context as possible to create meaningful metrics.
- What about the threats that do not get detected? Are there any estimations available (e.g. in Security Intelligence reports) on how many of those exist? (Donald Rumsfeld’s ‘Known Unknowns’)
- Can we add more historical data? More data means more reliable baselines, and the ability to spot seasonal changes
- Could we collect data from peer organisations for comparison (i.e. — do we see more or less of this issue than everyone else)?
- We have 4 categories, so perhaps we need a line for threats that do not fall into these categories?
- What are the highest and lowest values we (and/or other companies in the industry) have ever seen for these threats?
- Do we have the ability to pivot on other data — for example would you want to know if 96% of those infections were attributed to one user?
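To make the idea concrete, here is a minimal sketch of two of the contexts suggested above: comparing this month's detection count against a historical baseline, and pivoting the same count by user. All of the numbers and user names are hypothetical, invented purely for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical monthly cryptomalware detection counts (historical baseline).
history = [12, 15, 11, 14, 13, 16, 12, 14]
current = 58  # this month's count

# Context 1: compare the current count against the historical baseline.
baseline = mean(history)
spread = stdev(history)
z_score = (current - baseline) / spread

if z_score > 3:
    print(f"ALERT: {current} detections is {z_score:.1f} std devs above baseline ({baseline:.1f})")

# Context 2: pivot on another dimension -- who is generating the detections?
detections_by_user = Counter({"alice": 2, "bob": 1, "carol": 55})
top_user, top_count = detections_by_user.most_common(1)[0]
share = top_count / sum(detections_by_user.values())
if share > 0.9:
    print(f"{share:.0%} of detections attributed to a single user: {top_user}")
```

The same raw number (58 detections) is meaningless on its own, but with a baseline it becomes an alert, and with a pivot it becomes a conversation with one specific user.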
The Impact of Data Visualisation
So now we have an understanding of context, what else should we consider?
Coming from the world of operational networking, I spent a lot of my time in a previous role getting visibility of very large carrier-grade networks. It was my job to maintain hundreds of gateway devices such as firewalls, proxy servers, VPN concentrators, spam filters and intrusion detection & prevention systems, along with all the infrastructure that supported them.
At that time if you were to ask me what metrics I would like to collect and measure, the answer was simple — I wanted everything possible.
If a device had a way to produce a reading for anything, I found a way to collect it via SNMP and graph it using an array of open-source tools such as RRDTool and Cacti.
I created pages of graphs for device temperatures, memory usage, disk space, uptime, number of concurrent connections, network throughput, admin connections, failed logins etc.
The great thing about graphs is you can see anomalies very quickly — spikes, troughs, baselines, annual, seasonal and even hourly fluctuations give you insight. For example, gradual inclines or sudden flat-lines may mean more capacity is needed, whereas sharp cliffs typically mean something downstream is wrong.
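The spike-and-cliff spotting described above can be sketched in a few lines: compare each new reading against a rolling baseline of recent values and flag anything that jumps or collapses relative to it. The series, window size and threshold below are illustrative assumptions, not values from the original systems.

```python
# A minimal sketch of graph-style anomaly spotting: flag readings that
# deviate sharply from a rolling average of the recent past.
def rolling_anomalies(series, window=5, threshold=2.0):
    """Return (index, value, baseline) for points that spike above or
    crash below the rolling baseline by more than the threshold factor."""
    anomalies = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        baseline = sum(recent) / window
        # A sharp cliff (value collapsing far below baseline) or a sudden
        # spike (value far above baseline) both warrant a closer look.
        if baseline > 0 and (series[i] > threshold * baseline or series[i] < baseline / threshold):
            anomalies.append((i, series[i], baseline))
    return anomalies

# Hypothetical throughput readings (Mbps): steady, then a cliff, then a spike.
throughput = [100, 102, 98, 101, 99, 100, 103, 5, 101, 240]
for idx, value, base in rolling_anomalies(throughput):
    print(f"t={idx}: observed {value}, rolling baseline {base:.0f}")
```

A real deployment would use the baselining built into tools like RRDTool rather than hand-rolled code, but the principle is the same: the graph's value lies in the deviation from its own history.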
Using these graphs and a set of automated alerts, I was able to predict problems that were well outside my purview. For example, I often diagnosed failed A/C units in datacentres long before anyone else had raised alarms. I was able to detect when connected devices had been rebooted outside of change windows. I could even see when other devices had been compromised, because I could graph failed logon attempts for other devices in the vicinity.
In the ten years or so since I was building out these graphs in Cacti, technologies for dashboarding, dynamic reporting and automated alerting have come a long way, and it's now easier than ever to collect data and produce very rich information — provided that you understand the importance of context, and you consider how actionable the information you produce will be to the end consumer.
While this write-up has focused particularly on context with respect to technical security metrics, it is important to remember that security is mainly about people, so you should always consider the softer metrics that cannot simply be collected by things such as SNMP polling or the parsing of syslogs. For example — is there a way to measure the number of users completing security awareness training, and see if this correlates with the number of people clicking on phishing links?
Would you want to know for instance if the very people who had completed security awareness training were more likely to click on phishy emails?
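That question is just a join between two datasets you may already hold: training-completion records and phishing-simulation results. A minimal sketch, using entirely invented user names and outcomes:

```python
# Hypothetical data: who completed awareness training, and who clicked
# a simulated phishing link. All names and outcomes are made up.
all_users = {"alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"}
trained = {"alice", "bob", "dave", "grace"}
clicked = {"bob", "dave", "grace", "carol"}

# Click rate among trained vs untrained users.
trained_click_rate = len(trained & clicked) / len(trained)
untrained = all_users - trained
untrained_click_rate = len(untrained & clicked) / len(untrained)

print(f"Trained users who clicked:   {trained_click_rate:.0%}")
print(f"Untrained users who clicked: {untrained_click_rate:.0%}")
# In this invented dataset the trained users click *more* often --
# which, if it happened for real, would itself be an actionable finding
# about the training, not just about the users.
```

With a handful of set operations, the two raw counts ("200 trained", "40 clicked") turn into a comparison that someone can actually act on.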
The bottom line is that security metrics, whether technically focused or otherwise, are relatively meaningless without context. While metrics aim to measure something, it is the context in which the measurements are given that provides the valuable and actionable information organisations can use to identify and prioritise their security spend.
Article by Eric Pinkerton, Regional Director, Hivint
Check out Security Colony’s Cyber Security Reporting Dashboards, a NIST CSF based dashboard with more metrics for your security program.