Gaining Context Awareness for Better Remediation Prioritization

In our previous blog post, we discussed the vulnerability management industry’s crippling problems: its inability to prioritize by context, and its stubborn focus on chasing hype and presenting (mostly) irrelevant, naive threat metrics in otherwise very impressive dashboards.

We then presented our general strategy for solving these problems. We talked about how we aggregate different sources of information to provide a sufficiently clear representation of the organization’s context. This representation then enables a customized and efficient remediation strategy.

Specific machine learning predictions are combined into a single-dimensional ordering of priorities

This approach combines specific machine learning algorithms in a simple yet critical way, resulting in an easily interpretable and actionable ordering of priorities. In this post, we discuss context inference techniques on a more advanced and technical level using two real-life examples. The next blog post will be a sneak peek into how we greatly reduce the number of false positive detections for even more meaningful filtering.

The Setting

For the sake of clarity, let’s apply this concept to two typical examples. Suppose that, after Warden runs its assessment on a network, it finds a number of software vulnerabilities and various misconfigurations that need to be addressed. Among them are two separate servers, on two separate subnetworks, both running a vulnerable MySQL server. We will need to examine their respective contexts to understand how we can properly compare the true severity of these MySQL vulnerabilities.

Learning the General from the Specific

Following the order of analysis done by Warden, we go from the specific to the general. It is important to remember that the following inference techniques only represent a small subset of the possible sources of information that are addressed by Warden when generating its prioritization metrics.

We start by analyzing individual vulnerability properties and their direct embedding — the underlying asset. There, we can measure items such as business impact and acquire a primitive notion of the isolated risk.

Once we have a grasp on every specific vulnerability-asset combination, we can move on to the broader network frame of reference, where the most likely targets and probable paths of attack are revealed. This step provides us with an increased understanding of the risk.

The next logical step is to leverage the multitude of network environments, business contexts and remediation behaviors that Warden has isolated to provide targeted, yet informed, remediation advice. This step is crucial to proper impact analysis, but it is seldom done properly.
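As an illustration of these three levels of context, one could model each finding as a small layered record. The class and field names below are purely hypothetical, not Warden’s actual data model; they simply show how vulnerability, asset, network and organizational signals can be layered.

```python
from dataclasses import dataclass

# Hypothetical data model for the three context levels described above.
# Field names are invented for illustration only.

@dataclass
class VulnerabilityContext:
    cve_id: str
    base_cvss: float               # isolated, context-free severity

@dataclass
class AssetContext:
    hostname: str
    regularly_updated: bool
    business_value: float          # e.g. inferred from how often users query it

@dataclass
class NetworkContext:
    subnet: str
    exposed_to_internet: bool
    neighboring_risky_assets: int  # likely pivot points / attack paths nearby

@dataclass
class Finding:
    vulnerability: VulnerabilityContext
    asset: AssetContext
    network: NetworkContext
```

Each `Finding` is then scored by walking outward through its layers: vulnerability first, then asset, then network.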

A Matter of Scale

We fully rank thousands of vulnerabilities using context prediction to generate a true priority ordering.               

One thing to keep in mind is that, in the end, what we want is a single-dimensional ordering of vulnerabilities (from low priority to top priority). To accomplish this, we assign a base score to every standalone vulnerability. For CVE-based vulnerabilities, we use the NIST baseline CVSSv2 score. For dynamically tested web vulnerabilities (DAST), or when a CVSSv2 score is not available, we estimate an equivalent based on a blend of in-house expertise, industry knowledge and our very own AI system.

Each element of the context analysis then successively pushes this naive score up or down by an amount calculated by the AI engine. This amount is based on a variety of statistical factors, including the score’s relative distribution amongst other vulnerabilities and assets, the reliability of the detection methods, and external intelligence feeds.
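A minimal sketch of this mechanism, with invented adjustment weights (the real deltas are computed statistically by the engine, not hard-coded):

```python
def contextual_score(base: float, adjustments: list[float],
                     floor: float = 0.0, cap: float = 10.0) -> float:
    """Successively push a base score up or down, clamping to [floor, cap]."""
    score = base
    for delta in adjustments:
        score = min(cap, max(floor, score + delta))
    return score

# A medium 4.6 baseline pushed up by two context signals, down by one:
print(round(contextual_score(4.6, [+0.8, -0.4, +1.2]), 1))  # 6.2
```

Clamping at each step keeps the final value on the familiar 0–10 severity scale regardless of how many context signals apply.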

Case 1 – The Chain of Exploitation

Let’s say that Warden finds a MySQL server, version 4.1.10, on an otherwise updated Linux machine. This software is vulnerable to a “_mild_” authenticated remote code execution, CVE-2005-0709. The industry-standard CVSSv2 score is 4.6, a medium severity. Warden will then go further and analyze the vulnerability’s context to push it up or down from the elementary baseline.

  1. The asset is a regularly updated Linux machine, on a private network → Down.
  2. There is a publicly known exploitation technique for this vulnerability → Up
  3. We found an Apache http daemon serving a Website and inferred, by crawling its content, that it is serving a development version of a website → Up
  4. This same website is riddled with SQL injections, theoretically allowing an unauthenticated user to execute code. Of course, we scan all web servers for dynamic web vulnerabilities (DAST) as well as with traditional VA. → Up for all of these vulnerabilities
  5. On the same subnetwork range, we found a handful of other machines that all seem to be web development workstations, making this subnet fertile ground for misconfigurations and leaky services. → Up for all of these assets.
  6. We infer some relative business value for this asset as many members of the organization are querying Warden to check for fresh vulnerabilities on a regular basis. → Up
  7. We also found that one of these development website versions is in fact the upstream for a public and official company website, hosted on a server that is in this subnetwork. This is a key finding as it could allow an attacker to pivot in the internal network. → Up.
  8. By aggregating many of the features above, we apply our collected history of remediations via a machine learning algorithm to predict that a mature organization would typically correct this situation in a very short time, pushing the score even higher. → Up
The score is pushed up or down by a succession of assessment steps

Case 2 – The Loner

This example also involves a misconfigured MySQL server, but with a much nastier vulnerability that allows a privileged RCE. It was recently discovered that this version of MySQL exposes an easily exploitable vulnerability (CVE-2016-6662/4) that can lead to the execution of privileged commands on an affected host. This is why it was given a critical severity rating (10.0 CVSSv2). Let’s follow the same steps as above to infer its context.

  1. The asset is an updated Windows machine that follows automated updates for critical software → Down
  2. There is a known exploit available on the internet; it is even Metasploit-ready. → Up
  3. The vulnerability was discovered through an authenticated scan; it turns out the service is not exposed to the network. → Down
  4. The box does not seem to host any websites or services accessible from the network, making it an unlikely target → Down
  5. The box does not seem to be communicating with other machines on the network on a regular basis. → Down
  6. The asset is on a small VLAN with only a few other updated machines. → Down
  7. No asset on this subnetwork is accessible from outside the network. → Down
  8. Even though the majority of assets are regularly scanned for this organization, this one has voluntarily been unscheduled for regular scans. → Down
  9. The vulnerability has been viewed once in Warden, and then left unattended. → Down
This vulnerability ends up being less important than the first one
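Using invented adjustment magnitudes (the real ones are derived statistically by the engine), a toy re-scoring of the two cases shows how Case 2’s critical 10.0 baseline can end up ranked below Case 1’s medium 4.6 once context is applied:

```python
def apply_context(base, deltas, floor=0.0, cap=10.0):
    """Push a base score up/down by each context delta, clamped to [floor, cap]."""
    score = base
    for d in deltas:
        score = min(cap, max(floor, score + d))
    return score

# Case 1: medium 4.6 baseline, mostly "Up" signals (steps 1-8 above)
case1 = apply_context(4.6, [-0.4, +0.8, +0.6, +1.0, +0.5, +0.4, +1.2, +0.7])

# Case 2: critical 10.0 baseline, mostly "Down" signals (steps 1-9 above)
case2 = apply_context(10.0, [-0.8, +0.5, -1.5, -1.2, -0.9, -0.7, -1.1, -0.6, -0.4])

print(round(case1, 1), round(case2, 1))  # Case 2 ranks well below Case 1
```

The exact magnitudes above are placeholders; the point is the inversion of the naive CVSS ordering.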

This vulnerability, which initially seemed more “critical” than the first one, is in fact not that urgent. It does not appear to threaten business continuity in any way. There is even a chance it was intentionally left there for testing purposes. This is a prime example of how important it is to take multiple levels of abstraction into account when doing contextual assessments.

Additionally, note that the priority assessment run on these vulnerabilities has also been performed, in parallel, on all the other asset/vulnerability combinations they might impact. This network effect is crucial for a realistic and effective assessment strategy at scale.

Relative score distributions before and after the assessment. The final score is renormalized for graphical consistency.
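The renormalization mentioned in the caption could, for example, be a simple min-max rescaling back onto the 0–10 range. This is an assumption for illustration; the post does not specify the method actually used.

```python
def renormalize(scores, lo=0.0, hi=10.0):
    """Linearly rescale a list of scores onto [lo, hi] for plotting."""
    mn, mx = min(scores), max(scores)
    if mx == mn:                      # degenerate case: all scores equal
        return [hi] * len(scores)
    return [lo + (s - mn) * (hi - lo) / (mx - mn) for s in scores]
```

This keeps the relative ordering intact while making the before/after distributions directly comparable on one axis.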

The net effect of these assessments is that many vulnerabilities that initially seemed critical when viewed in isolation end up being less important once what they represent to the business is weighed against the genuine risk they pose. On the other hand, a few seemingly benign vulnerabilities end up at the top of the scale.


It is important to remember that during a typical continuous network vulnerability assessment, we see hundreds (if not thousands) of these types of vulnerabilities. As we demonstrated, by failing to put vulnerabilities into context we are clearly missing the granularity required to properly assess their relative severity. This situation happens all too frequently, as most IT teams are far too overwhelmed by the complexity of their tasks.

Why do we say that prioritization is for everyone? Simply because security is everyone’s concern, and everyone should be able to have a clear and actionable security process. So, how can we make this common practice? We start with an efficient prioritization strategy and continue with tailored expertise and continuous monitoring of everything in your internal and external digital footprint.

However, that solution is only possible in one of two contexts. You either have a team of seasoned security experts, or you use contextualized predictive prioritization with a constant, self-improving vulnerability management toolset.

In the next article, we will present an in-depth look at three novel ways of using machine learning algorithms to drastically improve the priority assessment process.