The Risk Report: Understanding how to measure risk means knowing where to look

I can recall when the end of the world was measured in magnitudes of megatons and in multiples of total global destruction. Remember MAD - mutually assured destruction (http://en.wikipedia.org/wiki/Mutual_assured_destruction)? The problem with measuring calamitous risk is that anything greater than 0 is effectively a total disaster. Therefore, we conditioned ourselves to look at risk as a binary value, and we qualified it in multiples of horror, hence the counts of the missiles in the silos.

Image: Secretary of Defense Robert McNamara in the Cabinet Room, White House, Washington, DC, November 22, 1967

"It is important to understand that assured destruction is the very essence of the whole deterrence concept. We must possess an actual assured-destruction capability, and that capability also must be credible. The point is that a potential aggressor must believe that our assured-destruction capability is in fact actual, and that our will to use it in retaliation to an attack is in fact unwavering. The conclusion, then, is clear: if the United States is to deter a nuclear attack on itself or its allies, it must possess an actual and a credible assured-destruction capability." - "Mutual Deterrence" speech by Sec. of Defense Robert McNamara, 1967. (http://www.atomicarchive.com/Docs/Deterrence/Deterrence.shtml)

For a time, the United States didn't really measure threat as we can now, since it was a binary value. The risk was total destruction, and the mitigation was the threat of our ability to return the favor. It worked for decades, deterring the real threat through the ability to inflict damage greater than any enemy attack could actually cause. Our defense was primarily offense. In the game of Intercontinental Ballistic Missiles, only a few can play. The parties involved are well funded, physically established with infrastructure, labor force, scientific and research capabilities, manufacturing, heavy industry and supporting military and political infrastructure. It is a big statement to swing the nuclear hammer, so only a few can back that up. We knew where we had to look, so we only had to watch one viable threat, and for a short time, two if you count Cuba.

In This Issue

The Security Content Automation Protocol (SCAP) is a concerted effort to apply systems and methodologies whereby threat management can be automated and driven primarily by computing systems. A component of SCAP is the Common Vulnerability Scoring System (CVSS). CVSS provides a way to apply a relative score to any vulnerability in such a way that all items can be compared one to another.

In consecutive issues, we will review the composite pieces of SCAP, and start to understand how to automate the process of risk management.

We will start to look at the impact of risk, and at how to measure risk and its associated costs.

Introduction

Automated vulnerability management assumes automated and prioritized mitigation of issues. A significant component of mitigation is the understanding that not everything can or will be mitigated, due to business, budget, priorities and so on. Considering this, understanding what vulnerabilities are helps to understand what an acceptable level of risk is, and the ongoing need to mitigate to those acceptable levels of risk.

We will review risk in a new way. Risk is an element of business, but I will show that it is a predictable, known element to which a cost and schedule can be associated. Technology risk and the associated mitigation efforts are identical to changing oil in an otherwise fine vehicle prior to any other indication that anything is wrong.

Risk is improperly associated with something bad, something to be avoided and feared. Risk is the indirect result of the increasing functionality and complexity layered into fewer and fewer electronic systems, and of the increasing volume of transactions handled by those systems.

Therefore, the amount of risk over time can be expressed in dollars, conceptually, as:

risk = transactions * ((internal functions * all library dependencies per transaction) + (external functions * all internal and third party library dependencies per transaction)) * (cost to business in dollars per loss of transaction)
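The conceptual formula can be sketched in a few lines of Python. This is only an illustration of how the terms combine, not a calibrated model. The function name, the parameter names, and the sample numbers are all assumptions made up for the example.

```python
def conceptual_risk(transactions, internal_functions, internal_deps_per_txn,
                    external_functions, external_deps_per_txn,
                    cost_per_lost_txn):
    """Conceptual dollar exposure: transaction volume * complexity * cost of a lost transaction."""
    # Complexity is the sum of the internal and external terms of the formula.
    complexity = (internal_functions * internal_deps_per_txn
                  + external_functions * external_deps_per_txn)
    return transactions * complexity * cost_per_lost_txn


# Hypothetical system: 1,000 transactions, $2 lost per failed transaction.
baseline = conceptual_risk(1_000, 10, 5, 2, 20, 2)
# Same system after adding one more externally supplied function.
with_new_feature = conceptual_risk(1_000, 10, 5, 3, 20, 2)
```

Under these made-up inputs, a single extra external function raises the conceptual exposure because it drags its dependencies into every transaction.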

A single bit of functionality, like an autoupdate, or a spellchecker, can add millions of potential points of operational complexity, each of which could introduce architectural weaknesses. Architecture and careful design and testing can constrain some of this risk, but think of the evolution of software as a parallel to the evolution of a paper glider to a 747. With the increase in functionality comes complexity, interdependencies, external dependencies on third party components and systems, and an implied responsibility to manage and maintain these increasing complexities in order to maintain functionality.

The concept of risk

Risk is a measure of probability of a disruptive event, a measure of the scope of impact such event has on an affected system, and the relevance and significance of that system.

For individuals, risk is the probability of impact, and scope of such impact, against operating technology systems - email, internet, local programs, etc.

For business users, risk is a measure of the potential threat posed against any service level agreement, or SLA, to deliver the operational capability of technology systems to business users within contractual constraints. In business, technology services are typically contracted from IT and guaranteed by certain SLAs, or assurances of performance and availability. Technology risks would be measured by their impact on contracted performance requirements.

Risk awareness and management is the effort of quantifying the impact of technology threats on systems and services, and applying mitigation in a manner prioritized against the most relevant threats at any time.

Acceptable losses, and the realities of risk

From the documentary - "The Fog of War" by Errol Morris

McNamara: I was on the island of Guam in his command in March of 1945. In that single night, we burned to death 100,000 Japanese civilians in Tokyo: men, women, and children.
Morris: Were you aware this was going to happen?
McNamara: Well, I was part of a mechanism that in a sense recommended it. I analyzed bombing operations, and how to make them more efficient. i.e. Not more efficient in the sense of killing more, but more efficient in weakening the adversary. I wrote one report analyzing the efficiency of the B-29 operations. The B-29 could get above the fighter aircraft and above the air defense, so the loss rate would be much less. The problem was the accuracy was also much less. Now I don't want to suggest that it was my report that led to, I'll call it, the firebombing. It isn't that I'm trying to absolve myself of blame. I don't want to suggest that it was I who put in LeMay's mind that his operations were totally inefficient and had to be drastically changed. But, anyhow, that's what he did. He took the B-29s down to 5,000 feet and he decided to bomb with firebombs.

Curtis LeMay had command of bombers over Japan. Robert McNamara tabulated the return of high altitude bombing against the successful completion of bombing objectives, and helped LeMay determine that low altitude strategic bombing would have higher overall performance even weighed against higher overall losses. The change in strategy to low altitude night bombing resulted in a higher overall return against losses.

The bombings were not stopped because of the losses. Instead, the performance of the bombers was increased against the same losses. Understanding the reality of risks and their impact on system performance helps to prioritize issues for resolution.

The parts of cyber risk

What -

CVE (Common Vulnerabilities and Exposures) is a list of standardized names for vulnerabilities and other information security exposures. CVE aims to standardize the names for all publicly known vulnerabilities and security exposures. It is a dictionary, NOT a database. The goal of CVE is to make it easier to share data across separate vulnerability databases and security tools. While CVE may make it easier to search for information in other databases, CVE should not be considered a vulnerability database in its own right.

The National Vulnerability Database (http://nvd.nist.gov) provides a compilation of vulnerabilities in applications, operating systems, hardware and infrastructure. The NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard. The NVD is a database of CVE names and related information, but there is no hierarchy in the information, or inheritance, dependency, or association of information.

A vulnerability is a threat that has been identified, documented, tested and reported, typically to the NVD.

Age -

Vulnerabilities are time sensitive. Zero day issues are given great importance for a practical reason. (http://en.wikipedia.org/wiki/Zero_day_attack) Something that is newly reported is "zero day". The zero day is the point at which an issue goes from being unknown to known, and is perceived to escalate the threat, since a wider audience has the capability to attempt to exploit a newly discovered issue. Computer attackers can easily use well known and documented software exploits on core platforms to attempt access to systems. Age is important, since the longer an issue is publicly known, the greater the possibility for an attack. Think of this as a part of the measure of probability of exploit.

Relevance -

Understanding threats means understanding all the threats at a given time. At any time, there are a number of threats that are known. This means that to adequately quantify any one risk relative to another, all risks must be compared to all others, and to the inventory of systems under management. As an example, if there are no Apple computers within a network, all Apple related issues can be excluded.

Remember, this is a measure of the relevance, or relative importance, given all factors, of any one issue at a given time to all other issues at the same time. This is critical for understanding current workload relative to risk mitigation and prioritizing it.
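The inventory filter described above can be sketched as follows. The record layout, the `platform` field, and the issue identifiers are hypothetical. Real feeds describe affected products in more detail, so treat this as a minimal sketch of the idea rather than a working scanner.

```python
def relevant_issues(issues, inventory):
    """Keep only the issues whose affected platform is present in the managed inventory."""
    platforms = {name.lower() for name in inventory}
    return [issue for issue in issues if issue["platform"].lower() in platforms]


# Hypothetical inventory with no Apple systems under management.
inventory = ["Windows Server", "Linux"]

# Hypothetical issue records; the ids are placeholders, not real CVE names.
issues = [
    {"id": "ISSUE-1", "platform": "Mac OS X"},
    {"id": "ISSUE-2", "platform": "Linux"},
    {"id": "ISSUE-3", "platform": "Windows Server"},
]

in_scope = relevant_issues(issues, inventory)
```

With this inventory, the Mac OS X issue drops out of the workload before any scoring or prioritization takes place.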

Risk -

CVSS is a significant component of CVE information provided by NVD.

CVSS was developed by and is maintained by First.org. http://www.first.org/cvss/cvss-guide.html

The Common Vulnerability Scoring System (CVSS) provides an open framework for communicating the characteristics and impacts of IT vulnerabilities. CVSS consists of 3 groups: Base, Temporal and Environmental. Each group produces a numeric score ranging from 0 to 10, and a Vector, a compressed textual representation that reflects the values used to derive the score. The Base group represents the intrinsic qualities of a vulnerability. The Temporal group reflects the characteristics of a vulnerability that change over time. The Environmental group represents the characteristics of a vulnerability that are unique to any user's environment. CVSS enables IT managers, vulnerability bulletin providers, security vendors, application vendors and researchers to all benefit by adopting this common language of scoring IT vulnerabilities.

Each individual issue is "weighted" for its own ability to cause damage and disruption to operations. By adding details specific to the systems under management, CVSS allows a qualifier, the environmental score, that reflects the importance of the underlying systems and their criticality to operations. This provides a distinct score for each issue.
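To make the Base group concrete, here is a minimal sketch of the base score calculation, assuming CVSS version 2 (the version current when this was written) and the metric weights and equations published in the CVSS v2 guide. The function name is my own, and the Temporal and Environmental groups are left out.

```python
# CVSS version 2 base metric weights, as published in the CVSS v2 guide.
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}
IMPACT = {"N": 0.0, "P": 0.275, "C": 0.660}


def cvss2_base_score(vector):
    """Compute a CVSS v2 base score from a vector such as 'AV:N/AC:L/Au:N/C:P/I:P/A:P'."""
    # Split the vector into metric/value pairs, e.g. {"AV": "N", "AC": "L", ...}.
    m = dict(part.split(":") for part in vector.split("/"))
    impact = 10.41 * (1 - (1 - IMPACT[m["C"]])
                      * (1 - IMPACT[m["I"]])
                      * (1 - IMPACT[m["A"]]))
    exploitability = (20 * ACCESS_VECTOR[m["AV"]]
                      * ACCESS_COMPLEXITY[m["AC"]]
                      * AUTHENTICATION[m["Au"]])
    # The v2 equations force the score to zero when there is no impact at all.
    f_impact = 0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)


# Remotely exploitable, no authentication, partial impact on all three properties.
partial = cvss2_base_score("AV:N/AC:L/Au:N/C:P/I:P/A:P")   # 7.5
# Same access conditions, complete impact on all three properties.
complete = cvss2_base_score("AV:N/AC:L/Au:N/C:C/I:C/A:C")  # 10.0
```

The vector string is the "compressed textual representation" mentioned above, so the same six values that produce the score can be stored and compared alongside it.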

Assets -

Understanding the assets under management is important. Without an accurate and current inventory of software, hardware, infrastructure, and software libraries contained within all of those, it is hard to accurately respond to documented threats.

SCAP provides a framework to electronically describe technology assets and related security information. Tools exist to automate and manage inventory using SCAP (http://nvd.nist.gov/scapproducts.cfm).

Proprietary systems exist to do endpoint and network scanning, using file system, protocol, and SNMP to identify files, applications, operating systems, hardware, and infrastructure components. An accurate and current inventory is critical to managing associated risks.

Work Effort -

Managing network components, threats included, is an indirect measure of professional work effort. It takes specialists to manage and mitigate issues. Professionals provide services to organizations at a cost for their time. Once risk management is measured properly, this ongoing effort, measured in cost, can be used to associate a cost to manage risk to acceptable levels.

The workload index, or WLI (http://nvd.nist.gov/home.cfm?workloadindex), is more useful when it is recalculated for only the reported issues applicable to a managed network. As a note, WLI anticipates that the workload is directly impacted by issues reported over the prior 30 days. This assumes that work is prioritized by the number and criticality of issues as soon as they are reported.

If the WLI is adjusted to a defined inventory of technology assets, then the WLI more accurately represents the effort required to mitigate these issues on an ongoing basis.

The formula for the WLI is meant to identify a current inventory of issues that need to be resolved. The WLI provides a running average of work effort, and that work effort needs to be adjusted in two ways. First, if the WLI is applied to the inventory of technology assets within an organization, then it becomes more relevant.

Second, the workload needs to be adjusted to the number of affected systems -

The Applied Workload Index is calculated using the following equation:

((number of high severity vulnerabilities published within the last 30 days * number of affected systems) +
((number of medium severity vulnerabilities published within the last 30 days / 5) * number of affected systems) +
((number of low severity vulnerabilities published within the last 30 days / 20) * number of affected systems)) / 30
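As a sketch, the equation translates into a short Python function. One assumption here goes beyond the text: each vulnerability record carries its own count of affected systems, so the three terms become sums over individual records. The field names and sample data are hypothetical.

```python
# Severity weights from the equation: medium counts 1/5 and low 1/20 of a high.
SEVERITY_DIVISOR = {"high": 1, "medium": 5, "low": 20}


def applied_wli(vulnerabilities, window_days=30):
    """Average daily workload for vulnerabilities published within the window."""
    total = 0.0
    for vuln in vulnerabilities:
        # Each record contributes its affected system count, scaled by severity.
        total += vuln["affected_systems"] / SEVERITY_DIVISOR[vuln["severity"]]
    return total / window_days


# Hypothetical 30 day intake: 3 high, 10 medium and 20 low severity issues,
# each affecting 10 systems in the managed inventory.
recent = (
    [{"severity": "high", "affected_systems": 10}] * 3
    + [{"severity": "medium", "affected_systems": 10}] * 10
    + [{"severity": "low", "affected_systems": 20 // 2}] * 20
)

daily_workload = applied_wli(recent)  # 2.0 weighted issues per day
```

If every vulnerability in a severity band affects the same number of systems, this reduces to the equation exactly as written.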

Counting Cannon Balls


By changing WLI into a metric by which anyone managing vulnerabilities can quantify the operational work effort required to mitigate them, we can understand the risk imposed by vulnerabilities as an operational cost, like electricity, rather than something to fear and avoid.

Many security management tools provide endless lists of issues. NVD currently has 40837 vulnerabilities as of 2/28/10, and countless repositories provide lists of thousands of issues. Issues are reported over time, specific to operating systems, applications, releases and patch levels, hardware and specific infrastructure. A count of issues alone provides a lot of noise, but nothing to quantify risk for an operations manager. A count of vulnerabilities is no more relevant to managing the function of an operating network than having a count of solar flares to date. The result of the formula will generate a composite count of issues, varying in severity, as a measure of the number of new issues that have to be mitigated within a specific computing environment on a daily basis.

Attacks are current, risk is predictive and abstract

An exploit differs from a vulnerability in that something is happening. A vulnerability represents a series of conditional situations, any of which could allow systems to be compromised, if other conditions are met. Therefore, attacks are simple in that they represent no need for predictive or speculative planning. Attacks represent our best ability to respond to immediate issues here and now. I am intentionally oversimplifying an attack to make the differentiation between action and reaction, not that all attackers are squabbling monkeys that can be detected making brute force attacks on our systems.

Vulnerabilities are not attacks. They are not actually operational problems. A vulnerability is a door that can be opened. A breach indicates that it has been opened. Vulnerabilities represent weaknesses in a managed environment, and impose risk because of the possibility of loss of service, actual and reputational losses. Managing vulnerabilities is the oil change of technology. Vulnerabilities are mitigated to reduce operational risk.


http://www.pcworld.com/article/131378/contest_winner_vista_more_secure_than_mac_os.html

Attacks are built around a few things. First, known vulnerabilities increase the likelihood of finding weaknesses that have not been mitigated. Second, an organized attack needs to occupy security resources at the target site. An attacker has an understanding of the resources required to deal with an issue. A good attack will attempt to consume resources with fake attacks, or multiple misdirections, while sending in the real attack. A small team of attackers can, inadvertently or knowingly, exploit the fact that many issues are not resolved in a timely manner after being revealed to the public, and that there is a limited group of people available to respond to an issue at any time.

As another issue, an attack may be subtle, having no clear signs that attract resources and attention. The defense against this is monitoring and management, and effective reduction of exposure to known issues by application of mitigation policy as cited herein. Having systems mitigated for current known vulnerabilities reduces the exposures in the event of an issue, and focuses the limited resources to those areas that may be exploitable.

Therefore, attacks in reality are complex, elusive, pervasive, and challenging to identify from normal activity, even when they are going on. Many attacks may not have a defined start or end. It is our responsibility to reduce the probability of these attacks by reducing the potential points of access to our systems, herein expressed as vulnerabilities.


Count by ratio

If you understand the Applied WLI, and have provided IT management with an appropriate budget (reviewed next) and staff to meet the average daily obligations for vulnerability management, you can establish "standards", targets of vulnerability maintenance against which the team can be measured.

Total systems - Here, I use system to refer to the lowest common denominator for management: a machine or virtual machine. This machine can host applications and operating systems, or it can be hardware or infrastructure. Each machine, or system, would individually need to be maintained to mitigate vulnerabilities.

Total Applied WLI - This number represents the average daily vulnerabilities on systems under management.

Remediation Standard - This is the amount of unresolved issues allowed, inclusive of the daily average. It might be reasonable to set this as a multiple of the Applied WLI, so that the Remediation Standard is a measure in days times the Applied WLI, such as:

60 * Applied WLI, gives a 60 day review window for open issues.
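A minimal sketch of this standard follows, assuming the workload index has already been computed for the managed inventory. The function names and sample figures are hypothetical.

```python
def remediation_standard(applied_wli, review_window_days=60):
    """Maximum number of open (weighted) issues allowed under the standard."""
    return review_window_days * applied_wli


def within_standard(open_issues, applied_wli, review_window_days=60):
    """True when the current backlog is at or below the remediation standard."""
    return open_issues <= remediation_standard(applied_wli, review_window_days)


# With an average of 2.0 new weighted issues per day, a 60 day window
# allows a backlog of up to 120 open issues.
limit = remediation_standard(2.0)
team_on_track = within_standard(100, 2.0)
team_behind = within_standard(150, 2.0)
```

Changing `review_window_days` tightens or loosens the target without changing how the workload itself is measured.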

Understanding that applications depend on prioritization and resolution scheduling

For risk to be managed, issues must be prioritized based on a compilation of information for each issue.

If you measure risk properly, you can budget it

Airius Risk Score = (vulnerability * (((CVSS environmental) * daily adjustment) * age) * relative risk score) = Overall Risk Score

Cost of Risk Management = Applied WLI * hrs per issue * loaded cost per hour

where hrs per issue should be adjusted to reflect local skill levels, environment, and special conditions, the CVSS environmental score per issue, the mitigation technologies available per platform, and overhead cost (benefits, rent, management, electricity), plus the total hrs to mitigate, to represent the actual opportunity cost.

IMPORTANT - one user may be able to mitigate a critical issue across a thousand machines in one hour, while another may spend a year.
Loaded Cost - includes the complete loaded cost of the staff required to mitigate the outstanding issues, plus the opportunity cost lost to issue mitigation

This provides a total in daily hrs required to mitigate average risk workload across affected systems. While the hours per issue can be adjusted, this provides a point from which to start. Now, there is a daily average in hours to manage software vulnerabilities.
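The cost calculation can be sketched as two small functions. The hours per issue and the loaded hourly rate below are placeholder values chosen for the example, not recommendations, and the function names are my own.

```python
def daily_mitigation_hours(workload_index, hours_per_issue):
    """Average hours per day needed to keep up with the daily workload."""
    return workload_index * hours_per_issue


def daily_risk_management_cost(workload_index, hours_per_issue, loaded_cost_per_hour):
    """Daily cost of risk management: workload * hours per issue * loaded hourly cost."""
    return daily_mitigation_hours(workload_index, hours_per_issue) * loaded_cost_per_hour


# Hypothetical figures: 2.0 weighted issues per day, 1.5 hours per issue,
# and a fully loaded staff cost of $80 per hour.
hours = daily_mitigation_hours(2.0, 1.5)
cost = daily_risk_management_cost(2.0, 1.5, 80)
```

Multiplying the daily figure by the number of working days in a budget period gives a starting estimate that can be refined as actual hours per issue are measured.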

The Airius Risk Report database provides a risk score, and calculates the associated cost. By understanding the cost of risk mitigation, and by applying an overall score to manage the technology, vulnerabilities are prioritized to be resolved based on their overall risk score relative to all other issues at any given time. A risk score takes into account the individual vulnerability, the affected systems, the business impact and criticality, the relevancy to any other issues at the same time, and the time that the issue has been public.
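As an illustration only, the Airius Risk Score formula and the resulting prioritization might be sketched like this. How the Airius database actually represents each term is not described here, so everything in the sketch is an assumption: `applies` is a 0 or 1 flag standing in for the vulnerability term, `age_days` is the number of days since publication, and `daily_adjustment` and `relative_risk` are plain multipliers.

```python
def risk_score(issue):
    """Product of the terms in the risk score formula, for one issue record."""
    return (issue["applies"]
            * issue["cvss_environmental"]
            * issue["daily_adjustment"]
            * issue["age_days"]
            * issue["relative_risk"])


def prioritize(issues):
    """Order issues from highest to lowest overall risk score."""
    return sorted(issues, key=risk_score, reverse=True)


# Hypothetical open issues; the ids are placeholders, not real CVE names.
open_issues = [
    {"id": "ISSUE-A", "applies": 1, "cvss_environmental": 9.0,
     "daily_adjustment": 1.0, "age_days": 2, "relative_risk": 1.0},
    {"id": "ISSUE-B", "applies": 1, "cvss_environmental": 5.0,
     "daily_adjustment": 1.0, "age_days": 30, "relative_risk": 0.5},
    {"id": "ISSUE-C", "applies": 0, "cvss_environmental": 10.0,
     "daily_adjustment": 1.0, "age_days": 10, "relative_risk": 1.0},
]

work_queue = [issue["id"] for issue in prioritize(open_issues)]
```

In this made-up data, the older medium severity issue outranks the newer severe one, and the issue that does not apply to the inventory falls to the bottom, which is the kind of reordering a simple count of vulnerabilities cannot show.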

Note that the CVSS score is recalculated daily, along with the risk score. Because of this, even if an issue is increasing in risk daily, it may decrease in importance to the business, thereby reducing its overall Risk Score.

If risk mitigation is scheduled based on the risk score, rather than regular patch schedule, software vulnerabilities can be mitigated as part of an ongoing process, decoupled from IT management processes. Management of vulnerabilities has to have a priority over scheduled operations management issues, allowing the mitigation of risk to always have priority over standard network management issues.

Time is the greatest defense to technology vulnerabilities. Reduction of time by prioritizing and resolving issues will most effectively mitigate issues. Risk is not effectively measured with a count of known vulnerabilities, since that is an incomplete and irrelevant metric.

Probability - A measure of the chance of a system affected with a vulnerability being exploited

Scope - The impact of exploit on the affected system

Relevance - The importance of the vulnerable system on the overall operation of the computing environment, the cost to the business of an exploit of that specific vulnerability

The important thing is to understand the impact of known issues on a managed inventory, and to prioritize the resolution of issues as part of normal systems management. Finally, by being able to equate vulnerabilities to an ongoing cost of systems management, it becomes easier to budget and understand. Additionally, the negative effect of budget cuts on infrastructure management staff can be seen directly in the lack of resources available to manage the current balance of issues, herein counted as part of the Applied WLI.

Vulnerability management is as significant a part of systems operations as electricity. Properly quantifying risk, priority and associated costs is important to understand and effectively manage a risk strategy.

War Room from Dr. Strangelove http://en.wikipedia.org/wiki/Dr_strangelove

Coming next time

Next issue, we will review the CVSS Environmental, and understand how to make better use of an otherwise imprecise measure of the scope and criticality of an issue. We will also understand how to add the concept of probability into the prioritization of mitigation of vulnerabilities within a technology environment. Additionally, we will discuss the way to reliably view risk as an operational cost item that can be added to overall operational costs and weighed against revenue. Once we cover the basic components of automated risk management, we will provide access to tools that help to manage the integration of SCAP to mitigate vulnerabilities within a computing environment.

References:

http://nvd.nist.gov/

http://scap.nist.gov/

http://www.first.org/cvss/

https://buildsecurityin.us-cert.gov/bsi/home.html

https://buildsecurityin.us-cert.gov/swa/presentations_09/March%20Forum%20Day%203%20-%20addtl%20file.pdf

The Risk Report

Saturday, May 22, 2010
