Graph databases find new applications in cybersecurity boom

Spurred by massive demand and substantial investment, cybersecurity is currently experiencing a new generation of significant technological innovation.

First-generation cybersecurity products, such as heuristic-based antivirus and passive firewalls, emerged in the late 1980s through the early 1990s. The second generation of cybersecurity innovation began in the mid-1990s, with the introduction of more sophisticated technologies. Perimeter security appliances, in particular, saw marked improvements. For instance, firewalls became capable of stateful rather than only stateless packet inspection. Such innovations incrementally improved many categories of security technology through third, fourth and, in some cases, fifth generations that stretched well into the 2000s.

Some current innovation revolves around brand-new or emerging technologies, such as machine intelligence and threat intelligence platforms. Other areas involve finding new applications for older technologies. One technology experiencing a kind of renaissance is the graph database, which has existed for years but is finding new relevance in current cybersecurity contexts.

One expert at the forefront of graph database research and development is Dr. Steven Noel, a cybersecurity researcher in MITRE Corp.’s National Security Engineering Center. Noel and his colleagues have been working since 2013 on a graph database called CyGraph, which they have characterized as “a tool for cyber warfare analytics, visualization and knowledge management.”

Noel’s current work builds on cybersecurity research he began in 2001 at George Mason University. He characterized his early research as “a problem space that we called topological vulnerability analysis.”

Noel explained, “It was some of the earliest research in looking at how attackers can leverage vulnerabilities in multiple hosts to incrementally penetrate networks. The capabilities that we developed for mapping, analyzing and visualizing network vulnerability graphs were eventually transitioned from academia to the commercial sector as the Cauldron tool. An important lesson we learned from applying Cauldron to real enterprises is that each customer environment has its own blend of available data and analytic requirements, so that a flexible and extensible architecture for representing data relationships and expressing patterns of security concern is needed.”

Graph databases differ fundamentally from relational databases in a few key ways. For example, relational databases store data in rows and columns, which makes relationships between data difficult, if not impossible, to visualize. By contrast, graph databases enable visual representation of the relationships between data points using nodes, edges and properties.

Nodes are individual data points, such as domains, devices, vulnerabilities and exploits. Edges show the relationships between nodes, such as communications links between servers, which exploits target which vulnerabilities, and which common vulnerabilities and exposures apply to software and hardware in a network environment. Properties describe attributes of nodes, such as a server’s Internet Protocol address, a host’s configuration or a firewall’s rules.
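
To make the node, edge and property vocabulary concrete, here is a minimal sketch of a property graph in plain Python. It is an illustration only and does not reflect how CyGraph or any particular graph database is implemented; the class names, relation labels, host names and the IP address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # kind of data point, e.g. "Host", "Vulnerability", "Exploit"
    properties: dict = field(default_factory=dict)  # attributes of the node


@dataclass
class Edge:
    source: str
    target: str
    relation: str  # kind of relationship, e.g. "TARGETS"


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, relation):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, relation))

    def neighbors(self, node_id, relation=None):
        """Return targets of outgoing edges, optionally filtered by relation."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (relation is None or e.relation == relation)
        ]


# Hypothetical example data: one host, one vulnerability, one exploit.
graph = PropertyGraph()
graph.add_node("web01", "Host", ip="10.0.0.5")
graph.add_node("vuln-a", "Vulnerability", severity="high")
graph.add_node("exploit-x", "Exploit")
graph.add_edge("web01", "vuln-a", "HAS_VULNERABILITY")
graph.add_edge("exploit-x", "vuln-a", "TARGETS")
```

Here the question of which vulnerabilities affect a given host becomes a traversal from that host's node (`graph.neighbors("web01", "HAS_VULNERABILITY")`) rather than a join across tables.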

In an interview with Fifth Domain, Noel explained, “The underlying data model for relational databases is well suited to problems in which the relationships among data elements conform to fixed patterns. However, in many problem domains, such as cybersecurity, entities interact in complex and unpredictable ways. Graph databases map information by explicitly capturing webs of connections between data points. These interconnected webs provide rich sources for recognizing patterns of relationships that might otherwise be missed.”

The advent of the big data era is one factor driving research, development and innovation in graph databases. Cybersecurity professionals now have more data than ever to analyze in activities such as threat modeling, vulnerability assessment and risk management. But without supporting technologies, the data often lacks context and does not inherently suggest how to prioritize security activities. This data deluge — including high volumes of false negatives and false positives — as well as the difficulty of analyzing the data has led to a phenomenon among security analysts known as “alert fatigue.”

Another factor driving graph databases in cybersecurity is the need to show the interconnectedness of disparate data points to form a whole view of network environments, threats and vulnerabilities.

Noel has previously written, “The problem is not a lack of information, but rather the ability to assemble disparate pieces of information into an overall analytic picture for situational awareness.”

Real-time cyber situational awareness, through what Noel calls a “unified graph-based model,” is one principle underlying his research. Noel told Fifth Domain the goal is to “go beyond the representation of data elements as isolated entities, to capture their interrelationships as rich graphs. This enables a deeper level of analysis based on relationship patterns rather than isolated data attributes.”

The applications for graph databases range from identifying network infrastructure and cyber posture to analyzing mission dependencies and cyber threats — including the identification of multi-step, multi-phase cyberattacks.

Sophisticated nation-state threat actors and cybercriminals employing advanced persistent threats often target governments. Such cyberattacks usually involve a series of exploits to compromise a victim. It can be challenging for cybersecurity analysts, given alert fatigue and a high volume of poorly structured data, to recognize that multiple incidents, in isolation, actually represent a single hack employing many tools, techniques and procedures.

This is where the contextualization of threats and prioritization of vulnerabilities, which graph databases enable, become particularly valuable to cybersecurity analysts. Noel explained:

“A focus of our early research was to automate the activities of penetration testers, who enumerate vulnerabilities and incrementally penetrate networks to find new vantage points that expose new weaknesses. This involves analyzing vulnerability scan reports, network topology and access policy rules, such as routers and firewalls. The resulting comprehensive maps of potentially exploitable vulnerability paths provide the context needed for effective responses to actual threats. [It] explicitly captures potential and actual multi-step attacks as graph relationships. Also, by fusing data from multiple complementary sources, it provides more robust situational awareness that helps compensate for imperfect detection of stealthy adversaries.”
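
The idea of mapping potentially exploitable paths can be illustrated with a short sketch. It assumes, purely for illustration, that scan reports and firewall rules have already been reduced to two inputs: which hosts can reach which others, and which hosts carry an exploitable vulnerability. The function name, host names and data are hypothetical, and this is not the algorithm used in Cauldron or CyGraph.

```python
def attack_paths(reachable, vulnerable, start, target):
    """Enumerate simple paths from start to target in which every hop
    lands on a host that has an exploitable vulnerability."""
    paths = []

    def walk(current, path):
        if current == target:
            paths.append(path)
            return
        for nxt in sorted(reachable.get(current, ())):
            # Only step onto vulnerable hosts, and never revisit a host.
            if nxt in vulnerable and nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical network: reachability after applying access policy rules.
reachable = {
    "internet": {"web"},
    "web": {"app", "mail"},
    "mail": {"app"},
    "app": {"db"},
}
# Hypothetical scan result: "mail" is patched, the rest are exploitable.
vulnerable = {"web", "app", "db"}

paths = attack_paths(reachable, vulnerable, "internet", "db")
```

In this toy network the only route to the database runs through the web and application servers, so alerts on those three hosts can be read together as steps of one possible attack rather than as unrelated incidents.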

The growing field of threat intelligence is another application for graph databases. Noel has previously written, “the development of cybersecurity models and analytics has been hampered by a lack of information sharing.” Yet governments and companies are increasingly, if reluctantly, exploring information sharing on threats and vulnerabilities, encouraged by cybersecurity experts and policymakers who believe increased collaboration will improve cybersecurity for everyone.

But even when information sharing exists, there are still challenges to making the data accessible, relevant and, ultimately, actionable within specific cybersecurity environments. Here, too, Noel sees opportunity.

“Bringing together all the relevant information into a common analytic environment, as in SIEMs [security information and event management tools], is the first important step,” Noel said. “But traditional security tools generally represent security information as isolated pieces of information, like the rows (events) and columns (event attributes) of a relational database.”

In contrast, graph databases use nodes, edges and properties to visually depict the interconnectedness of and patterns between data, Noel said, providing a “kind of complex ‘connecting the dots’ that is essential, but slow and tedious to do manually or through custom scripting.”

Graph databases are slowly improving capabilities in two other important areas of threat analysis and vulnerability assessment: detecting zero-day vulnerabilities and predictive cybersecurity.

As for detecting zero-day vulnerabilities, Noel said that graph databases often depend on data derived from other tools to support analysis and situational awareness, which can limit available information. While at George Mason University, he began work on a metric he called “k zero-day” to measure resistance to zero-day vulnerabilities.

Noel explained, “[K zero-day] employs the same kind of topological analysis as for known vulnerabilities, but applied to the distinct services across an enterprise. The metric then evaluates whether the enterprise — more specifically, particular mission-critical assets — can survive a given number, k, of zero-day attacks.”
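
A heavily simplified sketch can convey the flavor of this kind of metric. It assumes each network hop is labeled with the service an attacker would have to exploit, and that a single zero-day in a service works on every host running it; under those assumptions, the smallest number of distinct services along any path to an asset approximates how many zero-days an attacker needs. The function names and network are hypothetical, the search is brute force and suits only small graphs, and the published metric is more elaborate than this.

```python
def min_zero_days(edges, start, asset):
    """Smallest number of distinct services that would need a zero-day
    for an attacker at `start` to reach `asset`; None if unreachable.

    edges maps host -> list of (next_host, service_exploited).
    """
    best = None
    stack = [(start, frozenset(), (start,))]
    while stack:
        host, used, path = stack.pop()
        if host == asset:
            if best is None or len(used) < best:
                best = len(used)
            continue
        for nxt, service in edges.get(host, []):
            if nxt not in path:  # simple paths only
                stack.append((nxt, used | {service}, path + (nxt,)))
    return best


def survives(edges, start, asset, k):
    """True if k zero-days are not enough to reach the asset."""
    needed = min_zero_days(edges, start, asset)
    return needed is None or needed > k


# Hypothetical network: two routes to the database.
edges = {
    "internet": [("web", "http"), ("vpn", "ssh")],
    "web": [("db", "sql")],
    "vpn": [("jump", "ssh")],
    "jump": [("db", "ssh")],
}

needed = min_zero_days(edges, "internet", "db")
```

In this example the route through the web server needs zero-days in two services, but the longer route reuses one service at every hop, so a single zero-day suffices. That is the kind of weakness a count of hops alone would miss.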

As for predictive cybersecurity, Noel and his team are drawing on prior experience to build predictive analytic techniques, including queries that “encode decipher patterns that predict similar future outcomes.” Another area involves “link prediction, which seeks to model how the evolution of complex graphs can be predicted by properties of the graph itself.”
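
Link prediction in general can be illustrated with one of its simplest textbook heuristics, the Jaccard coefficient, which scores an unconnected pair of nodes by how much their neighborhoods overlap. The sketch below is a generic example, not the technique Noel's team uses; the function name, node names and graph are hypothetical.

```python
from itertools import combinations


def jaccard_scores(adjacency):
    """Score every currently unlinked pair of nodes by the overlap of
    their neighbor sets: |shared neighbors| / |all neighbors|."""
    scores = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b in adjacency[a]:
            continue  # already linked, nothing to predict
        union = adjacency[a] | adjacency[b]
        if union:
            scores[(a, b)] = len(adjacency[a] & adjacency[b]) / len(union)
    return scores


# Hypothetical undirected graph: three hosts and two external servers.
adjacency = {
    "host1": {"server-a", "server-b"},
    "host2": {"server-a", "server-b"},
    "host3": {"server-b"},
    "server-a": {"host1", "host2"},
    "server-b": {"host1", "host2", "host3"},
}

scores = jaccard_scores(adjacency)
```

Pairs with high scores, such as two hosts that talk to exactly the same external servers, are the ones a model of this kind would flag as likely to become connected. The score uses nothing but the structure of the graph itself.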

However, Noel cautioned that any method of prediction faces inherent challenges, and cybersecurity is no exception: “Mathematically, prediction involves inference, i.e., capturing properties of a problem space from observed instances. But remember, as Nobel laureate Niels Bohr observed: ‘Prediction is very difficult, especially if it’s about the future.’ ”
