Creating a new piece of malware is easy: Find something that suits your needs, then tweak the code to bypass firewalls and trick anti-virus software. Hackers are doing this everywhere, but by analyzing the provenance of code (a broader look at style rather than just substance), defenders are disrupting the reuse cycle and making the adversary's life harder.
Thomas Ruoff, director of technology innovation and mission integration for the Department of Homeland Security, likened provenance to a professor trying to determine whether a student actually wrote a paper or plagiarized it.
By breaking down the code and comparing certain attributes to those of known malware, defenders can identify patterns that tell them which signs to watch for. From there, through statistical analysis, they can assign a trust score to the traffic coming from certain sources or bearing certain signatures of malicious intent.
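As a rough illustration of comparing attributes against known samples, the sketch below scores a sample by how many short byte sequences it shares with previously seen malware. The feature choice (4-byte n-grams), the Jaccard measure and the `trust_score` formula are all assumptions made for this example; the article does not say which attributes or statistics DHS actually uses.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte sequences found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def trust_score(sample: bytes, known_malware: list, n: int = 4) -> float:
    """Hypothetical trust score in [0, 1].

    Assumption for this sketch: trust is 1 minus the similarity to the
    closest known-malicious sample, so a near-copy scores close to 0.
    """
    grams = byte_ngrams(sample, n)
    closest = max(
        (jaccard(grams, byte_ngrams(m, n)) for m in known_malware),
        default=0.0,  # nothing to compare against: no evidence of reuse
    )
    return 1.0 - closest
```

In this toy setup, a lightly edited copy of a known sample receives a low score, while an unrelated file scores near 1.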
While this tactic isn't perfect — true zero-days pop up all the time — it is a fast and efficient tool DHS and others are using to identify and block a large swath of attacks, Ruoff said during a panel discussion at the 2016 RSA Conference in San Francisco.
"It enables us to rapidly make a probability [determination] of authorship, probability of intent," he said. Furthermore, "it also allows us to think about the reuse of code and how the actors readily reuse malware, change it a little bit — therefore change the hash — that approach is somewhat nullified if you have to change a tremendous amount of the object to make it not attributable … It means the economics and the sweat equity of the bad guy are changing."
DHS is working with the FBI, Justice Department, National Security Agency and others to pull in and analyze more code to build better provenance profiles. But, as the private-sector work in this area continues to grow, the government is also looking to industry to provide additional data.
Some of that will come through information sharing — the framework of which is just being established — while other data will be purchased directly from companies trading in threat intelligence, according to Phyllis Schneck, deputy undersecretary for cybersecurity and communications within the DHS National Protection and Programs Directorate (NPPD).
Using multiple sources of information enables analysts to build a more comprehensive trust score, adding in indicators and knowledge from all angles.
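One simple way to picture combining sources is a weighted average of per-source scores. This is a sketch under stated assumptions: the source names, the weights and the averaging rule are hypothetical, and the article does not describe how analysts actually merge indicators.

```python
def combined_trust(scores: dict, weights: dict) -> float:
    """Weighted average of per-source trust scores, each in [0, 1].

    scores:  source name -> trust score reported by that source
    weights: source name -> how much that source is relied on
    """
    total_weight = sum(weights[source] for source in scores)
    if total_weight == 0:
        raise ValueError("no weighted sources to combine")
    return sum(scores[s] * weights[s] for s in scores) / total_weight


# Hypothetical sources and weights, for illustration only.
scores = {"government_analysis": 0.2, "shared_indicators": 0.4, "purchased_feed": 0.1}
weights = {"government_analysis": 2.0, "shared_indicators": 1.0, "purchased_feed": 1.0}
overall = combined_trust(scores, weights)
print(round(overall, 3))
```

Adding a source changes the result only in proportion to its weight, so a single noisy feed does not dominate the overall score.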
Even then, the system is far from perfect. But it is effective at identifying software at the far ends of the spectrum, Schneck said.
"Probabilities on this tend to be very reliable on the good end and on the really bad end and that's what we're most concerned with. The middle is a science project that we're looking at," she explained. "But it's the ability to clear some of the ridiculous noise out of the Internet; to make it less easy for the adversary so that we can start to hunt and find the really sophisticated stuff."
Schneck compared it to a bank robber making a slight costume change and then attempting to rob the same bank. He might not be the smartest robber out there, but he needs to be stopped and this is one way to do it.
"To truly cause pain to this adversary we have to cut the business model and make sure they stop reusing and just tweaking software and they get right back at us and get in," she said. Creating a robust provenance and attribution system gives agencies "the ability to absolutely destroy the model where the adversary simple takes a piece of software and sends it back out again."