Generative AI is pervading just about every industry already, whether we like it or not, and cybersecurity is no exception. The possibility of AI-accelerated malware development and autonomous attacks should alarm any sysadmin even at this early stage. Wraithwatch is a new security outfit that aims to fight fire with fire, deploying good AI to fight bad AI.
The image of righteous AI agents battling against evil ones in cyberspace is already pretty romanticized, so let’s be clear from the outset that it’s not a Matrix-style melee. This is about software automation enabling malicious actors the same way it enables the rest of us.
Until just a few months ago, Nik Seetharaman, Grace Clemente, and Carlos Más were employees at SpaceX and Anduril, where they witnessed firsthand the storm of threats that every company with something valuable to protect (think aerospace, defense, finance) is subject to at all hours.
“This has been going on for 30-plus years, and LLMs are only going to make it worse,” said Seetharaman. “There’s not enough dialogue about the implications of generative AI on the offensive side of the landscape.”
A simple version of the threat model is a variation on a normal software development process. A developer working on an ordinary project might do one part of the code personally, then tell an AI copilot to use that code as a guide to make a similar function in five other languages. And if it doesn’t work, the system can iterate until it does, or even create variants to see if one performs better or is more easily audited. Useful, but not a miracle. Someone’s still responsible for that code.
But think about a malware developer. They can use the same process to create multiple versions of a piece of malicious software in a few minutes, shielding them from the surface-level “brittle” detection methods that search for package sizes, common libraries, and other telltale signs of a piece of malware or its creator.
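To see why that kind of detection is brittle, consider a minimal sketch (the hash, the byte signature, and the sample contents below are all invented for illustration): a scanner that matches known file hashes or fixed byte patterns will catch the exact sample it has seen before, but a variant that differs by even a single byte sails past it.

```python
import hashlib

# Hypothetical "known bad" indicators a brittle scanner might rely on:
# an exact file hash and a fixed byte sequence lifted from a past sample.
KNOWN_HASHES = {"deadbeef" * 8}  # placeholder hash, not a real sample's
KNOWN_BYTE_SIGNATURES = [b"EVIL_PAYLOAD_v1"]

def flag_sample(data: bytes) -> bool:
    """Return True if the sample matches a known hash or byte signature."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in KNOWN_HASHES:
        return True
    return any(sig in data for sig in KNOWN_BYTE_SIGNATURES)

original = b"...header...EVIL_PAYLOAD_v1...body..."
mutated = b"...header...EVIL_PAYLOAD_v2...body..."  # one byte changed by a rewrite

print(flag_sample(original))  # True: matches the stored signature
print(flag_sample(mutated))   # False: same behavior, no match
```

An LLM asked to regenerate the same functionality in a different language, or with renamed symbols and restructured code, changes far more than one byte, so every one of the "thousand versions" Seetharaman describes would start with a clean slate against this kind of check.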
“It’s trivial for a foreign power to point a worm at an LLM and say ‘hey, mutate yourself into a thousand versions,’ and then launch all 1,000 at once. In our testing, there are uncensored open source models that are happy to take your malware and mutate it in any direction you wish,” explained Seetharaman. “The bad guys are out there, and they don’t care about alignment — you yourself have to force the LLMs to explore the dark side, and map those to how you’ll actually defend if it happens.”
A reactive industry
The platform Wraithwatch is building, and hopes to have operational commercially next year, has more in common with war games than traditional cybersecurity operations, which tend to be “fundamentally reactive” to threats others have detected, they said. The speed and variety of attacks may soon overwhelm the largely manual and human-driven cybersecurity response policies most companies use.
As the company writes in a blog post:
New vulnerabilities and attack techniques — a weekly occurrence — are difficult to understand and mitigate, requiring in-depth analysis in order to comprehend underlying attack mechanics and manually translate that understanding into appropriate defensive strategies.
“Part of the challenge for cyber teams is, we wake up in the morning and learn about a zero day [the name given to security vulnerabilities where the vendor has no advance notice to fix them] — but by the time we are reading about it, there are already blogs about the new variation that it has mutated to,” said Clemente. “And if you’re at SpaceX or Anduril or the U.S. government, you’re getting some fresh custom version made just for you. We can’t rely on waiting until someone else gets hit.”
Though these custom attacks, like the defenses against them, are still largely human-made, we have already seen the beginnings of generative cyberthreats in tools like WormGPT. That one may have been rudimentary, but it’s a question of when, not if, more capable models are brought to bear on the problem.
Más noted that current LLMs have limitations in their capabilities and alignment. But security researchers have already demonstrated how mainstream code-generation APIs like OpenAI’s can be tricked into aiding a malicious actor, and the above-mentioned open models can be run without alignment restrictions at all (evading “Sorry, I can’t create malware”-type responses).
“If you start getting creative with how you use an API, you can get a response that you might not expect,” Más said. But it’s about more than just coding. “One of the ways in which agencies detect or suspect who is behind an attack is they have signatures: the attacks they use, the binaries they use… imagine a world where you can have an LLM generate signatures like that. You click a bot and you have a brand new APT [advanced persistent threat, e.g. a state-sponsored hacking outfit].”
It’s even possible, Seetharaman said, that the new agent-type AIs, trained to interact with multiple software platforms and APIs as if they were human users, could be spun up to act as semi-autonomous threats, attacking persistently and in coordination. Unless your cybersecurity team is prepared to counter that level of constant attack, it is likely only a matter of time before there’s a breach.
So what’s the solution? Basically, a cybersecurity platform that leverages AI to tailor its detection and countermeasures to what an offensive AI is likely to throw at it.
“We were very deliberate about being a security company that does AI, and not an AI company that does security. We’ve been on the other side of the keyboard, and right up until our last few days [at their respective companies] we saw the kind of attacks they were throwing at us. We know the lengths they will go to,” said Clemente.
And while a company like Meta or SpaceX may have top-tier security experts on site, not every company can stand up a team like that (think a 10-person subcontractor for an aerospace prime), and at any rate the tools they’re working with might not be up to the task. The entire system of reporting, responding, and disclosing may be challenged by malicious actors empowered by LLMs.
“We’ve seen every cybersecurity tool on the planet, and they are all lacking in some way. We want to sit as a command and control layer on top of those tools, tie a thread through them, and transform what needs transforming,” Seetharaman said.
By using the same methods as attackers would, in a sandboxed environment, Wraithwatch can characterize and predict the types of variations and attacks that LLM-infused malware could deploy, or so they hope. The ability of AI models to spot signal in noise could underpin layers of perception and autonomy that detect, and possibly even respond to, threats without human intervention. That’s not to say it’s all automated, but the system could, for instance, prepare to block a hundred likely variants of a new attack as quickly as its admins can roll out patches to the original.
“The vision is that there’s a world where you wake up wondering if you’ve already been breached, but Wraithwatch is already simulating these attacks in the thousands, telling you the changes you need to make, and automating those changes as far as possible,” said Clemente.
Though the small team is “several thousand lines of code” into the project, it’s still early days. Part of the pitch, however, is that as certain as it is that malicious actors are exploring this technology, big corporations and nation-states are likely doing so as well — or at the very least, it is healthy to assume this rather than the opposite. A small, agile startup comprising veterans of companies under serious threat, armed with a pile of VC money, could very well leapfrog the competition, unfettered by the usual corporate baggage.
The $8 million seed round was led by Founders Fund, with participation by XYZ Capital and Human Capital. The aim is to put it to work as fast as possible, since at this point it is fair to consider it a race. “Since we come from companies with aggressive timelines, the goal is to have a resilient MVP with most features deployed to our design partners in Q1 of next year,” with a wider commercial product coming by the end of 2024, Seetharaman said.
It may all seem a little over the top, talking about AI agents laying siege to U.S. secrets in a secret war in cyberspace, and we’re still a ways off from that particular airport thriller blurb. But an ounce of preparation is worth a hell of a lot of cure, especially when things are as unpredictable and fast-moving as they are in the world of AI. Let’s hope that the problems Wraithwatch and others warn of are at least a few years off — but in the meantime, it’s clear that investors think those with secrets to protect will want to take preventative action.