Last week, millions of New York Times readers were subjected to an alarming column by Thomas Friedman. “Normally right now I would be writing about the geopolitical implications of the war with Iran,” Friedman begins, before soon continuing, “but I want to interrupt that thought to highlight a stunning advance in artificial intelligence — one that arrived sooner than expected and that will have equally profound geopolitical implications.”

The “stunning advance” was the release of Anthropic’s new LLM, named Claude Mythos. In a lengthy press release, Anthropic announced that the model would be made available to a consortium of business partners, but not to the general public. To justify this decision, Anthropic cited their concerns about its effectiveness at finding security vulnerabilities in source code, noting: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.”

They go on to explain that Mythos “has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.”

This announcement clearly rattled Friedman, who called Anthropic’s decision not to release the model a “terrifying warning sign,” writing:

“Holy cow! Superintelligent A.I. is arriving faster than anticipated, at least in this area…If this A.I. tool were, indeed, to become widely available, it would mean the ability to hack any major infrastructure system — a hard and expensive effort that was once essentially the province only of private-sector experts and intelligence organizations — will be available to every criminal actor, terrorist organization and country, no matter how small.”

Friedman was far from alone in this concern. Many major news outlets expressed similar unease about this scary new development; one particularly anxiety-provoking headline asked if Mythos was an “AI nightmare waiting to happen?”

So, what’s really going on here?

I thought it was worth taking a moment to look closer, not just to address the specific worries about Mythos, but also to help recalibrate, more generally, how those of us seeking depth in a distracted world should consume AI news.

~~~

When I talked to people who were spooked by Friedman’s column, they tended to be under the impression that this ability to find and exploit security vulnerabilities was a new phenomenon: a skill that emerged unexpectedly in Mythos, “terrifying” those who studied it.

In reality, security researchers have been worried about LLMs being used for this purpose since the first consumer models appeared.

Back in 2024, for example, IBM researchers published a splashy study about using GPT-4 to attack security vulnerabilities. They found that GPT-4 successfully exploited 87% of the vulnerabilities it was presented with, as compared to close to 0% for GPT-3.5. “Our findings raise questions around the widespread deployment of highly capable LLM agents,” they concluded.

To be fair, in the case of GPT-4, researchers were assessing whether an LLM could write code to exploit a known vulnerability. Mythos, however, can also find these vulnerabilities from scratch. But this isn’t new either.

Accompanying the release notes for Anthropic’s earlier Opus 4.6 LLM was the observation that Anthropic’s security team used the model to find “over 500 exploitable 0-day [vulnerabilities], some of which are decades old.” This is almost word-for-word what Anthropic said last week about Mythos, the main difference being that they replaced “500” with “thousands.”

We are not, therefore, talking about a new capability, but rather one that has been around for multiple years.

The relevant question then becomes: how much better is Mythos at finding vulnerabilities? It’s hard to tell for sure because Anthropic has kept their new model private. They did, however, disclose that Mythos scored 83.1% on a well-known cybersecurity benchmark. For comparison, Opus 4.6 scored 66.6% on this same test.

In general, benchmark results should be taken with a grain of salt, as they represent specific (often narrow) tests that researchers can tune their models to pass. But even if we accept that this particular measure is useful, a sixteen-and-a-half point increase looks more like solid incremental progress than a nightmarish leap.

When we turn our attention to actual results, the waters become even murkier. In a recent Substack post (which is worth reading), Gary Marcus rounds up responses from security researchers who took a closer look at the specific exploits that Anthropic reported Mythos had discovered. They were not impressed.

  • Philo Groves, for example, noted that Mythos’s attention-grabbing attack on the Firefox browser required certain common security features to be disabled, and that it built on results previously discovered by Opus. (“Shocker,” he concludes sardonically.)
  • The CEO of the AI company HuggingFace then reported that they took all of the specific vulnerabilities that Anthropic highlighted and “ran them through small, cheap, open-weight models.” What did they find? “Those models recovered much of the same analysis.”

Since Marcus published his essay, I’ve come across several more similar findings:

  • The AI security expert Stanislav Fort ran an experiment to see if existing, cheap open-weight models could find the same vulnerability in FreeBSD (an open-source operating system) that Anthropic touted as evidence of Mythos’s scary ability to uncover bugs that had been hiding for decades. The result: all eight of the existing models tested discovered the same issue.
  • Meanwhile, the renowned security researcher Bruce Schneier weighed in, similarly concluding: “You don’t need Mythos to find the vulnerabilities they found.”

And of course, it doesn’t help that a week before Anthropic released this supposedly super-powered vulnerability detector, they accidentally leaked the source code for Claude Code, and security researchers immediately found serious vulnerabilities in it. (I guess Anthropic forgot to use Mythos to clean up their own software…)

~~~

What’s really happening?

It’s fair to say that LLMs have created significant cybersecurity concerns that researchers have been scrambling to address in recent years. It’s also fair to say, however, that we don’t yet have evidence that Claude Mythos has significantly changed this reality. If anything, some of the early independent testing by security researchers suggests that Mythos might be better understood as a version of Opus 4.6 tuned to perform better on a handful of benchmarks. And yet, many still took Anthropic at their word and covered this model’s release as a catastrophic event.

In a recent video, the AI commentator Mo Bitar compared Anthropic’s model rollouts to Apple’s iPhone launches, where every year the company resells you the same product with minor improvements. “Except here,” he adds, “the product is existential dread.”

And we keep falling for it.

I think we’ve entered a stage where we need to almost entirely discount any claims made by the AI companies themselves until we can independently verify what’s actually going on.
