GPT-5.5 Matches Mythos in Cybersecurity Tests

GPT-5.5 achieves performance on par with Anthropic’s restricted Mythos Preview in new AISI cybersecurity evaluations. Results raise questions about threat narratives. May 03, 2026.

71.4 percent. That’s the average success rate GPT-5.5 achieved on the Expert tier, the hardest of the AI Security Institute’s (AISI) 95 Capture the Flag tasks designed to test AI performance on real-world cybersecurity operations. The number isn’t just high. It’s nearly identical to the 68.6 percent scored by Anthropic’s Mythos Preview, a model the company claimed posed such a significant cybersecurity risk that it restricted access to a handpicked list of “critical industry partners.” But GPT-5.5 isn’t locked down. It’s public. It launched last week. And according to AISI’s May 02, 2026 report, it reached the same tier of offensive cyber capability, without any special containment.

Key Takeaways

  • GPT-5.5 matched Mythos Preview on AISI’s Expert-level cybersecurity tasks, scoring 71.4% vs. 68.6%—within margin of error.
  • The model solved a complex Rust binary disassembler challenge in 10 minutes and 22 seconds, costing $1.73 in API calls.
  • Both models passed the “The Last Ones” data-exfiltration simulation for the first time, with GPT-5.5 succeeding in 3 of 10 attempts and Mythos Preview in 2 of 10.
  • Neither model cracked the “Cooling Tower” power plant control simulation—a task no AI has yet passed.
  • Mythos Preview’s restricted release was based on claimed unique risk, but GPT-5.5 shows equivalent performance publicly.

The Mythos Narrative Was Never About the Model

Anthropic didn’t just release a model preview. It released a narrative. When Mythos Preview dropped in April 2026, the company framed it as a watershed moment in AI risk: a model so capable in offensive cybersecurity tasks that releasing it broadly could enable serious attacks. The language wasn’t subtle. Internal documentation, as cited in the original reporting, described the model as “potentially dual-use at scale,” capable of “autonomous vulnerability discovery and exploitation.” Fine. That’s serious. But the story collapses when you find out that OpenAI’s GPT-5.5, available to anyone with an API key, does the same thing.

GPT-5.5’s Real-World Attack Performance

The AISI testing suite isn’t academic. It’s operational. Since 2023, the UK-based AI Security Institute has run frontier models through 95 Capture the Flag (CTF) challenges that simulate actual cyber operations: reverse engineering firmware, extracting data from encrypted logs, exploiting SQL injection flaws. These aren’t toy problems. They mirror what red teams do in practice.

On the Expert tier, the hardest of the suite, the results are stark. GPT-5.5 cleared 71.4 percent of tasks. Mythos Preview cleared 68.6 percent. The difference? Statistically insignificant. But the delivery method matters. GPT-5.5 didn’t run on special hardware or in a sandbox. It ran via standard API calls. And in one challenge, building a disassembler for a packed Rust binary, it solved the task in 10 minutes and 22 seconds, at a cost of just $1.73. No human input. No fine-tuning. Just prompt, execution, output.

That’s not just fast. It’s accessible. For less than the price of a coffee, an attacker could generate a working exploit for a niche binary. And if they fail? Try again. Scale up. Automate.
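
That arithmetic is worth making concrete. Below is a minimal sketch of such a retry loop against an OpenAI-style chat completions API; the model name is the one the article reports, while the token prices and the check_solution callback are illustrative placeholders, not real figures.

```python
# Hypothetical sketch: a retry loop against an LLM API that tracks spend.
# Model name, prices, and the success check are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PRICE_PER_1K_INPUT = 0.005   # assumed rates; consult your provider's price list
PRICE_PER_1K_OUTPUT = 0.015


def attempt_task(task_prompt: str) -> tuple[str, float]:
    """Make one attempt and return the answer plus its estimated cost."""
    resp = client.chat.completions.create(
        model="gpt-5.5",  # model name as reported in the article
        messages=[{"role": "user", "content": task_prompt}],
    )
    cost = (resp.usage.prompt_tokens / 1000) * PRICE_PER_1K_INPUT + (
        resp.usage.completion_tokens / 1000
    ) * PRICE_PER_1K_OUTPUT
    return resp.choices[0].message.content, cost


def solve_with_retries(task_prompt, check_solution, max_attempts=10):
    """Retry until check_solution accepts an answer; report total spend."""
    total_cost = 0.0
    for attempt in range(1, max_attempts + 1):
        answer, cost = attempt_task(task_prompt)
        total_cost += cost
        if check_solution(answer):
            return attempt, total_cost  # solved on this attempt
    return None, total_cost  # exhausted the budget without a solve
```

The point of the sketch is the loop, not the prompt: once a failed attempt costs cents, “try again, scale up, automate” stops being a figure of speech.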

“The Last Ones” Simulation: First Successes

Then there’s “The Last Ones” (TLO)—a 32-step simulation of a data exfiltration attack on a fictional corporate network. It’s not brute force. It’s stealth. Persistence. Lateral movement. Previous models failed every step. They’d get stuck on basic authentication bypasses or misread network topology.

Mythos Preview cracked it first, succeeding in 2 out of 10 runs. GPT-5.5 did better: 3 out of 10. It’s still a low success rate, but the ceiling just shifted. No model had ever completed TLO before. Now two have, within weeks of each other. That’s not incremental progress. It’s a threshold crossed.

And yet, both models failed the next test.

Cooling Tower: The Line AI Still Can’t Cross

“Cooling Tower” is AISI’s most advanced simulation: a full-chain attack on industrial control software for a power plant. It requires understanding proprietary SCADA protocols, manipulating physical system states, and avoiding triggering alarms during gradual output degradation. It’s not just code. It’s systems thinking under constraints.

Every model tested—GPT-4, GPT-5, Mythos Preview, GPT-5.5—has failed. Completely. No partial successes. No near-misses. The AI either crashes the system immediately or gets stuck in looped diagnostics.

That’s a relief. But it’s also a warning. The fact that all models fail the same way suggests a fundamental limitation—not in scale, but in reasoning about physical consequences. AI can reverse engineer a binary. It can chain exploits. But it still can’t think like an engineer in a crisis. Not yet.

What This Means For You

If you’re building APIs, you need to assume they’re already being used to generate exploits. GPT-5.5’s $1.73 disassembler proves that offensive tooling is now cheap, automated, and API-driven. Rate limits? They’ll be bypassed. Filtering? Prompt injection works. Your attack surface just grew—not because of one model, but because the capability is now commoditized.

And if you’re in security operations, stop waiting for “the dangerous AI.” It’s already here. It’s just not monolithic. It’s not one rogue model. It’s thousands of API calls stitching together exploits, scanning logs, and probing endpoints—under the radar, at low cost, at scale. Your detection systems need to evolve from signature-based alerts to behavioral anomaly detection. Because the attacker isn’t just human anymore. It’s human + API + automation.
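
What does that shift look like in code? Here is a minimal sketch, assuming per-client features aggregated from access logs; the three features and the training data are illustrative assumptions, and a production system would draw on far richer signals.

```python
# Minimal sketch of behavioral anomaly detection over API access logs,
# as opposed to signature matching. Features and data are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

# One row per client: [requests/min, distinct endpoints hit, 4xx ratio].
# In practice these would be aggregated from real access logs.
baseline = np.array([
    [12, 3, 0.02],
    [8, 2, 0.01],
    [15, 4, 0.03],
    [10, 3, 0.02],
])

# Train on normal traffic; flag clients whose behavior deviates from it.
model = IsolationForest(contamination=0.05, random_state=0).fit(baseline)

# A scripted attacker probing many endpoints at machine speed stands out
# even if every individual request it sends looks perfectly well-formed.
suspect = np.array([[240, 57, 0.31]])
print(model.predict(suspect))  # -1 means "anomalous", 1 means "normal"
```

The unit of detection moves from “known bad request” to “client behaving unlike any human,” which is exactly the distinction signature-based alerts can’t make.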

The Hypocrisy of AI Risk Theater

Let’s be clear: Anthropic’s move wasn’t about safety. It was about positioning. By locking down Mythos Preview and warning of its risks, the company painted itself as the responsible actor in a reckless field. But AISI’s findings expose the flaw in that performance. If GPT-5.5—which is public—matches Mythos Preview in cyber capability, then the containment was theater. The risk isn’t in the model’s access controls. It’s in the capability itself.

And that capability is spreading. Fast. OpenAI didn’t issue warnings. Didn’t hold press conferences. Didn’t write policy white papers. It just released the model. Which means the most dangerous AI tools might not come with disclaimers. They’ll come with documentation, SDKs, and uptime SLAs.

That’s not speculation. It’s what happened.

  • GPT-5.5 is public. Mythos Preview is not.
  • Both achieve near-identical results on offensive cyber tasks.
  • The most advanced simulation (Cooling Tower) remains unsolved by all AI.
  • Automated exploit generation now costs under $2 per attempt.
  • First successful AI-led network exfiltration simulations have occurred.

What’s next? More refinement. More stealth. More automation. Not in secret labs. In public APIs.

“GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73”

That’s not a warning. It’s a benchmark.

What Builders Should Do Next

Developers: Your code is now being analyzed by AI that can spot vulnerabilities faster than most junior pentesters. Assume every open-source project you maintain is being scanned—not by humans, but by scripts powered by models like GPT-5.5. Patch faster. Document better. Harden your defaults. Because the next CVE might originate from an AI-generated payload, not a bored hacker in a basement.
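
One cheap place to start is running the same kind of pattern sweep over your own tree that an automated attacker would. The sketch below is a deliberately crude self-audit; the pattern list is an illustrative assumption, not a complete catalog, and no substitute for a real static analyzer.

```python
# Hypothetical self-audit: grep your own tree for risky patterns before an
# automated attacker does. Pattern list is illustrative, not exhaustive.
# (A grep-style sweep will flag its own pattern table; fine for a quick pass.)
import pathlib
import re

RISKY_PATTERNS = {
    r"\beval\(": "eval() on dynamic input",
    r"shell\s*=\s*True": "subprocess with shell=True",
    r"pickle\.loads?\(": "unpickling untrusted data",
    r"verify\s*=\s*False": "TLS verification disabled",
}


def audit(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, description) for each suspicious match."""
    findings = []
    for path in pathlib.Path(root).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, 1):
            for pattern, why in RISKY_PATTERNS.items():
                if re.search(pattern, line):
                    findings.append((str(path), lineno, why))
    return findings


if __name__ == "__main__":
    for file, lineno, why in audit("."):
        print(f"{file}:{lineno}: {why}")
```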

Founders and builders: If you’re selling security tooling, you’re not competing just against other vendors. You’re competing against AI that automates the attacker’s workflow. Your product better offer detection and response that’s faster than a $1.73 API call. That means investing in behavioral AI on the defense side—not just signature updates.

The window to adapt is closing. The models aren’t getting stronger in leaps. They’re getting cheaper, faster, and more integrated. And the next major breach might not start with a phishing email. It might start with a prompt.

We’ve been treating AI risk as something that will arrive in the future. Something we can regulate before it happens. But GPT-5.5 shows it’s already here—disguised as a feature, not a flaw.

What if the most dangerous AI isn’t the one they won’t release—but the one they already did?

Sources: Ars Technica, The Register
