
TL;DR
Fable 5 topped GPT 5.5 on every major benchmark but was pulled by the US government after three days, making GPT 5.5 the top model you can actually use.
Anthropic’s Fable 5 spent three days as the most capable AI model ever released to the public. It topped the Chatbot Arena leaderboard, crushed OpenAI’s GPT 5.5 on coding benchmarks by double-digit margins, and gave paying subscribers access to Mythos-class reasoning for the first time. Then, on June 12, the US government ordered Anthropic to shut it down.
The result is a strange moment in AI. The model that demonstrably outperforms everything else on the market is the one you cannot use. GPT 5.5, which OpenAI launched in late April under the internal codename “Spud,” is now the strongest model available to developers and consumers, not because it improved but because its only real competitor was removed.
The benchmark results are not close. On SWE-Bench Pro, which measures a model’s ability to resolve real software engineering issues across open-source codebases, Fable 5 scored 80.3% to GPT 5.5’s 58.6%, a gap of nearly 22 points. On SWE-Bench Verified, a curated subset of the same benchmark, Fable 5 reached 95.0%.
Other coding benchmarks tell a similar story. Fable 5 leads the Code Arena by 98 Elo points, and its score of 1,665 puts it 164 points ahead of GPT 5.5’s 1,501. On FrontierCode Diamond, a benchmark designed to test the most difficult programming tasks, Fable 5 scored 29.3% while GPT 5.5 managed 5.7%. On the broader Chatbot Arena leaderboard, Fable 5 sits at number one with GPT 5.5 in fourth.
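For a rough sense of what an Elo gap means in head-to-head terms, the sketch below applies the standard Elo expected-score formula to the two Code Arena scores quoted above. It assumes the leaderboard uses the conventional logistic curve with a 400-point scale. The article does not say how the Code Arena computes its ratings, so treat the result as illustrative only. The function name `elo_expected_score` is made up for this example.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumption: conventional logistic curve with a 400-point scale.
    The leaderboard's actual rating method is not stated in the article.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Code Arena scores as quoted in the article.
fable_5 = 1665
gpt_5_5 = 1501

gap = fable_5 - gpt_5_5
expected = elo_expected_score(fable_5, gpt_5_5)

print(f"Elo gap: {gap} points")
print(f"Expected head-to-head score for Fable 5: {expected:.2f}")
```

Under that assumption, the quoted scores would correspond to Fable 5 being preferred in roughly 72% of direct comparisons against GPT 5.5.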
GPT 5.5 does have one area of relative strength. On Terminal-Bench 2.0, which evaluates interactive terminal-based coding tasks rather than codebase-level issue resolution, GPT 5.5 scored 82.7% compared to Fable 5’s approximately 88.0%. GPT 5.5 still trails, but the gap is much narrower there, and the benchmark tests a different skill: executing commands and debugging in real time rather than reading and patching large repositories.
Pricing also favours OpenAI. GPT 5.5 costs $5 per million input tokens and $30 per million output tokens, against Fable 5’s $10 and $50 respectively. That is half the input price and 60% of the output price. For developers running high-volume applications where the performance difference is less critical than cost, GPT 5.5 is the more practical choice even when both models are available.
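Because the input and output prices differ by different ratios, the actual saving depends on a workload’s mix of input and output tokens. The sketch below works through one example using the per-million-token prices quoted above. The monthly token volumes are hypothetical, chosen only to illustrate the arithmetic, and `workload_cost` is a helper name invented for this example.

```python
# Per-million-token prices in USD, as quoted in the article.
PRICES = {
    "GPT 5.5": {"input": 5.00, "output": 30.00},
    "Fable 5": {"input": 10.00, "output": 50.00},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload at the listed per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical monthly workload (an assumption, not a figure from the article).
monthly_input_tokens = 200_000_000
monthly_output_tokens = 20_000_000

costs = {
    model: workload_cost(model, monthly_input_tokens, monthly_output_tokens)
    for model in PRICES
}

for model, cost in costs.items():
    print(f"{model}: ${cost:,.2f} per month")

ratio = costs["GPT 5.5"] / costs["Fable 5"]
print(f"GPT 5.5 costs {ratio:.0%} of Fable 5 for this mix")
```

For this input-heavy mix, GPT 5.5 comes to $1,600 against $3,000 for Fable 5, or about 53% of the cost. A workload dominated by output tokens would move that figure toward 60%.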
Fable 5 launched on June 9 as Anthropic’s first Mythos-class model made available to the general public. It offered a one-million-token context window and 128,000 output tokens. Anthropic made it available at no extra cost to Pro, Max, Team, and Enterprise subscribers until June 22, a promotional window that the government directive cut short after just three days.
The shutdown came via an export control directive issued on June 12. The government cited a jailbreak vulnerability as the reason for pulling both Fable 5 and the broader Mythos 5 model family. Anthropic has disputed the severity of the finding, saying the vulnerabilities identified are minor, publicly known, and achievable by GPT 5.5 without any bypass techniques. Separately, reports indicate that Amazon CEO Andy Jassy played a role in triggering the government’s review.
The practical consequence is that developers and researchers who were evaluating Fable 5 for production use have had to revert to GPT 5.5 or Anthropic’s earlier Opus models. For coding-heavy workflows, the downgrade is significant. The 22-point gap on SWE-Bench Pro represents the difference between a model that can resolve four out of five real-world software issues and one that handles roughly three out of five.
Whether Fable 5 returns depends on Anthropic’s negotiations with the government over the export control classification. The company has publicly argued that the directive is disproportionate and that the cited vulnerabilities do not justify pulling the model entirely. Until that dispute is resolved, GPT 5.5 holds the top spot by default: it is the best model available, not the best model that exists.
Original source: The Next Web



