Claude Fable 5 Benchmarks: The Mythos-Class Model Goes GA
Claude Fable 5 brings Mythos-class capability to general availability: 80.3% SWE-bench Pro, 95% SWE-bench Verified, $10/$50 per Mtok. Every reported number, explained.
On June 9, 2026, Anthropic launched Claude Fable 5 — the first Mythos-class model made generally available. Until now, that capability tier was locked behind Project Glasswing, the restricted cybersecurity program that hosted Mythos Preview. Fable 5 is the same underlying model as the new Claude Mythos 5; the difference is a set of safety classifiers that route requests in sensitive domains (cybersecurity, biology and chemistry, distillation) to Claude Opus 4.8 instead.
This post covers every benchmark number Anthropic has reported so far, the pricing, and what the safeguards mean in practice. You can track Fable 5 against every other frontier model in the live benchmark comparison, or jump straight to the Claude Fable 5 model hub.
The headline numbers
Anthropic's launch table reports scores for the Fable 5 / Mythos 5 family across fifteen evaluations. The full set, with the strongest prior model on each for context:
| Benchmark | Fable 5 / Mythos 5 | Best of the rest |
|---|---|---|
| SWE-bench Pro | 80.3% | 77.8% (Mythos Preview) |
| SWE-bench Verified | 95.0% | 93.9% (Mythos Preview) |
| FrontierCode (Diamond) | 29.3% | 13.4% (Opus 4.8) |
| Terminal-Bench 2.1 | 88.0%* | 83.4% (GPT-5.5, Codex CLI) |
| GDPval-AA | 1932 | 1890 (Opus 4.8) |
| GDPpdf (no tools) | 29.8% | 24.9% (GPT-5.5) |
| Blueprint-Bench 2 | 38.6% | 36.2% (GPT-5.5) |
| AutomationBench | 17.4% | 15.5% (Opus 4.8) |
| OSWorld-Verified | 85.0% | 85.4% (Mythos Preview) |
| Legal Agent Benchmark | 13.3% | 10.4% (Opus 4.8) |
| Humanity's Last Exam (no tools / with tools) | 59.0%* / 64.5%* | 56.8% / 64.7% (Mythos Preview) |
| BioMysteryBench (hard / human solved) | 46.1%* / 83.9%* | 40.0% / 80.4% (Opus 4.8) |
| ExploitBench (Cap%) | 78.0%* | 69.0% (Mythos Preview) |
| HealthBench Professional | 66.0%* | 64.7% (Mythos Preview) |
| CyberGym | 83.8%* | 83.1% (Mythos Preview) |
*Starred benchmarks are measured with safeguards lifted — i.e. the Mythos 5 configuration. On these, Claude Fable 5 performs closer to Opus 4.8 because its classifiers fall back on cybersecurity, biology and related requests. Anthropic notes the two configurations are within 1–3 points of each other elsewhere; the table shows the higher of the two.
On SWE-bench Pro, the hardest real-world coding eval available, 80.3% is a new record — 11.1 points ahead of Claude Opus 4.8 (69.2%) and 21.7 points ahead of GPT-5.5 (58.6%). On SWE-bench Verified, Fable 5's 95.0% pushes a benchmark that was already saturating even closer to its ceiling (Mythos 5, with safeguards lifted, is reported at 95.5%). And on Cognition's FrontierCode Diamond, which scores production-quality coding rather than test-passing, Fable 5 more than doubles Opus 4.8.
The family trails the best prior score in two places. On OSWorld-Verified, Anthropic's updated figure for Mythos Preview (85.4%) edges out Fable 5's 85.0%. On Humanity's Last Exam with tools, the table puts the family 0.2 points behind Mythos Preview (64.5% vs 64.7%). The launch also debuts several evals with plenty of headroom, such as AutomationBench (17.4%) and the Legal Agent Benchmark (13.3%), where even the best models fail most tasks.
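To make the size of each lead easier to compare, here is a minimal Python sketch that computes the margin over the best other model for a handful of single-score rows from the table. The scores are copied from the table; the dictionary layout and the `margin` helper are illustrative choices, not part of Anthropic's launch material.

```python
# (family score, best other model's score), in percent, copied from the table.
# Only a subset of single-score rows is included to keep the sketch short.
SCORES = {
    "SWE-bench Pro": (80.3, 77.8),
    "SWE-bench Verified": (95.0, 93.9),
    "FrontierCode (Diamond)": (29.3, 13.4),
    "Terminal-Bench 2.1": (88.0, 83.4),
    "OSWorld-Verified": (85.0, 85.4),
    "CyberGym": (83.8, 83.1),
}


def margin(family: float, best_other: float) -> float:
    """Lead in percentage points, rounded to one decimal to hide float noise."""
    return round(family - best_other, 1)


margins = {name: margin(*pair) for name, pair in SCORES.items()}

# Rows where the family does not hold the top score.
trailing = [name for name, m in margins.items() if m < 0]

# Print the largest leads first.
for name, m in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {m:+.1f} pts")
```

Sorted this way, FrontierCode (Diamond) stands out as the widest gap among these rows, while the SWE-bench Verified and CyberGym leads are close to a single point.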
Pricing: the most expensive frontier model
Fable 5 costs $10 per million input tokens and $50 per million output tokens — double Opus 4.8's $5/$25 and well above GPT-5.5's $5/$30. Anthropic notes this is less than half the price of Claude Mythos Preview, but it still makes Fable 5 the most expensive of the major generally available models. Developers can access it via the Claude API as claude-fable-5.
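To see what those per-token rates mean for a bill, the sketch below prices one workload under each of the three list prices quoted above. The workload size (2M input tokens, 500k output tokens) is a made-up example, and the `Price` class and `job_cost` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# List prices as quoted in this post.
PRICES = {
    "claude-fable-5": Price(10.0, 50.0),
    "Claude Opus 4.8": Price(5.0, 25.0),
    "GPT-5.5": Price(5.0, 30.0),
}


def job_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a job with the given token counts."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical workload: an assumption for illustration, not a measured figure.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

costs = {name: job_cost(p, INPUT_TOKENS, OUTPUT_TOKENS) for name, p in PRICES.items()}

for name, usd in costs.items():
    print(f"{name:<16} ${usd:.2f}")
```

For this workload the totals come to $45.00 for Fable 5, $22.50 for Opus 4.8 and $25.00 for GPT-5.5. Because both Fable 5 rates are exactly double Opus 4.8's, the 2× ratio holds for any mix of input and output tokens; the gap to GPT-5.5 depends on how output-heavy the job is.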
The subscription rollout is staged: Fable 5 is included on Pro, Max, Team and seat-based Enterprise plans at no extra cost through June 22, then moves to usage credits until capacity allows it to return as a standard plan feature.
How the safeguards work
Fable 5 ships with safety classifiers covering three areas: cybersecurity, biology and chemistry, and model distillation. When a request trips a classifier, the response is handled by Claude Opus 4.8 instead, and the user is informed. Anthropic says more than 95% of sessions involve no fallback at all — for those sessions Fable 5's performance is effectively identical to Mythos 5.
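The fallback behavior described above can be pictured as a simple routing step. The sketch below is purely illustrative: Anthropic has not published how the classifiers work, so `toy_classifier` is a stand-in keyword matcher, and `Routed`, `route` and the notice text are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The three safeguarded areas named in the launch material.
SENSITIVE_DOMAINS = ("cybersecurity", "biology and chemistry", "distillation")


@dataclass
class Routed:
    model: str                       # which model answers the request
    fallback_domain: Optional[str]   # None when no classifier fired
    notice: Optional[str]            # message shown to the user on fallback


def route(prompt: str, classify: Callable[[str], Optional[str]]) -> Routed:
    """Send the request to Fable 5 unless a classifier flags a sensitive domain."""
    domain = classify(prompt)
    if domain is None:
        return Routed("claude-fable-5", None, None)
    # Fallback path: Opus 4.8 answers and the user is told why.
    notice = f"Answered by Claude Opus 4.8 ({domain} safeguard)."
    return Routed("Claude Opus 4.8", domain, notice)


def toy_classifier(prompt: str) -> Optional[str]:
    """Assumption: a trivial keyword matcher standing in for the real classifiers."""
    keywords = {"exploit": "cybersecurity", "pathogen": "biology and chemistry"}
    lowered = prompt.lower()
    for word, domain in keywords.items():
        if word in lowered:
            return domain
    return None


print(route("Refactor this parser for readability", toy_classifier))
print(route("Write an exploit for this service", toy_classifier))
```

The point of the sketch is the shape of the behavior, not the detection logic: a flagged request still gets an answer, just from a different model and with a notice attached.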
For benchmark readers, the practical implication is the CyberGym caveat above: cyber evals for this model family are measured on the Mythos 5 configuration. On CyberGym, the family's 83.8% edges out Mythos Preview's 83.1% and clearly leads GPT-5.5 (81.8%) and Opus 4.8 (78.8%) — but a Fable 5 user asking offensive-cyber questions will get Opus 4.8 answers.
Where this leaves the leaderboard
Until today, our coding rankings carried an asterisk: Mythos Preview topped most benchmarks but was a restricted research preview, so Claude Opus 4.8 was the practical recommendation. Fable 5 removes the asterisk, since Mythos-class scores are now attached to a model anyone can use. For teams deciding between frontier models, the question shifts from availability to cost: Fable 5 leads Opus 4.8 by 11.1 points on SWE-bench Pro, but you pay 2× Opus 4.8 rates for it. See the Fable 5 vs GPT-5.5 head-to-head for the full picture, and the best LLM for coding ranking for how it reshuffles the field.
As always, treat vendor-reported launch numbers with the usual caution — see the complete guide to LLM benchmarks for how to read them and why independent verification matters.
Key takeaways
- New SWE-bench Pro record: 80.3%, +11.1 points over Opus 4.8 — the largest lead any generally available model has held on this benchmark.
- SWE-bench Verified is effectively saturated: 95.0% for Fable 5, 95.5% for Mythos 5.
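- Premium pricing: $10/$50 per Mtok, which is 2× Opus 4.8 and the highest among the major generally available models (Mythos Preview, which is not generally available, costs more than twice as much).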
- Safeguards with a fallback, not refusals: sensitive requests are answered by Opus 4.8; over 95% of sessions never trigger a fallback.
- A few cells remain unreported — AA-LCR, BrowseComp, MCP-Atlas, Finance Agent, GPQA Diamond, CharXiv and MMMLU — and we will fill them in as numbers land. Track them on the Fable 5 model hub or the live comparison table.