Claude Fable 5 Benchmarks: The Mythos-Class Model Goes GA

On June 9, 2026, Anthropic launched Claude Fable 5 — the first Mythos-class model made generally available. Until now, that capability tier was locked behind Project Glasswing, the restricted cybersecurity program that hosted Mythos Preview. Fable 5 is the same underlying model as the new Claude Mythos 5; the difference is a set of safety classifiers that route requests in sensitive domains (cybersecurity, biology and chemistry, distillation) to Claude Opus 4.8 instead.

This post covers every benchmark number Anthropic has reported so far, the pricing, and what the safeguards mean in practice. You can track Fable 5 against every other frontier model in the live benchmark comparison, or jump straight to the Claude Fable 5 model hub.

The headline numbers

Anthropic's launch table reports scores for the Fable 5 / Mythos 5 family across fifteen evaluations. The full set, with the strongest prior model on each for context:

Benchmark	Fable 5 / Mythos 5	Best of the rest
SWE-bench Pro	80.3%	77.8% (Mythos Preview)
SWE-bench Verified	95.0%	93.9% (Mythos Preview)
FrontierCode (Diamond)	29.3%	13.4% (Opus 4.8)
Terminal-Bench 2.1	88.0%*	83.4% (GPT-5.5, Codex CLI)
GDPval-AA	1932	1890 (Opus 4.8)
GDPpdf (no tools)	29.8%	24.9% (GPT-5.5)
Blueprint-Bench 2	38.6%	36.2% (GPT-5.5)
AutomationBench	17.4%	15.5% (Opus 4.8)
OSWorld-Verified	85.0%	85.4% (Mythos Preview)
Legal Agent Benchmark	13.3%	10.4% (Opus 4.8)
Humanity's Last Exam (no tools / with tools)	59.0%* / 64.5%*	56.8% / 64.7% (Mythos Preview)
BioMysteryBench (hard / human solved)	46.1%* / 83.9%*	40.0% / 80.4% (Opus 4.8)
ExploitBench (Cap%)	78.0%*	69.0% (Mythos Preview)
HealthBench Professional	66.0%*	64.7% (Mythos Preview)
CyberGym	83.8%*	83.1% (Mythos Preview)

*Starred benchmarks are measured with safeguards lifted — i.e. the Mythos 5 configuration. On these, Claude Fable 5 performs closer to Opus 4.8 because its classifiers fall back on cybersecurity, biology and related requests. Anthropic notes the two configurations are within 1–3 points of each other elsewhere; the table shows the higher of the two.

On SWE-bench Pro, the hardest real-world coding eval available, 80.3% is a new record — 11.1 points ahead of Claude Opus 4.8 (69.2%) and 21.7 points ahead of GPT-5.5 (58.6%). On SWE-bench Verified, Fable 5's 95.0% pushes a benchmark that was already saturating even closer to its ceiling (Mythos 5, with safeguards lifted, is reported at 95.5%). And on Cognition's FrontierCode Diamond, which scores production-quality coding rather than test-passing, Fable 5 more than doubles Opus 4.8.

The family trails the prior best in two places. On OSWorld-Verified, Anthropic's updated figure for Mythos Preview (85.4%) edges out Fable 5's 85.0%. On Humanity's Last Exam with tools, Mythos Preview's 64.7% sits 0.2 points above the family's 64.5%. The launch also debuts several evals with plenty of headroom, such as AutomationBench (17.4%) and the Legal Agent Benchmark (13.3%), where even the best models fail most tasks.
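As a quick sanity check on the table, the sketch below recomputes each margin over the prior best and lists the rows where the family trails. The scores are copied from the table above; the dictionary layout and function names are illustrative choices, not anything Anthropic publishes. The two-part rows (Humanity's Last Exam, BioMysteryBench) are split into separate entries.

```python
# Scores copied from the launch table: (Fable 5 / Mythos 5, best prior model).
# GDPval-AA is a rating rather than a percentage; the subtraction works the same way.
TABLE = {
    "SWE-bench Pro": (80.3, 77.8),
    "SWE-bench Verified": (95.0, 93.9),
    "FrontierCode (Diamond)": (29.3, 13.4),
    "Terminal-Bench 2.1": (88.0, 83.4),
    "GDPval-AA": (1932, 1890),
    "GDPpdf (no tools)": (29.8, 24.9),
    "Blueprint-Bench 2": (38.6, 36.2),
    "AutomationBench": (17.4, 15.5),
    "OSWorld-Verified": (85.0, 85.4),
    "Legal Agent Benchmark": (13.3, 10.4),
    "Humanity's Last Exam (no tools)": (59.0, 56.8),
    "Humanity's Last Exam (with tools)": (64.5, 64.7),
    "BioMysteryBench (hard)": (46.1, 40.0),
    "BioMysteryBench (human solved)": (83.9, 80.4),
    "ExploitBench (Cap%)": (78.0, 69.0),
    "HealthBench Professional": (66.0, 64.7),
    "CyberGym": (83.8, 83.1),
}


def margins(table):
    """Return the family's margin over the prior best, rounded to 0.1."""
    return {name: round(new - prior, 1) for name, (new, prior) in table.items()}


def trailing(table):
    """Return the names of rows where the family scores below the prior best."""
    return [name for name, margin in margins(table).items() if margin < 0]


if __name__ == "__main__":
    for name, margin in margins(TABLE).items():
        print(f"{name:36s} {margin:+.1f}")
    print("Trails prior best:", trailing(TABLE))
```

On the table's figures, the negative margins are OSWorld-Verified (0.4 points behind) and the with-tools column of Humanity's Last Exam (0.2 points behind).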

Pricing: the most expensive frontier model

Fable 5 costs $10 per million input tokens and $50 per million output tokens — double Opus 4.8's $5/$25 and well above GPT-5.5's $5/$30. Anthropic notes this is less than half the price of Claude Mythos Preview, but it still makes Fable 5 the most expensive of the major generally available models. Developers can access it via the Claude API as claude-fable-5.
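To make the price gap concrete, here is a minimal cost estimator using the per-million-token rates quoted above. The token counts in the usage example are a hypothetical workload, and the keys for Opus 4.8 and GPT-5.5 are plain labels for this sketch, not confirmed API model IDs.

```python
# USD per million tokens as (input, output), taken from the prices quoted in this post.
PRICES = {
    "claude-fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),  # label for this sketch only, not an API model ID
    "gpt-5.5": (5.0, 30.0),   # label for this sketch only, not an API model ID
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens, 20k output tokens.
    for model in PRICES:
        print(f"{model:16s} ${request_cost(model, 200_000, 20_000):.2f}")
```

For that assumed workload the estimate is $3.00 on Fable 5, $1.50 on Opus 4.8 and $1.60 on GPT-5.5. The 2× ratio to Opus 4.8 holds for any input/output mix, because both rates double.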

The subscription rollout is staged: Fable 5 is included on Pro, Max, Team and seat-based Enterprise plans at no extra cost through June 22, then moves to usage credits until capacity allows it to return as a standard plan feature.

How the safeguards work

Fable 5 ships with safety classifiers covering three areas: cybersecurity, biology and chemistry, and model distillation. When a request trips a classifier, the response is handled by Claude Opus 4.8 instead, and the user is informed. Anthropic says more than 95% of sessions involve no fallback at all — for those sessions Fable 5's performance is effectively identical to Mythos 5.
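The fallback behaviour described above can be pictured with a small routing sketch. This is purely illustrative: Anthropic has not published how its classifiers or routing are implemented, so the domain labels, the `Routed` type, the `route` function and the notice text are all hypothetical. The classifier itself is not modelled; the sketch takes its output (a set of flagged domains) as given.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three safeguarded areas named in the post.
SENSITIVE_DOMAINS = frozenset({"cybersecurity", "biology_chemistry", "distillation"})


@dataclass
class Routed:
    model: str             # label of the model that answers the request
    fell_back: bool        # True if a classifier tripped and the fallback was used
    notice: Optional[str]  # message shown to the user when a fallback happens


def route(flagged_domains):
    """Pick the answering model from the (assumed) classifier output."""
    hits = SENSITIVE_DOMAINS & set(flagged_domains)
    if hits:
        reason = ", ".join(sorted(hits))
        return Routed("opus-4.8", True, f"Handled by Claude Opus 4.8 ({reason}).")
    return Routed("claude-fable-5", False, None)


if __name__ == "__main__":
    print(route(set()))
    print(route({"cybersecurity"}))
```

The point of the sketch is the shape of the behaviour: a tripped classifier changes which model answers and attaches a notice, instead of producing a refusal.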

For benchmark readers, the practical implication is the starred-row caveat above: cyber evals for this model family are measured on the Mythos 5 configuration. On CyberGym, the family's 83.8% edges out Mythos Preview's 83.1% and clearly leads GPT-5.5 (81.8%) and Opus 4.8 (78.8%), but a Fable 5 user asking offensive-cyber questions will get answers from Opus 4.8 instead.

Where this leaves the leaderboard

Until today, our coding rankings carried an asterisk: Mythos Preview topped most benchmarks but was a restricted research preview, so Claude Opus 4.8 was the practical recommendation. Fable 5 removes the asterisk: Mythos-class scores are now attached to a model anyone can use. For teams deciding between frontier models, the question shifts from availability to cost. Fable 5 leads Opus 4.8 by 11.1 points on SWE-bench Pro, but you pay 2× Opus 4.8 rates for it. See the Fable 5 vs GPT-5.5 head-to-head for the full picture, and the best LLM for coding ranking for how it reshuffles the field.

As always, treat vendor-reported launch numbers with the usual caution — see the complete guide to LLM benchmarks for how to read them and why independent verification matters.

Key takeaways

New SWE-bench Pro record: 80.3%, +11.1 points over Opus 4.8 — the largest lead any generally available model has held on this benchmark.
SWE-bench Verified is effectively saturated: 95.0% for Fable 5, 95.5% for Mythos 5.
Premium pricing: $10/$50 per million input/output tokens, 2× Opus 4.8 and the most expensive of the major generally available models.
Safeguards with a fallback, not refusals: sensitive requests are answered by Opus 4.8; over 95% of sessions never trigger a fallback.
A few cells remain unreported — AA-LCR, BrowseComp, MCP-Atlas, Finance Agent, GPQA Diamond, CharXiv and MMMLU — and we will fill them in as numbers land. Track them on the Fable 5 model hub or the live comparison table.
