Legal Agent Benchmark

Legal

The Legal Agent Benchmark evaluates agentic legal work: reviewing contracts, producing redlines and answering questions that require sustained reasoning over long legal documents. Scores are low across all models (the top score is 13.3%), making it one of the least saturated evals reported.

Model scores

  • Fable 5: 13.3%
  • Opus 4.8: 10.4%
  • GPT-5.5: 2.1%
  • Opus 4.7: no score listed
  • Gemini 3.1 Pro: 0.0%
  • Mythos Preview: no score listed

Official source: Anthropic — Fable 5 / Mythos 5 announcement