Blueprint-Bench 2

Spatial reasoning

Blueprint-Bench 2 evaluates spatial reasoning: interpreting floor plans, technical drawings and physical layouts, and answering questions that require building a coherent mental model of space rather than reading text or charts.

Model scores

  • Fable 5: 38.6%
  • Opus 4.8: 14.5%
  • GPT-5.5: 36.2%
  • Opus 4.7
  • Gemini 3.1 Pro: 26.5%
  • Mythos Preview

Official source: Anthropic — Fable 5 / Mythos 5 announcement

Related reading