
AutomationBench

Tool use

AutomationBench (from Zapier) measures how reliably a model can build and run real-world automations end to end: wiring up triggers, transforming data, and chaining actions across third-party apps with minimal human help. Even the top listed score is below 18%, leaving plenty of headroom.

Model scores

  • Fable 5: 17.4%
  • Opus 4.8: 15.5%
  • GPT-5.5: 12.9%
  • Opus 4.7
  • Gemini 3.1 Pro: 9.6%
  • Mythos Preview

Official source: Anthropic — Fable 5 / Mythos 5 announcement
