AutomationBench
AutomationBench (from Zapier) measures how reliably a model can build and run real-world automations end to end: wiring up triggers, transforming data, and chaining actions across third-party apps with minimal human help. Scores are low across the board, so there is plenty of headroom for improvement.
Official source: Anthropic — Fable 5 / Mythos 5 announcement