
MCP-Atlas

Scaled tool use

MCP-Atlas evaluates how well a model orchestrates a large catalog of external tools over the Model Context Protocol: selecting the right tool, chaining calls, and handling their results across long, multi-tool workflows.

Model scores

  • Opus 4.8: 82.2%
  • Opus 4.7: 79.1%
  • GPT-5.5: 75.3%
  • Gemini 3.1 Pro: 78.2%
  • Mythos Preview

Official source: MCP-Atlas leaderboard (Scale)

Related reading