MCP-Atlas

Scaled tool use

MCP-Atlas evaluates how well a model orchestrates a large catalog of external tools over the Model Context Protocol: selecting the right tool, chaining calls, and handling their results across long, multi-tool workflows.

Model scores

Model            Score
Fable 5          —
Opus 4.8         82.2%
Sonnet 5         —
GPT-5.6 Sol      —
GPT-5.5          75.3%
Composer 2.5     —
Opus 4.7         79.1%
Gemini 3.1 Pro   78.2%
Mythos Preview   —

Official source: MCP-Atlas leaderboard (Scale)
