
SWE-bench Multilingual

Agentic coding

SWE-bench Multilingual extends SWE-bench beyond Python to real bug-fix tasks across many programming languages, each scored by whether the model’s patch passes the repository’s hidden tests. It is the headline agentic-coding benchmark Cursor reports for its Composer models.
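The scoring rule described above can be illustrated with a minimal sketch. It assumes a task counts as resolved only when every hidden test passes, and that the reported score is the percentage of resolved tasks. The `TaskResult` structure, the `resolve_rate` helper, and the sample task IDs are hypothetical and are not taken from the benchmark's actual harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one bug-fix task (hypothetical structure for illustration)."""
    task_id: str
    language: str
    hidden_tests_passed: int
    hidden_tests_total: int

    @property
    def resolved(self) -> bool:
        # Assumption: a task counts only if every hidden test passes.
        return (
            self.hidden_tests_total > 0
            and self.hidden_tests_passed == self.hidden_tests_total
        )


def resolve_rate(results: list[TaskResult]) -> float:
    """Percentage of tasks whose patch passed all hidden tests."""
    if not results:
        return 0.0
    return 100.0 * sum(r.resolved for r in results) / len(results)


# Made-up sample data: two of the four tasks are fully resolved.
results = [
    TaskResult("rust-001", "Rust", 4, 4),
    TaskResult("go-002", "Go", 3, 5),
    TaskResult("java-003", "Java", 2, 2),
    TaskResult("ts-004", "TypeScript", 0, 1),
]
print(f"{resolve_rate(results):.1f}%")  # prints "50.0%"
```

Under this all-or-nothing rule, a patch that passes most but not all hidden tests earns no partial credit.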

Model scores

  • Fable 5
  • Opus 4.8
  • Sonnet 5
  • GPT-5.6 Sol
  • GPT-5.5: 77.8%
  • Composer 2.5: 79.8%
  • Opus 4.7: 80.5%
  • Gemini 3.1 Pro
  • Mythos Preview

Official source: Cursor — Introducing Composer 2.5
