AA-LCR

Long context reasoning

Artificial Analysis's Long Context Reasoning evaluation tests whether a model can reason over very large inputs: it must synthesize facts scattered across long documents rather than simply retrieve a single passage.

Model scores

  • Opus 4.8: 67.7%
  • Opus 4.7: 70.3%
  • GPT-5.5: 74.3%
  • Gemini 3.1 Pro
  • Mythos Preview

Official source: Artificial Analysis — AA-LCR
