CharXiv Reasoning

Visual reasoning

CharXiv Reasoning is a chart-understanding suite built from thousands of real charts in arXiv papers. It tests whether a model can synthesize visual information across complex scientific figures to answer multi-step questions.

Model scores

  • Opus 4.8: 80.5% (no tools) / 89.9% (with tools)
  • Opus 4.7: 81.3% (no tools) / 90.1% (with tools)
  • GPT-5.5
  • Gemini 3.1 Pro
  • Mythos Preview: 86.1% (no tools) / 93.2% (with tools)

Official source: CharXiv project

Related reading