GPQA Diamond

Graduate-level reasoning

GPQA Diamond is a set of graduate-level, “Google-proof” questions in biology, chemistry and physics, written by domain experts to be hard even for skilled non-specialists with web access. It probes deep scientific reasoning.

Model scores

  • Opus 4.8: 93.6%
  • Opus 4.7: 94.2%
  • GPT-5.5: 93.6%
  • Gemini 3.1 Pro: 94.3%
  • Mythos Preview: 94.6%

Official source: GPQA (GitHub)
