
BioMysteryBench

Biology

BioMysteryBench presents unsolved-style biology puzzles that require generating and testing hypotheses against experimental evidence. Scores are reported on the hard subset and on the subset that human experts have solved.

Model scores

  • Fable 5: 46.1% (hard) / 83.9% (human solved)
  • Opus 4.8: 40.0% (hard) / 80.4% (human solved)
  • GPT-5.5
  • Opus 4.7
  • Gemini 3.1 Pro
  • Mythos Preview: 29.6% (hard) / 82.6% (human solved)

Official source: Anthropic — Fable 5 / Mythos 5 announcement
