GPT-5.5 vs Mythos Preview

Mythos Preview leads, winning all 8 directly comparable benchmarks against GPT-5.5.

Head-to-head record: GPT-5.5 0, Mythos Preview 8

Benchmarks reported for all five models:

| Category | Benchmark | Opus 4.8 | Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro | Mythos Preview |
|---|---|---|---|---|---|---|
| | Price (input/output, per 1M tokens) | $5/$25 | $5/$25 | $5/$30 | $2/$12 | not listed |
| Agentic coding | SWE-bench Pro | 69.2% | 64.3% | 58.6% | 54.2% | 77.8% |
| Agentic coding | SWE-bench Verified | 88.6% | 87.6% | 88.7% | 80.6% | 93.9% |
| Agentic terminal coding | Terminal-Bench 2.1 | 74.6% | 66.1% | 78.2% | 70.3% | 82.0% |
| Multidisciplinary reasoning | Humanity's Last Exam (no tools / with tools) | 49.8% / 57.9% | 46.9% / 54.7% | 41.4% / 52.2% (Pro) | 44.4% / 51.4% | 56.8% / 64.7% |
| Agentic search | BrowseComp | 84.3% | 79.8% | 84.4% | 85.9% | 86.9% |
| Agentic computer use | OSWorld-Verified | 83.4% | 82.8% | 78.7% | 76.2% | 79.6% |
| Graduate-level reasoning | GPQA Diamond | 93.6% | 94.2% | 93.6% | 94.3% | 94.6% |

Benchmarks reported for only some models (scores are listed left to right in the column order above; models without a reported score are omitted, so positions do not map one-to-one onto columns):

| Category | Benchmark | Reported scores |
|---|---|---|
| Long context reasoning | AA-LCR | 67.7%, 70.3%, 74.3% |
| Scaled tool use | MCP-Atlas | 82.2%, 79.1%, 75.3%, 78.2% |
| Agentic financial analysis | Finance Agent v2 | 53.9%, 51.5%, 51.8%, 43.0% |
| Cybersecurity vulnerability reproduction | CyberGym | 78.8%, 73.1%, 81.8%, 83.1% |
| Visual reasoning | CharXiv Reasoning (no tools / with tools) | 80.5% / 89.9%, 81.3% / 90.1%, 86.1% / 93.2% |
| Multilingual Q&A | MMMLU | 91.5%, 83.2%, 92.6% |
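A head-to-head record is a win/loss count over benchmarks where both models have a score. The sketch below shows that tally in Python, assuming higher is better on every benchmark. The `head_to_head` helper and `SCORES` dictionary are hypothetical illustrations, not part of the source. It uses only the six single-score benchmarks above where both GPT-5.5 and Mythos Preview clearly appear. Humanity's Last Exam is left out because its GPT-5.5 entry is marked Pro. The sketch therefore checks the direction of the headline record on a subset rather than reproducing the full 8-0 count.

```python
# Hypothetical helper: counts wins, losses, and ties between two models over
# benchmarks where both have a score. Values are percentages copied from the
# comparison above; every benchmark here is scored so that higher is better.

SCORES = {
    "SWE-bench Pro":      {"GPT-5.5": 58.6, "Mythos Preview": 77.8},
    "SWE-bench Verified": {"GPT-5.5": 88.7, "Mythos Preview": 93.9},
    "Terminal-Bench 2.1": {"GPT-5.5": 78.2, "Mythos Preview": 82.0},
    "BrowseComp":         {"GPT-5.5": 84.4, "Mythos Preview": 86.9},
    "OSWorld-Verified":   {"GPT-5.5": 78.7, "Mythos Preview": 79.6},
    "GPQA Diamond":       {"GPT-5.5": 93.6, "Mythos Preview": 94.6},
}


def head_to_head(scores, a, b):
    """Return (wins_a, wins_b, ties) over benchmarks that report both models."""
    wins_a = wins_b = ties = 0
    for results in scores.values():
        if a not in results or b not in results:
            continue  # not directly comparable: one model has no score
        if results[a] > results[b]:
            wins_a += 1
        elif results[b] > results[a]:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties


if __name__ == "__main__":
    a, b = "GPT-5.5", "Mythos Preview"
    wins_a, wins_b, ties = head_to_head(SCORES, a, b)
    print(f"{a}: {wins_a} wins, {b}: {wins_b} wins, ties: {ties}")
    # prints "GPT-5.5: 0 wins, Mythos Preview: 6 wins, ties: 0"
```

Benchmarks where either model lacks a score are skipped rather than counted as a loss. That skip is what restricts the tally to directly comparable results. Extending the record to more benchmarks only requires adding entries to `SCORES`.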