Frontier models we benchmark

Per-model scorecards across coding, agentic, reasoning and multilingual evaluations.