Gemini 2.5 Pro is also claimed to have outperformed models such as OpenAI's o3-mini, Grok 3 Beta, Claude 3.7 Sonnet, and DeepSeek R1 on several benchmarks, including GPQA Diamond, AIME 2024 and 2025, Aider ...