Claude Opus 4.6 (Anthropic) currently leads the intelligence ranking.
The Intelligence ranking orders the published roster by raw Humanity's Last Exam (HLE) scores. For the benchmark policy and supporting-evidence rules, see the Methodology page.
Buyer-facing table
| Rank | Model | Maker | Raw HLE (Humanity's Last Exam) | GPQA | MathArena | Arc-AGI-2 |
|---|---|---|---|---|---|---|
| 01 | Claude Opus 4.6 | Anthropic | 62.7% | n/a | 66.2% | 68.8% |
| 02 | Qwen 3.6 Plus Preview | Qwen | 46.3% | n/a | 58.1% | n/a |
| 03 | Gemini 3.1 Pro | Google | 44.4% | 94.3% | 73.4% | 77.1% |
| 04 | GPT-5.4 (replaces GPT-5.2) | OpenAI | 41.6% | 92.0% | 78.7% | 73.3% |
| 05 | Gemini 3 Flash | Google | 33.7% | 90.4% | 60.3% | 33.6% |
| 06 | Claude Sonnet 4.6 | Anthropic | 33.2% | 89.9% | n/a | 58.3% |
| 07 | Kimi K2.5 | Moonshot AI | 29.4% | n/a | 55.7% | 12.1% |
| 08 | MiniMax M2.7 | MiniMax | 28.1% | n/a | n/a | n/a |
| 09 | DeepSeek V3.2 (Thinking) | DeepSeek | 22.0% | 85.0% | 51.5% | n/a |
| 10 | Grok 4.1 Fast | xAI | 20.0% | n/a | 49.9% | 16.0% |

Editorial investigation

Intelligence is the least interpreted surface on the site. The page keeps the raw HLE result front and center, then leaves methodology and evidence depth to the model page and the methodology page.

Top intelligence model: Claude Opus 4.6 (Anthropic). "Best if your work involves genuinely hard problems (deep research, complex code, or legal and financial analysis) where accuracy matters more than speed."
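The ranking rule described above (sort the roster by raw HLE alone; the other benchmarks are reported but not ranked) can be sketched in a few lines. This is a minimal illustration, not site code: the model names and scores are copied from the buyer-facing table, and `None` stands in for an "n/a" cell.

```python
# Sketch of the Intelligence ranking rule: order the roster by raw HLE,
# descending. Models without a published HLE score would sort last.
from typing import List, Optional, Tuple

# (model, raw HLE %) pairs taken from the buyer-facing table above.
roster: List[Tuple[str, Optional[float]]] = [
    ("Claude Opus 4.6", 62.7),
    ("Qwen 3.6 Plus Preview", 46.3),
    ("Gemini 3.1 Pro", 44.4),
    ("GPT-5.4", 41.6),
    ("Gemini 3 Flash", 33.7),
    ("Claude Sonnet 4.6", 33.2),
    ("Kimi K2.5", 29.4),
    ("MiniMax M2.7", 28.1),
    ("DeepSeek V3.2 (Thinking)", 22.0),
    ("Grok 4.1 Fast", 20.0),
]

def rank_by_hle(models: List[Tuple[str, Optional[float]]]):
    """Rank by raw HLE, highest first; None (no published score) sorts last."""
    return sorted(models, key=lambda m: (m[1] is None, -(m[1] or 0.0)))

ranked = rank_by_hle(roster)
```

Because the sort key ignores every column except raw HLE, a model can hold a lower rank while beating the leader on GPQA, MathArena, or Arc-AGI-2, exactly as rows 03 and 04 do in the table.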