Leaderboard
AI Model Rankings 2026
A side-by-side AI model comparison: ELO ratings from Chatbot Arena, MMLU benchmark scores, context window sizes, input pricing, and what each model is best at.
| # | Model | Developer | ELO | MMLU | Context | Price (Input, per 1M tokens) | Type |
|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 1496 | 91.1 | 1M | $5.00 | Closed |
| 2 | Gemini 3 Pro | Google | 1486 | 91.8 | 1M | — | Closed |
| 3 | Claude Opus 4.5 | Anthropic | 1467 | 90.8 | 200K | $5.00 | Closed |
| 4 | Gemini 2.5 Pro | Google | 1450 | — | 1M | $1.25 | Closed |
| 5 | Kimi K2.5 | Moonshot AI | 1449 | — | 262K | — | Closed |
| 6 | GPT-4.5 Preview | OpenAI | 1444 | — | 128K | $75.00 | Closed |
| 7 | GPT-4o | OpenAI | 1442 | 88.7 | 128K | $2.50 | Closed |
| 8 | GPT-5.2 | OpenAI | 1437 | 89.6 | 400K | $1.75 | Closed |
| 9 | o3 | OpenAI | 1433 | — | 200K | $10.00 | Closed |
| 10 | DeepSeek R1 | DeepSeek | 1418 | 90.8 | 64K | $0.55 | Open |
| 11 | Claude Opus 4 | Anthropic | 1414 | — | 200K | $5.00 | Closed |
| 12 | Mistral Large 3 | Mistral AI | 1414 | — | 128K | $2.00 | Closed |
| 13 | Grok-3 | xAI | 1411 | 92.7 | 131K | $3.00 | Closed |
| 14 | Gemini 2.5 Flash | Google | 1410 | — | 1M | $0.30 | Closed |
| 15 | Claude Haiku 4.5 | Anthropic | 1404 | — | 200K | $1.00 | Closed |
| 16 | o1 | OpenAI | 1402 | 90.8 | 200K | $15.00 | Closed |
| 17 | GPT-4o Mini | OpenAI | — | 82 | 128K | $0.15 | Closed |
| 18 | GPT-4 Turbo | OpenAI | — | 86.5 | 128K | $10.00 | Closed |
| 19 | o3-mini | OpenAI | — | — | 200K | $1.10 | Closed |
| 20 | Claude 3.5 Sonnet | Anthropic | — | 90.4 | 200K | $3.00 | Closed |
| 21 | Claude 3.5 Sonnet (Oct 2024) | Anthropic | — | 90.4 | 200K | $3.00 | Closed |
| 22 | Claude 3 Opus | Anthropic | — | 86.8 | 200K | $15.00 | Closed |
| 23 | Claude 3 Haiku | Anthropic | — | 75.2 | 200K | $0.25 | Closed |
| 24 | Claude Sonnet 4 | Anthropic | — | — | 200K | $3.00 | Closed |
| 25 | Gemini 1.5 Pro | Google | — | 85.9 | 2M | $1.25 | Closed |
| 26 | Gemini 2.0 Flash | Google | — | 76.4 | 1M | $0.10 | Closed |
| 27 | Llama 3.1 405B | Meta | — | 86 | 128K | $0.80 | Open |
| 28 | Llama 3.1 70B | Meta | — | 83.6 | 128K | $0.35 | Open |
| 29 | Llama 3.1 8B | Meta | — | 73 | 128K | $0.05 | Open |
| 30 | Mistral Large 2 | Mistral AI | — | 84 | 128K | $2.00 | Closed |
| 31 | Mixtral 8x22B | Mistral AI | — | 77.3 | 64K | $2.00 | Open |
| 32 | DeepSeek V3 | DeepSeek | — | 88.5 | 128K | $0.14 | Open |
| 33 | Grok-2 | xAI | — | — | 128K | $2.00 | Closed |
| 34 | Command R+ | Cohere | — | — | 128K | $2.50 | Closed |
| 35 | Qwen 2.5 72B Instruct | Alibaba | — | 85.3 | 128K | $0.30 | Open |
| 36 | Claude Sonnet 4.6 | Anthropic | — | 89.3 | 200K | $3.00 | Closed |
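To give a sense of what the ELO gaps in the table mean in practice, here is a minimal sketch that converts a rating difference into an expected head-to-head score. It assumes the textbook Elo logistic formula with a 400-point scale; Chatbot Arena's exact rating methodology may differ, and the function name `elo_expected_score` is purely illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumption: a 400-point logistic scale (the textbook default), which may
    not match how the leaderboard's ratings were actually fitted.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
claude_opus_46 = 1496
gemini_3_pro = 1486
o1 = 1402

# A 10-point gap is close to a coin flip (about 0.514).
print(f"Opus 4.6 vs Gemini 3 Pro: {elo_expected_score(claude_opus_46, gemini_3_pro):.3f}")

# A 94-point gap is a clearer preference (about 0.632).
print(f"Opus 4.6 vs o1: {elo_expected_score(claude_opus_46, o1):.3f}")
```

Under this assumption, the top two models are nearly indistinguishable in pairwise preference, while the gap between rank 1 and rank 16 corresponds to winning roughly five comparisons out of eight.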
Model Profiles
Claude Opus 4.6
Anthropic
Current #1 on Chatbot Arena (1496 ELO), with 99.8% AIME 2025 and 80.8% SWE-bench, leading in coding and hard prompts.
ELO: 1496 · Context: 1M · Price: $5.00 per 1M tokens
Gemini 3 Pro
Google
Google's latest flagship with 94.3% GPQA and 100% AIME 2025, #2 on Chatbot Arena behind Claude Opus 4.6.
ELO: 1486 · Context: 1M · Price: — per 1M tokens
Claude Opus 4.5
Anthropic
Major upgrade with 87% GPQA and 80.9% SWE-bench, the highest-rated Anthropic model before the 4.6 generation.
ELO: 1467 · Context: 200K · Price: $5.00 per 1M tokens
Gemini 2.5 Pro
Google
Google's hybrid thinking model combining fast responses with deep reasoning, top performer on coding and math benchmarks.
ELO: 1450 · Context: 1M · Price: $1.25 per 1M tokens
Kimi K2.5
Moonshot AI
Chinese model with the highest HumanEval score ever recorded (99.0%), excelling at code generation and reasoning.
ELO: 1449 · Context: 262K · Price: — per 1M tokens
GPT-4.5 Preview
OpenAI
OpenAI's largest and most knowledgeable non-reasoning model with broad world knowledge and reduced hallucinations.
ELO: 1444 · Context: 128K · Price: $75.00 per 1M tokens
GPT-4o
OpenAI
OpenAI's flagship multimodal model with native text, vision, and audio capabilities, offering strong all-around performance.
ELO: 1442 · Context: 128K · Price: $2.50 per 1M tokens
GPT-5.2
OpenAI
OpenAI's current-gen flagship with 400K context, 92.4% GPQA, and 100% AIME 2025, offering strong reasoning at reduced cost.
ELO: 1437 · Context: 400K · Price: $1.75 per 1M tokens
o3
OpenAI
Advanced reasoning model succeeding o1, with significantly improved math and coding performance at reduced pricing.
ELO: 1433 · Context: 200K · Price: $10.00 per 1M tokens
DeepSeek R1
DeepSeek
Open-source reasoning model that matches o1 performance, scoring 97.3% on MATH-500; its efficiency disrupted the AI industry.
ELO: 1418 · Context: 64K · Price: $0.55 per 1M tokens
Claude Opus 4
Anthropic
Anthropic's first Opus 4 generation model with extended thinking capabilities and strong agentic coding performance.
ELO: 1414 · Context: 200K · Price: $5.00 per 1M tokens
Mistral Large 3
Mistral AI
Mistral's latest flagship, competitive with frontier models, with strong multilingual support and function calling.
ELO: 1414 · Context: 128K · Price: $2.00 per 1M tokens
Grok-3
xAI
Trained on xAI's Colossus supercluster, this model delivers top-tier math reasoning with a 93.3% AIME 2025 score.
ELO: 1411 · Context: 131K · Price: $3.00 per 1M tokens
Gemini 2.5 Flash
Google
Fast reasoning model with excellent cost efficiency, balancing speed and intelligence for high-throughput applications.
ELO: 1410 · Context: 1M · Price: $0.30 per 1M tokens
Claude Haiku 4.5
Anthropic
Anthropic's fastest model in the 4.5 generation, offering near-Sonnet quality at Haiku-tier speed and pricing.
ELO: 1404 · Context: 200K · Price: $1.00 per 1M tokens
o1
OpenAI
OpenAI's first reasoning model that uses chain-of-thought to solve complex math, science, and coding problems.
ELO: 1402 · Context: 200K · Price: $15.00 per 1M tokens
GPT-4o Mini
OpenAI
Cost-efficient small model replacing GPT-3.5 Turbo, offering strong performance at a fraction of GPT-4o's cost.
ELO: — · Context: 128K · Price: $0.15 per 1M tokens
GPT-4 Turbo
OpenAI
Enhanced GPT-4 with vision support, JSON mode, and a 128K context window at reduced pricing versus original GPT-4.
ELO: — · Context: 128K · Price: $10.00 per 1M tokens
o3-mini
OpenAI
Cost-efficient reasoning model with adjustable effort levels (low/medium/high), matching o1 on STEM tasks at medium effort.
ELO: — · Context: 200K · Price: $1.10 per 1M tokens
Claude 3.5 Sonnet
Anthropic
Anthropic's breakout model that surpassed GPT-4o on most benchmarks at half the cost, excelling at coding and reasoning.
ELO: — · Context: 200K · Price: $3.00 per 1M tokens
Claude 3.5 Sonnet (Oct 2024)
Anthropic
Updated Sonnet with computer use capability and improved coding (93.7% HumanEval), the most popular coding model of late 2024.
ELO: — · Context: 200K · Price: $3.00 per 1M tokens
Claude 3 Opus
Anthropic
Anthropic's original flagship model, excelling at complex analysis and nuanced writing with strong safety alignment.
ELO: — · Context: 200K · Price: $15.00 per 1M tokens
Claude 3 Haiku
Anthropic
Anthropic's fastest and most affordable model, designed for near-instant responses on simple queries and classification.
ELO: — · Context: 200K · Price: $0.25 per 1M tokens
Claude Sonnet 4
Anthropic
Balanced mid-tier model in the Claude 4 generation, offering strong coding and reasoning at competitive pricing.
ELO: — · Context: 200K · Price: $3.00 per 1M tokens
Gemini 1.5 Pro
Google
Google's first million-token context model (up to 2M), excelling at long-document understanding and multimodal tasks.
ELO: — · Context: 2M · Price: $1.25 per 1M tokens
Gemini 2.0 Flash
Google
Ultra-fast and affordable multimodal model with native tool use, image/audio generation, and 1M token context.
ELO: — · Context: 1M · Price: $0.10 per 1M tokens
Llama 3.1 405B
Meta
Largest open-source model at release, competitive with GPT-4o and Claude 3.5 Sonnet across most benchmarks.
ELO: — · Context: 128K · Price: $0.80 per 1M tokens
Llama 3.1 70B
Meta
Strong mid-size open-source model offering excellent performance-to-cost ratio for self-hosted deployments.
ELO: — · Context: 128K · Price: $0.35 per 1M tokens
Llama 3.1 8B
Meta
Compact open-source model suitable for on-device and edge deployments with surprisingly strong capabilities for its size.
ELO: — · Context: 128K · Price: $0.05 per 1M tokens
Mistral Large 2
Mistral AI
Mistral's flagship with 123B parameters, support for dozens of natural languages and 80+ coding languages, and strong code generation (92% HumanEval).
ELO: — · Context: 128K · Price: $2.00 per 1M tokens
Mixtral 8x22B
Mistral AI
Sparse MoE model that activates 39B of its 141B parameters per token, Apache 2.0 licensed, with an excellent efficiency-to-performance ratio.
ELO: — · Context: 64K · Price: $2.00 per 1M tokens
DeepSeek V3
DeepSeek
Chinese open-source MoE model rivaling GPT-4o at a fraction of the cost; its reported training cost of under $6M shocked the industry.
ELO: — · Context: 128K · Price: $0.14 per 1M tokens
Grok-2
xAI
xAI's second-gen model with real-time X/Twitter data access, competitive with GPT-4o on standard benchmarks.
ELO: — · Context: 128K · Price: $2.00 per 1M tokens
Command R+
Cohere
Enterprise-focused RAG-optimized model excelling at multi-step tool use, long document analysis, and multilingual tasks.
ELO: — · Context: 128K · Price: $2.50 per 1M tokens
Qwen 2.5 72B Instruct
Alibaba
Alibaba's leading open-source model with strong multilingual and coding capabilities, competitive with Llama 3.1 70B.
ELO: — · Context: 128K · Price: $0.30 per 1M tokens
Claude Sonnet 4.6
Anthropic
Latest Sonnet with adaptive reasoning, 89.9% GPQA and 79.6% SWE-bench, excellent balance of speed and intelligence.
ELO: — · Context: 200K · Price: $3.00 per 1M tokens
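The per-1M-token prices listed in the profiles above can be turned into a rough per-request estimate. The sketch below is illustrative only: the helper name `input_cost_usd` and the 50,000-token request size are assumptions, and because the leaderboard lists input prices only, output tokens are not included.

```python
from decimal import Decimal


def input_cost_usd(price_per_million: str, input_tokens: int) -> Decimal:
    """Estimated input cost in USD for a single request.

    price_per_million is the listed input price per 1M tokens, passed as a
    string so Decimal avoids binary floating-point rounding.
    """
    return Decimal(price_per_million) * Decimal(input_tokens) / Decimal(1_000_000)


# Hypothetical request size; prices copied from the leaderboard.
tokens = 50_000
listed_prices = {
    "Claude Opus 4.6": "5.00",
    "GPT-5.2": "1.75",
    "DeepSeek V3": "0.14",
}

for model, price in listed_prices.items():
    print(f"{model}: ${input_cost_usd(price, tokens):.4f}")
```

For a 50,000-token prompt this gives $0.25 for Claude Opus 4.6 and $0.007 for DeepSeek V3, which makes the price spread in the table easier to compare against the ELO spread.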