Leaderboard
AI Model Rankings 2026
The most comprehensive AI model comparison. ELO ratings from Chatbot Arena, benchmark scores, pricing, and what each model is best at.
53 models — ranked by ELO, newest first
Ranked list shows models with public ELO data. New models without enough Arena votes are listed separately below.
| # | Model | Developer | ELO | MMLU | Context | Input Price (per 1M tokens) | Model ID | Official | Type |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | Anthropic | 1512 | - | 1M | $5.00 | claude-opus-4.8 | Claude models docs -> | Closed |
| 2 | GPT-5.5 Pro | OpenAI | 1510 | - | 256K | $5.00 | gpt-5-5-pro | OpenAI models docs -> | Closed |
| 3 | GPT-5.5 | OpenAI | 1506 | - | 1M (API) / 400K (Codex) | $5.00 | gpt-5.5 | OpenAI models docs -> | Closed |
| 4 | Claude Opus 4.7 | Anthropic | 1505 | - | 1M | $5.00 | claude-opus-4.7 | Claude models docs -> | Closed |
| 5 | Gemini 3.1 Pro | Google | 1505 | - | 1M | $2.00 | gemini-3.1-pro | Gemini models docs -> | Closed |
| 6 | Grok 4.3 | xAI | 1498 | - | N/A | - | grok-4.3 | xAI API docs -> | Closed |
| 7 | Grok 4.20 | xAI | 1496 | - | 256K | $3.00 | grok-4-20 | xAI API docs -> | Closed |
| 8 | GPT-5.4 | OpenAI | 1495 | - | 1M | $2.50 | gpt-5.4 | OpenAI models docs -> | Closed |
| 9 | Claude Opus 4.6 | Anthropic | 1490 | 91.1 | 1M | $5.00 | claude-opus-4.6 | Claude models docs -> | Closed |
| 10 | Gemini 3 Pro | Google | 1486 | 91.8 | 1M | - | gemini-3-pro | Gemini models docs -> | Closed |
| 11 | Claude Opus 4.5 | Anthropic | 1467 | 90.8 | 200K | $5.00 | claude-opus-4.5 | Claude models docs -> | Closed |
| 12 | Claude Sonnet 4.6 | Anthropic | 1467 | 89.3 | 200K | $3.00 | claude-sonnet-4.6 | Claude models docs -> | Closed |
| 13 | DeepSeek V4 Pro | DeepSeek | 1467 | - | 128K | - | deepseek-v4-pro | DeepSeek API docs -> | Closed |
| 14 | GLM 5.1 | Zhipu AI | 1467 | - | 128K | $1.00 | glm-5-1 | Find official docs -> | Closed |
| 15 | Kimi K2.6 | Moonshot AI | 1466 | - | 256K | $1.50 | kimi-k2-6 | Moonshot API docs -> | Closed |
| 16 | DeepSeek V3.2 | DeepSeek | 1455 | - | 128K | $0.27 | deepseek-v3-2 | DeepSeek API docs -> | Open |
| 17 | Kimi K2.5 | Moonshot AI | 1452 | - | 262K | - | kimi-k2.5 | Moonshot API docs -> | Closed |
| 18 | Gemini 2.5 Pro | Google | 1450 | - | 1M | $1.25 | gemini-2.5-pro | Gemini models docs -> | Closed |
| 19 | GLM 5 | Zhipu AI | 1450 | - | 128K | $0.80 | glm-5 | Find official docs -> | Closed |
| 20 | GPT-4.5 Preview | OpenAI | 1444 | - | 128K | $75.00 | gpt-4.5-preview | OpenAI models docs -> | Closed |
| 21 | GPT-4o | OpenAI | 1442 | 88.7 | 128K | $2.50 | gpt-4o | OpenAI models docs -> | Closed |
| 22 | GPT-5.2 | OpenAI | 1437 | 89.6 | 400K | $1.75 | gpt-5.2 | OpenAI models docs -> | Closed |
| 23 | o3 | OpenAI | 1433 | - | 200K | $10.00 | o3 | OpenAI models docs -> | Closed |
| 24 | Gemini 3.1 Flash-Lite | Google | 1421 | - | 1M | $0.10 | gemini-3-1-flash-lite | Gemini models docs -> | Closed |
| 25 | DeepSeek R1 | DeepSeek | 1418 | 90.8 | 64K | $0.55 | deepseek-r1 | DeepSeek API docs -> | Open |
| 26 | Claude Opus 4 | Anthropic | 1414 | - | 200K | $5.00 | claude-opus-4 | Claude models docs -> | Closed |
| 27 | Mistral Large 3 | Mistral AI | 1414 | - | 128K | $2.00 | mistral-large-3 | Mistral models docs -> | Open |
| 28 | Grok-3 | xAI | 1411 | 92.7 | 131K | $3.00 | grok-3 | xAI API docs -> | Closed |
| 29 | Gemini 2.5 Flash | Google | 1410 | - | 1M | $0.30 | gemini-2.5-flash | Gemini models docs -> | Closed |
| 30 | DeepSeek V4 Flash | DeepSeek | 1410 | - | 128K | - | deepseek-v4-flash | DeepSeek API docs -> | Closed |
| 31 | Claude Haiku 4.5 | Anthropic | 1404 | - | 200K | $1.00 | claude-haiku-4.5 | Claude models docs -> | Closed |
| 32 | o1 | OpenAI | 1402 | 90.8 | 200K | $15.00 | o1 | OpenAI models docs -> | Closed |
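For a rough sense of what the ELO gaps in the table mean, the sketch below applies the standard Elo expected-score formula (logistic, base 10, scale 400) to three ratings taken from the table. That Arena scores follow exactly this scale is an assumption here, and `expected_win_rate` is an illustrative helper, not part of any leaderboard API.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head votes for A over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the ranked table.
ratings = {
    "Claude Opus 4.8": 1512,
    "GPT-5.5 Pro": 1510,
    "o1": 1402,
}

top_vs_second = expected_win_rate(ratings["Claude Opus 4.8"], ratings["GPT-5.5 Pro"])
top_vs_last = expected_win_rate(ratings["Claude Opus 4.8"], ratings["o1"])

print(f"#1 vs #2:  {top_vs_second:.3f}")
print(f"#1 vs #32: {top_vs_last:.3f}")
```

Under these assumptions, the 2-point gap between #1 and #2 is close to a coin flip (about 50.3%), while the 110-point gap between #1 and #32 corresponds to an expected preference rate of roughly 65%.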
New Models (Awaiting Public ELO)
Model Profiles
Claude Opus 4.8
Anthropic
Anthropic's latest flagship model (May 28, 2026). It is 4x less likely to overlook code flaws than Opus 4.7, is the first model to complete every case on the Super-Agent benchmark, and holds the highest score ever on the Legal Agent Benchmark. New dynamic workflows enable parallel subagents. Fast mode is available at $10/$50 per million tokens.
1512
ELO
1M
Context
$5.00
per 1M tokens
GPT-5.5 Pro
OpenAI
OpenAI's most capable model as of May 2026. Ranked #2 on the Arena leaderboard above, just behind Claude Opus 4.8. 52.5% fewer hallucinations than GPT-5.4. Enhanced personalization with conversation memory and Gmail integration.
1510
ELO
256K
Context
$5.00
per 1M tokens
GPT-5.5
OpenAI
OpenAI's latest flagship model for real-world coding and professional workflows, with stronger agentic performance and a 1M API context window.
1506
ELO
1M (API) / 400K (Codex)
Context
$5.00
per 1M tokens
Claude Opus 4.7
Anthropic
Anthropic's most capable generally available model at its April 2026 launch, with stronger long-horizon agentic performance and the same $5/$25 per MTok pricing as Opus 4.6.
1505
ELO
1M
Context
$5.00
per 1M tokens
Gemini 3.1 Pro
Google
Google's newer Pro generation surfaced at Cloud Next 2026 as its most capable model for complex workflows, with 1M context and Gemini 3-class multimodal support.
1505
ELO
1M
Context
$2.00
per 1M tokens
Grok 4.3
xAI
xAI's newer flagship family with top-ranked LMArena results for both thinking and non-thinking modes.
1498
ELO
N/A
Context
-
per 1M tokens
Grok 4.20
xAI
xAI's latest model with improved reasoning and coding capabilities.
1496
ELO
256K
Context
$3.00
per 1M tokens
GPT-5.4
OpenAI
OpenAI's March 2026 frontier model for professional reasoning and agentic coding, released across ChatGPT, API, and Codex.
1495
ELO
1M
Context
$2.50
per 1M tokens
Claude Opus 4.6
Anthropic
Formerly #1 on Chatbot Arena (then rated 1496 ELO), with 99.8% AIME 2025 and 80.8% SWE-bench, and a leader in coding and hard prompts.
1490
ELO
1M
Context
$5.00
per 1M tokens
Gemini 3 Pro
Google
Google's Gemini 3 flagship with 94.3% GPQA and 100% AIME 2025, formerly #2 on Chatbot Arena behind Claude Opus 4.6.
1486
ELO
1M
Context
-
per 1M tokens
Claude Opus 4.5
Anthropic
Major upgrade with 87% GPQA and 80.9% SWE-bench, the highest-rated Anthropic model before the 4.6 generation.
1467
ELO
200K
Context
$5.00
per 1M tokens
Claude Sonnet 4.6
Anthropic
Latest Sonnet with adaptive reasoning, 89.9% GPQA and 79.6% SWE-bench, excellent balance of speed and intelligence.
1467
ELO
200K
Context
$3.00
per 1M tokens
DeepSeek V4 Pro
DeepSeek
DeepSeek's higher-capability V4-generation API model introduced in April 2026 for deeper reasoning and agentic workloads.
1467
ELO
128K
Context
-
per 1M tokens
GLM 5.1
Zhipu AI
Chinese frontier model from Zhipu AI. Competitive with Claude Sonnet at a fraction of the cost.
1467
ELO
128K
Context
$1.00
per 1M tokens
Kimi K2.6
Moonshot AI
Moonshot AI's latest model with strong multilingual and long-context capabilities. Moonshot AI is valued at $20B.
1466
ELO
256K
Context
$1.50
per 1M tokens
DeepSeek V3.2
DeepSeek
Updated open-source model with near-frontier capability at 1/20th the cost of GPT-5.5.
1455
ELO
128K
Context
$0.27
per 1M tokens
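The "1/20th the cost" comparison can be checked against the input prices listed on this page. The sketch below is illustrative only: it uses input prices alone (output prices for these two models are not listed here), and the 50M-token workload is a hypothetical figure.

```python
# Input prices in USD per 1M tokens, as listed on this page.
input_price = {"DeepSeek V3.2": 0.27, "GPT-5.5": 5.00}


def input_cost(model: str, tokens: int) -> float:
    """Input-side cost in USD for the given number of tokens."""
    return input_price[model] * tokens / 1_000_000


# How many times more expensive GPT-5.5 input is than DeepSeek V3.2 input.
ratio = input_price["GPT-5.5"] / input_price["DeepSeek V3.2"]

workload = 50_000_000  # hypothetical monthly input volume
for model in input_price:
    print(f"{model}: ${input_cost(model, workload):,.2f}")
print(f"GPT-5.5 input costs about {ratio:.1f}x as much")
```

On listed input prices the ratio comes out near 18.5x, so "1/20th" is a fair rounding for input tokens; the overall ratio for a real workload also depends on output pricing.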
Kimi K2.5
Moonshot AI
Chinese model with the highest HumanEval score ever recorded (99.0%), excelling at code generation and reasoning.
1452
ELO
262K
Context
-
per 1M tokens
Gemini 2.5 Pro
Google
Google's hybrid thinking model combining fast responses with deep reasoning, top performer on coding and math benchmarks.
1450
ELO
1M
Context
$1.25
per 1M tokens
GLM 5
Zhipu AI
Previous generation Zhipu model, still competitive with Western mid-tier models.
1450
ELO
128K
Context
$0.80
per 1M tokens
GPT-4.5 Preview
OpenAI
OpenAI's largest and most knowledgeable non-reasoning model with broad world knowledge and reduced hallucinations.
1444
ELO
128K
Context
$75.00
per 1M tokens
GPT-4o
OpenAI
OpenAI's flagship multimodal model with native text, vision, and audio capabilities, offering strong all-around performance.
1442
ELO
128K
Context
$2.50
per 1M tokens
GPT-5.2
OpenAI
OpenAI's current-gen flagship with 400K context, 92.4% GPQA and 100% AIME 2025, strong reasoning at reduced cost.
1437
ELO
400K
Context
$1.75
per 1M tokens
o3
OpenAI
Advanced reasoning model succeeding o1, with significantly improved math and coding performance at reduced pricing.
1433
ELO
200K
Context
$10.00
per 1M tokens
Gemini 3.1 Flash-Lite
Google
Ultra-low latency model designed for sub-second responses at scale. Google's cheapest frontier-adjacent model.
1421
ELO
1M
Context
$0.10
per 1M tokens
DeepSeek R1
DeepSeek
Open-source reasoning model that matches o1 performance, with 97.3% on MATH-500. Its efficiency disrupted the AI industry.
1418
ELO
64K
Context
$0.55
per 1M tokens
Claude Opus 4
Anthropic
Anthropic's first Opus 4 generation model with extended thinking capabilities and strong agentic coding performance.
1414
ELO
200K
Context
$5.00
per 1M tokens
Mistral Large 3
Mistral AI
Open-weight Apache 2.0 MoE flagship from Mistral 3 generation with strong multilingual and multimodal performance.
1414
ELO
128K
Context
$2.00
per 1M tokens
Grok-3
xAI
Trained on xAI's Colossus supercluster, top-tier math reasoning with 93.3% AIME 2025 score.
1411
ELO
131K
Context
$3.00
per 1M tokens
Gemini 2.5 Flash
Google
Fast reasoning model with excellent cost efficiency, balancing speed and intelligence for high-throughput applications.
1410
ELO
1M
Context
$0.30
per 1M tokens
DeepSeek V4 Flash
DeepSeek
DeepSeek's April 2026 V4-generation API model focused on speed and lower-cost production inference.
1410
ELO
128K
Context
-
per 1M tokens
Claude Haiku 4.5
Anthropic
Anthropic's fastest model in the 4.5 generation, offering near-Sonnet quality at Haiku-tier speed and pricing.
1404
ELO
200K
Context
$1.00
per 1M tokens
o1
OpenAI
OpenAI's first reasoning model that uses chain-of-thought to solve complex math, science, and coding problems.
1402
ELO
200K
Context
$15.00
per 1M tokens
Claude Fable 5
Anthropic
Anthropic's first publicly available Mythos-class model (June 9, 2026) and its most powerful general-release model ever. Leads agentic coding at 80.3% on SWE-bench Pro (vs 69.2% for Opus 4.8), with a 1M-token context window and 128K max output. Built for long-horizon autonomous work — Stripe used it to compress a months-long codebase migration into a single day. High-risk queries (e.g. cybersecurity, biology) are routed to Opus 4.8; safeguards trigger in under 5% of sessions. Priced at $10/$50 per MTok, with up to 90% input savings via prompt caching.
-
ELO
1M
Context
$10.00
per 1M tokens
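To see how the $10/$50 per MTok pricing and the "up to 90% input savings" interact, here is a minimal cost sketch. It assumes cached input tokens are billed at 10% of the normal input rate and ignores any cache-write surcharge; the `Pricing` class, `request_cost` function, and the token counts are hypothetical and not taken from any vendor SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float         # USD per 1M input tokens
    output_per_mtok: float        # USD per 1M output tokens
    cached_input_discount: float  # fraction saved on cached input (assumed)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Estimated USD cost of one request, with part of the input served from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens are billed at the discounted rate, fresh tokens at full rate.
    billable_input = fresh + cached * (1.0 - p.cached_input_discount)
    input_cost = billable_input * p.input_per_mtok / 1_000_000
    output_cost = output_tokens * p.output_per_mtok / 1_000_000
    return input_cost + output_cost


fable = Pricing(input_per_mtok=10.0, output_per_mtok=50.0, cached_input_discount=0.90)

uncached = request_cost(fable, input_tokens=200_000, output_tokens=4_000)
mostly_cached = request_cost(fable, input_tokens=200_000, output_tokens=4_000,
                             cached_fraction=0.8)

print(f"no caching:  ${uncached:.2f}")
print(f"80% cached:  ${mostly_cached:.2f}")
```

With these illustrative numbers, a 200K-token prompt with 4K tokens of output costs about $2.20 uncached and about $0.76 when 80% of the prompt is read from cache. Output tokens are unaffected by caching, so the overall saving is smaller than the 90% headline figure.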
GPT-5.5 Instant
OpenAI
Fast default model powering ChatGPT for all users. Optimized for speed while maintaining GPT-5.5 quality.
-
ELO
128K
Context
$2.50
per 1M tokens
GPT-5.4 Mini
OpenAI
Faster and lower-cost GPT-5.4 variant for high-volume coding, subagents, and computer-use workloads.
-
ELO
400K
Context
$0.75
per 1M tokens
GPT-5.4 Nano
OpenAI
OpenAI's smallest GPT-5.4-class model, optimized for ultra-cheap classification, extraction, and lightweight agent sub-tasks.
-
ELO
400K
Context
$0.20
per 1M tokens
Claude Sonnet 4
Anthropic
Balanced mid-tier model in the Claude 4 generation, offering strong coding and reasoning at competitive pricing.
-
ELO
200K
Context
$3.00
per 1M tokens
Llama 4 Scout
Meta
Open-weight, natively multimodal Llama 4 model designed for efficient deployment and long-context workloads.
-
ELO
N/A
Context
-
per 1M tokens
Llama 4 Maverick
Meta
Higher-capability open-weight Llama 4 model in Meta's multimodal MoE generation, available for download via llama.com.
-
ELO
N/A
Context
-
per 1M tokens
Gemini 2.0 Flash
Google
Ultra-fast and affordable multimodal model with native tool use, image/audio generation, and 1M token context.
-
ELO
1M
Context
$0.10
per 1M tokens
o3-mini
OpenAI
Cost-efficient reasoning model with adjustable effort levels (low/medium/high), matching o1 at medium on STEM tasks.
-
ELO
200K
Context
$1.10
per 1M tokens
DeepSeek V3
DeepSeek
Chinese open-source MoE model rivaling GPT-4o at a fraction of the cost. It was trained for under $6M, which shocked the industry.
-
ELO
128K
Context
$0.14
per 1M tokens
Claude 3.5 Sonnet (Oct 2024)
Anthropic
Updated Sonnet with computer use capability and improved coding (93.7% HumanEval), the most popular coding model of late 2024.
-
ELO
200K
Context
$3.00
per 1M tokens
Qwen 2.5 72B Instruct
Alibaba
Alibaba's leading open-source model with strong multilingual and coding capabilities, competitive with Llama 3.1 70B.
-
ELO
128K
Context
$0.30
per 1M tokens
Grok-2
xAI
xAI's second-gen model with real-time X/Twitter data access, competitive with GPT-4o on standard benchmarks.
-
ELO
128K
Context
$2.00
per 1M tokens
GPT-4o Mini
OpenAI
Cost-efficient small model replacing GPT-3.5 Turbo, offering strong performance at a fraction of GPT-4o's cost.
-
ELO
128K
Context
$0.15
per 1M tokens
Llama 3.1 405B
Meta
Largest open-source model at release, competitive with GPT-4o and Claude 3.5 Sonnet across most benchmarks.
-
ELO
128K
Context
$0.80
per 1M tokens
Llama 3.1 70B
Meta
Strong mid-size open-source model offering excellent performance-to-cost ratio for self-hosted deployments.
-
ELO
128K
Context
$0.35
per 1M tokens
Llama 3.1 8B
Meta
Compact open-source model suitable for on-device and edge deployments with surprisingly strong capabilities for its size.
-
ELO
128K
Context
$0.05
per 1M tokens
Mistral Large 2
Mistral AI
Mistral's flagship with 123B parameters, multilingual in 80+ languages, and strong code generation (92% HumanEval).
-
ELO
128K
Context
$2.00
per 1M tokens
Gemini 1.5 Pro
Google
Google's first million-token context model (up to 2M), excelling at long-document understanding and multimodal tasks.
-
ELO
2M
Context
$1.25
per 1M tokens
Claude 3 Opus
Anthropic
Anthropic's original flagship model, excelling at complex analysis and nuanced writing with strong safety alignment.
-
ELO
200K
Context
$15.00
per 1M tokens
Claude 3 Haiku
Anthropic
Anthropic's fastest and most affordable model, designed for near-instant responses on simple queries and classification.
-
ELO
200K
Context
$0.25
per 1M tokens