AI Model Comparison
Compare model families by provider, type, context size, benchmark score, and reference pricing.
Reference data snapshot: 2026-05-29
These figures are reference estimates from the tool's data set. For live model pricing, check the main Crazyrouter pricing page before making production decisions.
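The price columns in the table are quoted in dollars per 1M tokens, with input and output priced separately, so the cost of a request scales linearly with each token count. The sketch below is a minimal illustration that assumes a request is billed as (input tokens / 1M × input price) + (output tokens / 1M × output price), with no caching discounts, minimum charges, or router fees. `Pricing` and `request_cost` are hypothetical names, not a Crazyrouter API, and the prices are copied from this reference snapshot rather than live pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Reference prices in dollars per 1M tokens (hypothetical helper type)."""
    input_per_1m: float
    output_per_1m: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request, assuming simple linear per-token billing."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_1m
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_1m
    return input_cost + output_cost


# Reference prices taken from the snapshot table (not live pricing).
GPT_5_4 = Pricing(input_per_1m=2.5, output_per_1m=15.0)
GPT_4O_MINI = Pricing(input_per_1m=0.15, output_per_1m=0.6)

if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for name, prices in [("GPT-5.4", GPT_5_4), ("GPT-4o mini", GPT_4O_MINI)]:
        print(f"{name}: ${request_cost(prices, 10_000, 2_000):.4f}")
```

For this workload the sketch prints `$0.0550` for GPT-5.4 and `$0.0027` for GPT-4o mini. Output tokens usually dominate the bill for chat-style workloads because output prices are several times higher than input prices for every model listed.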
Source: data migrated from local records in D:\crazyrouter-tools and from image-nextjs planning records.
Compare model capability
In the interactive tool, you can filter by provider and model type and sort by benchmark score or cost. The snapshot below is sorted by MMLU score, highest first.
| Model | Provider | Type | Context (tokens) | MMLU | HumanEval | Math | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | Flagship | 270K | 93.5 | 95.8 | 88.0 | $2.50 | $15.00 |
| Gemini 3.1 Pro | Google | Flagship | 2M | 93.2 | 96.0 | 89.5 | $1.25 | $10.00 |
| Claude Opus 4.6 | Anthropic | Flagship | 200K | 93.0 | 96.5 | 89.0 | $15.00 | $75.00 |
| Grok 4 | xAI | Reasoning | 256K | 93.0 | 96.0 | 93.0 | $5.00 | $25.00 |
| o3 | OpenAI | Reasoning | 200K | 92.3 | 95.2 | 98.6 | $10.00 | $40.00 |
| Claude Sonnet 4.6 | Anthropic | Flagship | 200K | 91.5 | 95.5 | 86.5 | $3.00 | $15.00 |
| DeepSeek-R1-0528 | DeepSeek | Reasoning | 128K | 91.5 | 93.5 | 97.8 | $0.55 | $2.19 |
| Qwen3-Max | Qwen | Flagship | 128K | 91.0 | 94.0 | 85.0 | $0.40 | $1.60 |
| Gemini 2.5 Pro | Google | Reasoning | 1M | 90.8 | 94.0 | 86.5 | $1.25 | $10.00 |
| GPT-4.1 | OpenAI | Flagship | 1M | 90.2 | 93.5 | 80.0 | $3.00 | $12.00 |
| Llama 4 Maverick | Meta | Open | 256K | 89.2 | 91.5 | 80.5 | $0.50 | $1.50 |
| GPT-5 mini | OpenAI | Fast | 270K | 89.0 | 92.0 | 82.0 | $0.25 | $2.00 |
| Gemini 3 Flash | Google | Fast | 1M | 89.0 | 92.5 | 83.0 | $0.15 | $0.60 |
| GPT-4o | OpenAI | Flagship | 128K | 88.7 | 90.2 | 76.6 | $2.50 | $10.00 |
| o4-mini | OpenAI | Reasoning | 200K | 88.5 | 93.4 | 98.2 | $4.00 | $16.00 |
| DeepSeek-V3 0324 | DeepSeek | Flagship | 128K | 88.5 | 91.0 | 78.5 | $0.27 | $1.10 |
| GPT-4.1 mini | OpenAI | Fast | 1M | 87.5 | 91.0 | 76.0 | $0.80 | $3.20 |
| Mistral Medium 3 | Mistral | Flagship | 128K | 87.0 | 91.5 | 76.0 | $2.00 | $6.00 |
| QwQ-32B | Qwen | Reasoning | 128K | 86.5 | 90.0 | 95.0 | $0.15 | $0.60 |
| Claude Haiku 4.5 | Anthropic | Fast | 200K | 86.0 | 90.0 | 75.0 | $0.80 | $4.00 |
| GPT-4o mini | OpenAI | Fast | 128K | 82.0 | 87.0 | 70.2 | $0.15 | $0.60 |
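The filter-then-sort workflow of the interactive tool can be reproduced offline on a copy of the snapshot. The sketch below is minimal and rests on several assumptions: a subset of rows has been copied into Python by hand, only the MMLU and price columns are kept, and "cost" means the input price per 1M tokens. `ModelRow`, `ROWS`, and `select` are hypothetical names, not part of any Crazyrouter API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelRow:
    name: str
    provider: str
    model_type: str        # "Flagship", "Reasoning", "Fast", or "Open"
    mmlu: float
    input_per_1m: float    # dollars per 1M input tokens (reference snapshot)
    output_per_1m: float   # dollars per 1M output tokens (reference snapshot)


# A subset of rows copied from the snapshot table (reference data, not live pricing).
ROWS = [
    ModelRow("GPT-5.4", "OpenAI", "Flagship", 93.5, 2.5, 15.0),
    ModelRow("o3", "OpenAI", "Reasoning", 92.3, 10.0, 40.0),
    ModelRow("DeepSeek-R1-0528", "DeepSeek", "Reasoning", 91.5, 0.55, 2.19),
    ModelRow("GPT-5 mini", "OpenAI", "Fast", 89.0, 0.25, 2.0),
    ModelRow("o4-mini", "OpenAI", "Reasoning", 88.5, 4.0, 16.0),
    ModelRow("Claude Haiku 4.5", "Anthropic", "Fast", 86.0, 0.8, 4.0),
    ModelRow("GPT-4o mini", "OpenAI", "Fast", 82.0, 0.15, 0.6),
]


def select(rows: List[ModelRow], provider: Optional[str] = None,
           model_type: Optional[str] = None, sort_by: str = "mmlu") -> List[ModelRow]:
    """Filter by provider and/or type, then sort (benchmark descending, cost ascending)."""
    picked = [
        r for r in rows
        if (provider is None or r.provider == provider)
        and (model_type is None or r.model_type == model_type)
    ]
    if sort_by == "mmlu":
        return sorted(picked, key=lambda r: r.mmlu, reverse=True)
    if sort_by == "input_cost":
        return sorted(picked, key=lambda r: r.input_per_1m)
    raise ValueError(f"unsupported sort key: {sort_by!r}")
```

For example, `select(ROWS, model_type="Fast", sort_by="input_cost")` returns GPT-4o mini, GPT-5 mini, and Claude Haiku 4.5, cheapest first. Benchmarks sort in descending order because higher scores are better, while prices sort in ascending order because lower is better.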