AI Model Comparison

Compare model families by provider, type, context size, benchmark score, and reference pricing.

Reference data snapshot: 2026-05-29

These figures are reference estimates from the tool's data set, not live prices. Check the main Crazyrouter pricing page for current model pricing before making production decisions.


Compare model capability

Filter by provider and model type, then sort by benchmark or cost.

Model	Provider	Type	Context (tokens)	MMLU	HumanEval	Math	Input ($ / 1M tokens)	Output ($ / 1M tokens)
GPT-5.4	OpenAI	Flagship	270K	93.5	95.8	88.0	$2.5	$15
Gemini 3.1 Pro	Google	Flagship	2M	93.2	96.0	89.5	$1.25	$10
Claude Opus 4.6	Anthropic	Flagship	200K	93.0	96.5	89.0	$15	$75
Grok 4	xAI	Reasoning	256K	93.0	96.0	93.0	$5	$25
o3	OpenAI	Reasoning	200K	92.3	95.2	98.6	$10	$40
Claude Sonnet 4.6	Anthropic	Flagship	200K	91.5	95.5	86.5	$3	$15
DeepSeek-R1-0528	DeepSeek	Reasoning	128K	91.5	93.5	97.8	$0.55	$2.19
Qwen3-Max	Qwen	Flagship	128K	91.0	94.0	85.0	$0.4	$1.6
Gemini 2.5 Pro	Google	Reasoning	1M	90.8	94.0	86.5	$1.25	$10
GPT-4.1	OpenAI	Flagship	1M	90.2	93.5	80.0	$3	$12
Llama 4 Maverick	Meta	Open	256K	89.2	91.5	80.5	$0.5	$1.5
GPT-5 mini	OpenAI	Fast	270K	89.0	92.0	82.0	$0.25	$2
Gemini 3 Flash	Google	Fast	1M	89.0	92.5	83.0	$0.15	$0.6
GPT-4o	OpenAI	Flagship	128K	88.7	90.2	76.6	$2.5	$10
o4-mini	OpenAI	Reasoning	200K	88.5	93.4	98.2	$4	$16
DeepSeek-V3 0324	DeepSeek	Flagship	128K	88.5	91.0	78.5	$0.27	$1.1
GPT-4.1 mini	OpenAI	Fast	1M	87.5	91.0	76.0	$0.8	$3.2
Mistral Medium 3	Mistral	Flagship	128K	87.0	91.5	76.0	$2	$6
QwQ-32B	Qwen	Reasoning	128K	86.5	90.0	95.0	$0.15	$0.6
Claude Haiku 4.5	Anthropic	Fast	200K	86.0	90.0	75.0	$0.8	$4
GPT-4o mini	OpenAI	Fast	128K	82.0	87.0	70.2	$0.15	$0.6
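
The filter-then-sort workflow described above, plus a rough per-request cost estimate, can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the `Model` class and the `filter_models` and `estimate_cost` helpers are hypothetical names, the rows are a small subset copied from the table, and the prices are assumed to be US dollars per 1M tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    kind: str             # the "Type" column
    mmlu: float
    input_per_1m: float   # assumed: USD per 1M input tokens
    output_per_1m: float  # assumed: USD per 1M output tokens


# A few rows copied from the table (reference estimates, not live prices).
MODELS = [
    Model("GPT-5.4", "OpenAI", "Flagship", 93.5, 2.5, 15.0),
    Model("o3", "OpenAI", "Reasoning", 92.3, 10.0, 40.0),
    Model("GPT-5 mini", "OpenAI", "Fast", 89.0, 0.25, 2.0),
    Model("GPT-4o mini", "OpenAI", "Fast", 82.0, 0.15, 0.6),
    Model("Gemini 3 Flash", "Google", "Fast", 89.0, 0.15, 0.6),
    Model("Claude Haiku 4.5", "Anthropic", "Fast", 86.0, 0.8, 4.0),
]


def filter_models(models, provider=None, kind=None):
    """Keep models matching the given provider and/or type; None means 'any'."""
    return [
        m for m in models
        if (provider is None or m.provider == provider)
        and (kind is None or m.kind == kind)
    ]


def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request at the reference per-1M-token prices."""
    return (
        input_tokens * model.input_per_1m
        + output_tokens * model.output_per_1m
    ) / 1_000_000


# Filter to "Fast" models, then sort by benchmark (highest MMLU first)
# or by estimated cost for a 10K-input / 2K-output request (cheapest first).
fast = filter_models(MODELS, kind="Fast")
by_mmlu = sorted(fast, key=lambda m: m.mmlu, reverse=True)
by_cost = sorted(fast, key=lambda m: estimate_cost(m, 10_000, 2_000))

for m in by_cost:
    print(f"{m.name}: ${estimate_cost(m, 10_000, 2_000):.4f} per request")
```

Sorting by estimated request cost rather than by input price alone accounts for the output rate, which is several times the input rate for every model in the table.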