From votes to a leaderboard

Thank you for your contributions!
The arena ranking is based on all votes and reactions from the blind comparison of the models, collected since the service opened to the public in October 2024.
Developed in partnership with the Digital Regulation Expertise Center (PEReN), the model ranking is based on a satisfaction score calculated using the Bradley-Terry statistical model, a widely used method for converting binary votes into a probabilistic ranking.
The compar:IA ranking is not intended to be an official recommendation or to evaluate the technical performance of the models. It reflects the subjective preferences of the platform's users and not the factual accuracy or veracity of the responses.

Rank	Model	BT score of satisfaction	Confidence (±)	Total votes	Average energy (per 1000 tokens)	Size (parameters)	Architecture	Release date	Organization	Licence
1	gemini-3-flash-preview	1153	-7/+1	3320	N / A	XL - (estimate)	Proprietary	12/25	Google	Proprietary
2	mistral-medium-2508	1146	-7/+2	5604	N / A	L - (estimate)	Proprietary	8/25	Mistral AI	Proprietary
3	mistral-large-2512	1137	-11/+3	4368	4134 mWh	XL - 675 billion	MoE	12/25	Mistral AI	Open-weight
4	gemini-2.5-flash	1134	-12/+4	3081	N / A	XL - (estimate)	Proprietary	6/25	Google	Proprietary
5	gemini-3.1-flash-lite-preview	1131	-18/+5	2221	N / A	L - (estimate)	Proprietary	3/26	Google	Proprietary
6	claude-4-6-sonnet	1130	-17/+6	4267	N / A	XL - (estimate)	Proprietary	2/26	Anthropic	Proprietary
7	qwen3-max-2025-09-23	1119	-19/+5	3039	N / A	XL - (estimate)	Proprietary	9/25	Alibaba	Proprietary
8	glm-4.5	1119	-20/+8	588	1892 mWh	L - 355 billion	MoE	7/25	Zhipu	Open-weight
9	gemini-2.0-flash	1119	-17/+7	6362	N / A	XL - (estimate)	Proprietary	12/24	Google	Proprietary
10	gemini-3-pro-preview	1114	-18/+10	868	N / A	XL - (estimate)	Proprietary	11/25	Google	Proprietary
11	gemma-4-26b-a4b-it	1114	-17/+9	1532	84 mWh	S - 26 billion	MoE	4/26	Google	Open-weight
12	trinity-large-thinking	1109	-26/+11	499	1524 mWh	L - 398 billion	MoE	4/26	Arcee	Open-weight
13	gemma-4-31b-it	1107	-15/+10	1530	117 mWh	S - 31 billion	Dense	4/26	Google	Open-weight
14	qwen3-coder-next	1106	-15/+12	1252	332 mWh	M - 80 billion	MoE	2/26	Alibaba	Open-weight
15	gemini-3.1-pro-preview	1105	-13/+11	2799	N / A	XL - (estimate)	Proprietary	2/26	Google	Proprietary
16	deepseek-v3-0324	1105	-12/+12	3622	3979 mWh	XL - 685 billion	MoE	3/25	DeepSeek	Open-weight
17	mistral-small-2603	1104	-11/+13	2004	347 mWh	L - 119 billion	MoE	3/26	Mistral AI	Open-weight
18	gpt-5.4	1103	-10/+14	2551	N / A	L - (estimate)	Proprietary	3/26	OpenAI	Proprietary
19	gpt-5.1	1103	-17/+17	766	N / A	L - (estimate)	Proprietary	11/25	OpenAI	Proprietary
20	magistral-medium	1103	-16/+17	1195	N / A	L - (estimate)	Proprietary	6/25	Mistral AI	Proprietary
21	gpt-5.2	1099	-15/+17	1120	N / A	L - (estimate)	Proprietary	12/25	OpenAI	Proprietary
22	gemini-3.5-flash	1095	-19/+18	839	N / A	XL - (estimate)	Proprietary	5/26	Google	Proprietary
23	deepseek-v3-chat	1094	-13/+17	3783	3979 mWh	XL - 671 billion	MoE	12/24	DeepSeek	Open-weight
24	gpt-5.5	1094	-14/+18	1424	N / A	L - (estimate)	Proprietary	4/26	OpenAI	Proprietary
25	deepseek-chat-v3.1	1093	-13/+19	1163	3979 mWh	XL - 685 billion	MoE	8/25	DeepSeek	Open-weight
26	gemma-3-27b	1091	-9/+18	6466	112 mWh	S - 27 billion	Dense	3/25	Google	Open-weight
27	glm-4.6	1090	-20/+23	487	1892 mWh	L - 357 billion	MoE	9/25	Zhipu	Open-weight
28	kimi-k2.6	1078	-19/+20	952	3785 mWh	XL - 1000 billion	MoE	4/26	Moonshot AI	Open-weight
29	claude-4-5-sonnet	1076	-14/+13	4677	N / A	XL - (estimate)	Proprietary	9/25	Anthropic	Proprietary
30	gpt-5.4-mini	1069	-19/+13	2142	N / A	S - (estimate)	Proprietary	3/26	OpenAI	Proprietary
31	deepseek-r1-0528	1068	-18/+14	1767	3979 mWh	XL - 685 billion	MoE	5/25	DeepSeek	Open-weight
32	gpt-5.4-nano	1066	-18/+14	1993	N / A	XS - (estimate)	Proprietary	3/26	OpenAI	Proprietary
33	kimi-k2.5	1065	-17/+16	2002	3785 mWh	XL - 1000 billion	MoE	1/26	Moonshot AI	Open-weight
34	gemma-3-12b	1063	-15/+8	6318	94 mWh	XS - 12 billion	Dense	3/25	Google	Open-weight
35	qwen3.6-plus	1061	-18/+18	1128	N / A	XL - (estimate)	Proprietary	4/26	Alibaba	Proprietary
36	kimi-k2-thinking	1059	-19/+19	969	3785 mWh	XL - 1000 billion	MoE	11/25	Moonshot AI	Open-weight
37	mistral-small-2506	1059	-14/+11	4194	109 mWh	S - 24 billion	Dense	6/25	Mistral AI	Open-weight
38	deepseek-v4-pro	1058	-16/+15	1160	8890 mWh	XL - 1600 billion	MoE	4/26	DeepSeek	Open-weight
39	deepseek-v4-flash	1058	-15/+16	1150	1524 mWh	L - 284 billion	MoE	4/26	DeepSeek	Open-weight
40	DeepSeek-V3.2	1058	-11/+14	2472	3979 mWh	XL - 685 billion	MoE	12/25	DeepSeek	Open-weight
41	Qwen3-Coder-480B-A35B-Instruct	1055	-21/+24	633	1951 mWh	XL - 480 billion	MoE	7/25	Alibaba	Open-weight
42	claude-4-sonnet	1054	-11/+15	2238	N / A	XL - (estimate)	Proprietary	5/25	Anthropic	Proprietary
43	command-a	1053	-9/+15	4824	857 mWh	L - 111 billion	Dense	3/25	Cohere	Open-weight
44	kimi-k2	1052	-13/+17	1782	3785 mWh	XL - 1000 billion	MoE	9/25	Moonshot AI	Open-weight
45	gpt-5.3	1046	-17/+17	1454	N / A	L - (estimate)	Proprietary	3/26	OpenAI	Proprietary
46	claude-3-7-sonnet	1045	-13/+18	2736	N / A	XL - (estimate)	Proprietary	2/25	Anthropic	Proprietary
47	magistral-small-2506	1040	-15/+17	2319	109 mWh	S - 24 billion	Dense	6/25	Mistral AI	Open-weight
48	llama-3.1-nemotron-70b-instruct	1036	-14/+13	4920	658 mWh	M - 70 billion	Dense	10/24	Nvidia	Open-weight
49	qwen3.5-397b-a17b	1035	-16/+16	1988	1601 mWh	L - 397 billion	MoE	2/26	Alibaba	Open-weight
50	qwen3-32b	1030	-27/+22	456	118 mWh	S - 32 billion	Dense	4/25	Alibaba	Open-weight
51	gemma-3-4b	1030	-14/+11	7137	84 mWh	XS - 4 billion	Dense	3/25	Google	Open-weight
52	deepseek-r1	1023	-20/+10	2315	3979 mWh	XL - 671 billion	MoE	1/25	DeepSeek	Open-weight
53	glm-5.1	1022	-23/+16	987	4095 mWh	XL - 744 billion	MoE	4/26	Zhipu	Open-weight
54	gpt-4.1-mini	1022	-17/+11	4968	N / A	M - (estimate)	Proprietary	4/25	OpenAI	Proprietary
55	gemma-3n-e4b-it	1020	-17/+10	4314	84 mWh	XS - 8 billion	Matformer	5/25	Google	Open-weight
56	qwen3.7-max	1020	-26/+26	373	N / A	XL - (estimate)	Proprietary	5/26	Alibaba	Proprietary
57	gpt-oss-120b	1019	-16/+12	3148	342 mWh	L - 117 billion	MoE	8/25	OpenAI	Open-weight
58	minimax-m2	1016	-21/+20	923	733 mWh	L - 230 billion	MoE	10/25	MiniMax	Open-weight
59	glm-5	1016	-17/+16	1512	4095 mWh	XL - 744 billion	MoE	2/26	Zhipu	Open-weight
60	minimax-m2.5	1013	-17/+16	1600	733 mWh	L - 229 billion	MoE	2/26	MiniMax	Open-weight
61	gpt-5-mini	1013	-16/+16	1837	N / A	S - (estimate)	Proprietary	8/25	OpenAI	Proprietary
62	lfm2-24b-a2b	1013	-16/+18	1412	82 mWh	S - 24 billion	MoE	2/26	Liquid	Open-weight
63	llama-maverick	1008	-14/+12	4204	1601 mWh	XL - 400 billion	MoE	4/25	Meta	Open-weight
64	gemini-1.5-pro	1007	-13/+13	5761	N / A	XL - (estimate)	Proprietary	9/24	Google	Proprietary
65	glm-4.7	1005	-18/+16	1240	1892 mWh	L - 357 billion	MoE	12/25	Zhipu	Open-weight
66	nemotron-3-super-120b-a12b	1004	-17/+17	1428	376 mWh	L - 120 billion	MoE	3/26	Nvidia	Open-weight
67	qwen3.5-35b-a3b	1002	-16/+16	2160	166 mWh	S - 35 billion	MoE	2/26	Alibaba	Open-weight
68	minimax-m2.7	1001	-15/+20	1069	733 mWh	L - 230 billion	MoE	3/26	MiniMax	Open-weight
69	EuroLLM-22B-Instruct-2512	994	-14/+18	1873	106 mWh	S - 22 billion	Dense	12/25	EuroLLM	Open-weight
70	mistral-saba	993	-12/+17	3257	N / A	S - (estimate)	Proprietary	2/25	Mistral AI	Proprietary
71	llama-4-scout	992	-11/+17	6454	400 mWh	L - 109 billion	MoE	4/25	Meta	Open-weight
72	gpt-5	992	-12/+22	1358	N / A	L - (estimate)	Proprietary	8/25	OpenAI	Proprietary
73	mistral-small-3.1-24b	983	-14/+12	3423	109 mWh	S - 24 billion	Dense	3/25	Mistral AI	Open-weight
74	granite-4.1-8b	980	-20/+24	513	89 mWh	XS - 8 billion	Dense	4/26	IBM	Open-weight
75	o4-mini	980	-15/+21	1542	N / A	S - (estimate)	Proprietary	4/25	OpenAI	Proprietary
76	lfm2-8b-a1b	976	-14/+14	1628	81 mWh	XS - 8 billion	MoE	10/25	Liquid	Open-weight
77	trinity-large-preview	970	-16/+13	1488	1524 mWh	L - 398 billion	MoE	1/26	Arcee	Open-weight
78	gpt-oss-20b	967	-13/+8	3899	83 mWh	S - 21 billion	MoE	8/25	OpenAI	Open-weight
79	qwen3-30b-a3b	967	-16/+15	1058	83 mWh	S - 30 billion	MoE	5/25	Alibaba	Open-weight
80	Apertus-70B-Instruct-2509	962	-13/+8	3245	658 mWh	M - 70 billion	Dense	9/25	Swiss AI	Open source
81	aya-expanse-32b	960	-13/+8	3479	118 mWh	S - 32 billion	Dense	12/24	Cohere	Open-weight
82	llama-3.3-70b	959	-11/+8	7402	658 mWh	M - 70 billion	Dense	12/24	Meta	Open-weight
83	gemma-2-27b-it-q8	957	-13/+18	679	112 mWh	S - 27 billion	Dense	6/24	Google	Open-weight
84	mistral-small-24b-instruct-2501	955	-11/+11	2272	109 mWh	S - 24 billion	Dense	1/25	Mistral AI	Open-weight
85	o3-mini	952	-10/+13	1142	N / A	S - (estimate)	Proprietary	11/24	OpenAI	Proprietary
86	gpt-4o-mini-2024-07-18	950	-8/+13	4940	N / A	S - (estimate)	Proprietary	7/24	OpenAI	Proprietary
87	gpt-4.1-nano	941	-8/+11	3959	N / A	S - (estimate)	Proprietary	4/25	OpenAI	Proprietary
88	gpt-5-nano	937	-13/+13	1347	N / A	XS - (estimate)	Proprietary	4/25	OpenAI	Proprietary
89	claude-3-5-sonnet-v2	935	-10/+10	4209	N / A	XL - (estimate)	Proprietary	10/24	Anthropic	Proprietary
90	llama-3.1-70b	929	-10/+10	4407	658 mWh	M - 70 billion	Dense	7/24	Meta	Open-weight
91	aya-expanse-8b	929	-10/+16	905	89 mWh	XS - 8 billion	Dense	10/24	Cohere	Open-weight
92	phi-4	921	-9/+6	6664	96 mWh	XS - 14 billion	Dense	12/24	Microsoft	Open-weight
93	gpt-4o-2024-08-06	921	-8/+10	3958	N / A	XL - (estimate)	Proprietary	8/24	OpenAI	Proprietary
94	llama-3.1-405b	915	-6/+9	7937	9134 mWh	XL - 405 billion	Dense	7/24	Meta	Open-weight
95	gemma-2-9b-it	905	-7/+9	3994	90 mWh	XS - 9 billion	Dense	6/24	Google	Open-weight
96	qwq-32b	905	-8/+11	985	118 mWh	S - 32 billion	Dense	4/25	Alibaba	Open-weight
97	qwen2.5-32b-instruct	904	-9/+26	142	118 mWh	S - 32 billion	Dense	9/24	Alibaba	Open-weight
98	hermes-4-70b	899	-5/+10	2816	658 mWh	M - 70 billion	Dense	8/25	Nous	Open-weight
99	deepseek-r1-distill-llama-70b	898	-6/+13	1693	658 mWh	M - 70 billion	Dense	1/25	DeepSeek	Open-weight
100	hermes-3-llama-3.1-405b	891	-5/+6	4818	9134 mWh	XL - 405 billion	Dense	7/24	Nous	Open-weight
101	qwen2.5-7b-instruct	873	-4/+8	1315	88 mWh	XS - 7 billion	Dense	9/24	Alibaba	Open-weight
102	llama-3.1-8b	872	-3/+6	7502	89 mWh	XS - 8 billion	Dense	7/24	Meta	Open-weight
103	olmo-3-32b-think	862	-1/+11	752	118 mWh	S - 32 billion	Dense	11/25	Ai2	Open source
104	mixtral-8x7b-instruct-v0.1	824	-2/+2	2297	193 mWh	S - 56 billion	MoE	12/23	Mistral AI	Open-weight
105	lfm-40b	807	-2/+2	2789	N / A	S - (estimate)	Proprietary	9/24	Liquid	Proprietary
106	phi-3.5-mini-instruct	800	-2/+3	2383	83 mWh	XS - 3.8 billion	Dense	8/24	Microsoft	Open-weight
107	mistral-nemo-2407	773	-2/+2	5046	94 mWh	XS - 12 billion	Dense	7/24	Mistral AI	Open-weight
108	mixtral-8x22b-instruct-v0.1	758	-1/+2	4355	1063 mWh	L - 176 billion	MoE	4/24	Mistral AI	Open-weight
109	chocolatine-2-14b-instruct-v2.0.3-q8	717	-3/+0	1386	88 mWh	XS - 14 billion	Dense	2/25	jpacifico	Open-weight
110	Yi-1.5-9B-Chat	709	-2/+6	65	90 mWh	XS - 9 billion	Dense	5/24	01-ai	Open-weight
111	chocolatine-14b-instruct-dpo-v1.2-q4	695	-1/+2	308	96 mWh	XS - 14 billion	Dense	9/24	jpacifico	Open-weight
112	qwen2-7b-instruct	673	-0/+3	80	88 mWh	XS - 7 billion	Dense	7/24	Alibaba	Open-weight

Are the most popular models energy efficient?

This graph shows the satisfaction score (Bradley-Terry score) of each model as a function of its estimated average energy consumption per 1000 tokens. Energy consumption is estimated using the Ecologits methodology and is based on two parameters: the size of the models (number of parameters) and their architecture. For proprietary models, this information is either not provided or only partially available, so they are excluded from the graph below.

How to find the right balance between perceived performance and energy efficiency?

Examples of how to read the graph

The higher a model is located on the graph, the higher its Bradley-Terry satisfaction score. The further to the left a model is located, the less energy it consumes compared to other models.
At the top left are the models that are popular and consume less energy compared to other models.
Beyond size, architecture has an impact on the average energy consumption of models: for example, at a broadly similar size, the Llama 3.1 405B model (dense architecture, 405 billion parameters) is estimated to consume about five times more energy on average (9134 mWh versus 1892 mWh per 1000 tokens in the table on this page) than the GLM 4.5 model (MoE architecture, 355 billion parameters, of which 32 billion are active).
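One way to make the "top left" reading concrete is to look for models that no other model beats on both axes at once. The sketch below is only an illustration, not part of the arena's methodology: it takes a handful of rows copied from the table on this page and keeps the models for which no other model in the sample has both a score at least as high and an energy estimate at least as low. The names `MODELS` and `pareto_front` are hypothetical.

```python
# (model, BT score, estimated energy in mWh per 1000 tokens),
# a small sample of rows copied from the table on this page.
MODELS = [
    ("lfm2-24b-a2b", 1013, 82),
    ("gemma-4-26b-a4b-it", 1114, 84),
    ("gemma-3-27b", 1091, 112),
    ("glm-4.5", 1119, 1892),
    ("mistral-large-2512", 1137, 4134),
    ("llama-3.1-405b", 915, 9134),
]


def pareto_front(models):
    """Return the names of models that are not dominated by any other model.

    A model is dominated when another model has a score at least as high
    AND an energy estimate at least as low, with at least one of the two
    being strictly better.
    """
    front = []
    for name, score, energy in models:
        dominated = any(
            (s >= score and e <= energy) and (s > score or e < energy)
            for _, s, e in models
        )
        if not dominated:
            front.append(name)
    return front


front = pareto_front(MODELS)
print(front)
```

Within this sample, gemma-3-27b drops out because gemma-4-26b-a4b-it scores higher while using less energy, and llama-3.1-405b drops out because every other sampled model does better on both axes. Since scores come with confidence intervals, small score differences should not be over-interpreted.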

Why are the proprietary models not displayed on the graph?

The estimation of energy consumption for model inference relies on the Ecologits methodology, which takes into account the size and architecture of the models. However, this information is not made public by model developers for proprietary models.

We have therefore decided not to include proprietary models in the graph until the information used to calculate energy consumption is made transparent.

How is the energy impact of the models calculated?

The arena uses the methodology developed by Ecologits (GenAI Impact) to provide an estimate of the energy footprint associated with inference on conversational generative AI models. This estimate allows users to compare the environmental impact of different AI models for the same query. This transparency is essential to encourage the development and adoption of more eco-friendly AI models.

Ecologits applies the principles of life cycle assessment (LCA) in accordance with ISO 14044, focusing for the moment on the impact of inference (i.e., the use of models to answer queries) and the manufacturing of graphics cards (resource extraction, manufacturing and transport).

The model's power consumption is estimated by taking into account various parameters such as the size and architecture of the AI model used, the location of the servers where the models are deployed, and the number of output tokens. The calculation of the global warming potential indicator, expressed in CO2 equivalent, is derived from the measurement of the model's power consumption.

It is important to note that methodologies for assessing the environmental impact of AI are still under development.

Chart data in table form

Model	BT score of satisfaction	Average energy (per 1000 tokens)	Size (parameters)	Architecture	Organization	Licence
lfm2-8b-a1b	976	81 mWh	XS - 8 billion	MoE	Liquid	Open-weight
lfm2-24b-a2b	1013	82 mWh	S - 24 billion	MoE	Liquid	Open-weight
gpt-oss-20b	967	83 mWh	S - 21 billion	MoE	OpenAI	Open-weight
qwen3-30b-a3b	967	83 mWh	S - 30 billion	MoE	Alibaba	Open-weight
phi-3.5-mini-instruct	800	83 mWh	XS - 3.8 billion	Dense	Microsoft	Open-weight
gemma-4-26b-a4b-it	1114	84 mWh	S - 26 billion	MoE	Google	Open-weight
gemma-3-4b	1030	84 mWh	XS - 4 billion	Dense	Google	Open-weight
gemma-3n-e4b-it	1020	84 mWh	XS - 8 billion	Matformer	Google	Open-weight
qwen2.5-7b-instruct	873	88 mWh	XS - 7 billion	Dense	Alibaba	Open-weight
chocolatine-2-14b-instruct-v2.0.3-q8	717	88 mWh	XS - 14 billion	Dense	jpacifico	Open-weight
qwen2-7b-instruct	673	88 mWh	XS - 7 billion	Dense	Alibaba	Open-weight
granite-4.1-8b	980	89 mWh	XS - 8 billion	Dense	IBM	Open-weight
aya-expanse-8b	929	89 mWh	XS - 8 billion	Dense	Cohere	Open-weight
llama-3.1-8b	872	89 mWh	XS - 8 billion	Dense	Meta	Open-weight
gemma-2-9b-it	905	90 mWh	XS - 9 billion	Dense	Google	Open-weight
Yi-1.5-9B-Chat	709	90 mWh	XS - 9 billion	Dense	01-ai	Open-weight
gemma-3-12b	1063	94 mWh	XS - 12 billion	Dense	Google	Open-weight
mistral-nemo-2407	773	94 mWh	XS - 12 billion	Dense	Mistral AI	Open-weight
phi-4	921	96 mWh	XS - 14 billion	Dense	Microsoft	Open-weight
chocolatine-14b-instruct-dpo-v1.2-q4	695	96 mWh	XS - 14 billion	Dense	jpacifico	Open-weight
EuroLLM-22B-Instruct-2512	994	106 mWh	S - 22 billion	Dense	EuroLLM	Open-weight
mistral-small-2506	1059	109 mWh	S - 24 billion	Dense	Mistral AI	Open-weight
magistral-small-2506	1040	109 mWh	S - 24 billion	Dense	Mistral AI	Open-weight
mistral-small-3.1-24b	983	109 mWh	S - 24 billion	Dense	Mistral AI	Open-weight
mistral-small-24b-instruct-2501	955	109 mWh	S - 24 billion	Dense	Mistral AI	Open-weight
gemma-3-27b	1091	112 mWh	S - 27 billion	Dense	Google	Open-weight
gemma-2-27b-it-q8	957	112 mWh	S - 27 billion	Dense	Google	Open-weight
gemma-4-31b-it	1107	117 mWh	S - 31 billion	Dense	Google	Open-weight
qwen3-32b	1030	118 mWh	S - 32 billion	Dense	Alibaba	Open-weight
aya-expanse-32b	960	118 mWh	S - 32 billion	Dense	Cohere	Open-weight
qwq-32b	905	118 mWh	S - 32 billion	Dense	Alibaba	Open-weight
qwen2.5-32b-instruct	904	118 mWh	S - 32 billion	Dense	Alibaba	Open-weight
olmo-3-32b-think	862	118 mWh	S - 32 billion	Dense	Ai2	Open source
qwen3.5-35b-a3b	1002	166 mWh	S - 35 billion	MoE	Alibaba	Open-weight
mixtral-8x7b-instruct-v0.1	824	193 mWh	S - 56 billion	MoE	Mistral AI	Open-weight
qwen3-coder-next	1106	332 mWh	M - 80 billion	MoE	Alibaba	Open-weight
gpt-oss-120b	1019	342 mWh	L - 117 billion	MoE	OpenAI	Open-weight
mistral-small-2603	1104	347 mWh	L - 119 billion	MoE	Mistral AI	Open-weight
nemotron-3-super-120b-a12b	1004	376 mWh	L - 120 billion	MoE	Nvidia	Open-weight
llama-4-scout	992	400 mWh	L - 109 billion	MoE	Meta	Open-weight
llama-3.1-nemotron-70b-instruct	1036	658 mWh	M - 70 billion	Dense	Nvidia	Open-weight
Apertus-70B-Instruct-2509	962	658 mWh	M - 70 billion	Dense	Swiss AI	Open source
llama-3.3-70b	959	658 mWh	M - 70 billion	Dense	Meta	Open-weight
llama-3.1-70b	929	658 mWh	M - 70 billion	Dense	Meta	Open-weight
hermes-4-70b	899	658 mWh	M - 70 billion	Dense	Nous	Open-weight
deepseek-r1-distill-llama-70b	898	658 mWh	M - 70 billion	Dense	DeepSeek	Open-weight
minimax-m2	1016	733 mWh	L - 230 billion	MoE	MiniMax	Open-weight
minimax-m2.5	1013	733 mWh	L - 229 billion	MoE	MiniMax	Open-weight
minimax-m2.7	1001	733 mWh	L - 230 billion	MoE	MiniMax	Open-weight
command-a	1053	857 mWh	L - 111 billion	Dense	Cohere	Open-weight
mixtral-8x22b-instruct-v0.1	758	1063 mWh	L - 176 billion	MoE	Mistral AI	Open-weight
trinity-large-thinking	1109	1524 mWh	L - 398 billion	MoE	Arcee	Open-weight
deepseek-v4-flash	1058	1524 mWh	L - 284 billion	MoE	DeepSeek	Open-weight
trinity-large-preview	970	1524 mWh	L - 398 billion	MoE	Arcee	Open-weight
qwen3.5-397b-a17b	1035	1601 mWh	L - 397 billion	MoE	Alibaba	Open-weight
llama-maverick	1008	1601 mWh	XL - 400 billion	MoE	Meta	Open-weight
glm-4.5	1119	1892 mWh	L - 355 billion	MoE	Zhipu	Open-weight
glm-4.6	1090	1892 mWh	L - 357 billion	MoE	Zhipu	Open-weight
glm-4.7	1005	1892 mWh	L - 357 billion	MoE	Zhipu	Open-weight
Qwen3-Coder-480B-A35B-Instruct	1055	1951 mWh	XL - 480 billion	MoE	Alibaba	Open-weight
kimi-k2.6	1078	3785 mWh	XL - 1000 billion	MoE	Moonshot AI	Open-weight
kimi-k2.5	1065	3785 mWh	XL - 1000 billion	MoE	Moonshot AI	Open-weight
kimi-k2-thinking	1059	3785 mWh	XL - 1000 billion	MoE	Moonshot AI	Open-weight
kimi-k2	1052	3785 mWh	XL - 1000 billion	MoE	Moonshot AI	Open-weight
deepseek-v3-0324	1105	3979 mWh	XL - 685 billion	MoE	DeepSeek	Open-weight
deepseek-v3-chat	1094	3979 mWh	XL - 671 billion	MoE	DeepSeek	Open-weight
deepseek-chat-v3.1	1093	3979 mWh	XL - 685 billion	MoE	DeepSeek	Open-weight
deepseek-r1-0528	1068	3979 mWh	XL - 685 billion	MoE	DeepSeek	Open-weight
DeepSeek-V3.2	1058	3979 mWh	XL - 685 billion	MoE	DeepSeek	Open-weight
deepseek-r1	1023	3979 mWh	XL - 671 billion	MoE	DeepSeek	Open-weight
glm-5.1	1022	4095 mWh	XL - 744 billion	MoE	Zhipu	Open-weight
glm-5	1016	4095 mWh	XL - 744 billion	MoE	Zhipu	Open-weight
mistral-large-2512	1137	4134 mWh	XL - 675 billion	MoE	Mistral AI	Open-weight
deepseek-v4-pro	1058	8890 mWh	XL - 1600 billion	MoE	DeepSeek	Open-weight
llama-3.1-405b	915	9134 mWh	XL - 405 billion	Dense	Meta	Open-weight
hermes-3-llama-3.1-405b	891	9134 mWh	XL - 405 billion	Dense	Nous	Open-weight

How to choose the model ranking method?

Since 2024, thousands of users have used the arena to compare the responses of different models, generating hundreds of thousands of votes. Simply counting the number of wins is not enough to establish a ranking. A fair system must be statistically robust, adjust after each matchup, and truly reflect the value of the performances achieved.

It is with this in mind that a ranking based on the Bradley-Terry model was established, developed in collaboration with the French Center of expertise for digital platform regulation (PEReN) teams, based on all the votes and reactions collected on the platform. To learn more, see our methodological notebook.

Two ways to rank models

Ranking by win rate

Definition: An empirical ranking system for models based on the percentage of duels won by a model against all other models.

Main problems

Match count bias: A model that has won three out of three duels has a 100% win rate, but this score is not very meaningful because it is based on very little data.
No consideration of duel difficulty: beating a “beginner” or an “expert” model counts the same. Win rates are unfair since they do not take match difficulty into account.
Stagnation: In the long run, many good models end up around 50% win rate because they are facing models of their own skill level, which makes the rankings less discriminating.
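The first and third problems can be seen in a minimal sketch of an empirical win-rate calculation. The vote data and model names below are invented for illustration, and `win_rates` is a hypothetical helper, not the arena's code.

```python
from collections import defaultdict


def win_rates(duels):
    """Compute empirical win rates from (winner, loser) pairs.

    Returns {model: (win_rate, number_of_duels)}.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in duels:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {m: (wins[m] / played[m], played[m]) for m in played}


# Hypothetical votes: "newcomer" only ever met a weak opponent, three times,
# while "veteran" and "strong" faced each other a hundred times.
duels = (
    [("newcomer", "weak")] * 3
    + [("veteran", "strong")] * 55
    + [("strong", "veteran")] * 45
)

rates = win_rates(duels)
for model, (rate, n) in sorted(rates.items(), key=lambda kv: -kv[1][0]):
    print(f"{model:10s} win rate = {rate:.0%} over {n} duels")
```

In this toy data, "newcomer" tops the ranking with a 100% win rate from only three duels against a weak opponent, while "veteran" sits near 50% despite a hundred duels against a strong one.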

Bradley-Terry (BT) leaderboard

Definition: A ranking system where the gain or loss of points depends on the result (victory/defeat/draw) and the estimated level of the opponent: if a weaker model beats a stronger model, its progression in the ranking is greater.
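For reference, the textbook form of the Bradley-Terry model assigns each model $i$ a positive strength $p_i$ and expresses the probability that $i$ is preferred over $j$ as a ratio of strengths. This page does not state the exact parameterization used for the arena (for example, how draws are handled or how strengths are rescaled into the displayed scores), so the symbols below are the generic ones and not necessarily those of the arena's implementation.

```latex
P(i \text{ beats } j) = \frac{p_i}{p_i + p_j},
\qquad p_i > 0,
\qquad P(i \text{ beats } j) + P(j \text{ beats } i) = 1
```

A model that is twice as strong as its opponent ($p_i = 2 p_j$) is therefore expected to win two duels out of three.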

Benefits

Probabilistic model: we can estimate the probable outcome of any matchup, even between models that have never competed directly.
Taking match difficulty into account: the scores estimated from the Bradley-Terry model take into account the level of the opponents encountered, allowing for a fair comparison between models.
Better uncertainty management: the confidence interval integrates the entire network of comparisons. This allows for a more accurate estimation of uncertainty, especially for models with few direct confrontations but many common opponents.
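The first benefit can be illustrated with a minimal sketch of a Bradley-Terry fit. It uses the classic iterative (minorization-maximization) update for the textbook model, ignores draws, and works on invented votes. It is not the arena's implementation, and the function names `fit_bradley_terry` and `win_probability` are hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(duels, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic iterative update: new strength = wins / sum over
    opponents of (duels with that opponent) / (own + opponent strength).
    Assumes every model has at least one win and one loss; draws are ignored.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # duels per unordered pair of models
    models = set()
    for winner, loser in duels:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Rescale so the strengths keep a fixed average of 1.
        total = sum(updated.values())
        strength = {m: v * len(models) / total for m, v in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Estimated probability that model a is preferred over model b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical votes: A and C never face each other directly.
duels = (
    [("A", "B")] * 7 + [("B", "A")] * 3
    + [("B", "C")] * 6 + [("C", "B")] * 4
)
strength = fit_bradley_terry(duels)
print(f"P(A beats C) = {win_probability(strength, 'A', 'C'):.3f}")
```

Although A and C share no duel in this toy data, their common opponent B links them, which is what lets the model estimate the outcome of a matchup that was never observed.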

Impact of methodological choice on model ranking

Top 10 models in the ranking based on "empirical" win rates


Based solely on the average win rate, an overall ranking can be obtained, but this calculation assumes that each model has played against all others.

This method is not ideal because it requires data from all combinations of models: as the number of models increases, it quickly becomes expensive and cumbersome to maintain.
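As a rough illustration of how quickly the number of combinations grows, the short sketch below counts the distinct pairs of models for a few pool sizes. The figure of 112 is simply the number of rows in the leaderboard table on this page, and `pair_count` is a hypothetical helper.

```python
from math import comb


def pair_count(n_models):
    """Number of distinct model pairs among n_models."""
    return comb(n_models, 2)


for n in (10, 50, 112):
    print(f"{n} models -> {pair_count(n)} distinct pairs")
```

Each of those pairs would need enough votes of its own for a purely empirical win rate to be meaningful.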

Top 10 models in the ranking based on estimated win rate with the Bradley-Terry model