Claude 4, GLM 4.5, GPT OSS and other new models join the arena...to be tested!
187,626 votes
Goal: 200,000. Discuss, vote, and help us reach this goal!
Your votes matter: they feed the compar:IA dataset, which is freely available to help refine future models in less-resourced languages.
This digital commons contributes to better respect for linguistic and cultural diversity in future language models.

Don't trust the answers of a single AI

Have a blind discussion with two AIs and evaluate their answers

How it works
1. I chat with two hidden AIs: chat for as long as you like.
2. I give my preference: by doing so, you'll help improve the AI models.
3. The model identities are revealed: learn more about them and their characteristics.


What is compar:IA for?

compar:IA is a free tool that helps raise awareness among citizens about generative AI and its challenges.

Compare
Compare the responses of different AI models

Discuss and develop your critical thinking by giving your preference

Test
Test the latest AI in the ecosystem in one place

Test different models: open, proprietary, small, large...

Measure
Measure the environmental footprint of questions asked to AI

Discover the environmental impact of your conversations with each model

Why is your vote important?

Your preferences

After discussing with the AIs, you are invited to indicate your preference for a model based on criteria such as the relevance or usefulness of its answers.

Datasets by language

All questions and votes are compiled into datasets and published openly after anonymization.

Models fine-tuned for specific languages

Companies and academic institutions can use the datasets to train new models that better respect linguistic and cultural diversity.

Specific use cases of compar:IA

The tool is also useful to AI experts, developers and for educational purposes.

Reuse the data

Developers, researchers, model publishers - access compar:IA’s datasets to enhance models for low-resource languages

Explore the models

Find all model specifications and terms of use in one place

Train and raise awareness

Use the chatbot arena as an educational tool to discuss AI with your audience

Who are we?

The chatbot arena is led within the French Ministry of Culture by a multidisciplinary team - AI experts, developers, deployment specialists, and designers - with a mission to make conversational AI more transparent and accessible to everyone.

Ministère de la Culture · Atelier numérique
Who initiated the project?

The chatbot arena was designed and developed as part of a government startup led by the French Ministry of Culture, integrated into the Beta.gouv.fr program by the Interministerial Digital Directorate (DINUM). This initiative supports French public administrations in building useful, simple, and user-friendly digital services.

beta.gouv.fr DINUM

Your frequently asked questions

Have you asked the question: “Explain to me the latest trendy cheesecake recipe and cite your sources” and been disappointed with the answers? That's normal...

“Raw” conversational AI models cannot answer questions about the most recent news. They are trained on static datasets, cannot browse the web or open links, and have no way to update themselves in real time as events unfold in the world. A model's knowledge is limited to the date of its last training.

Therefore, if you ask a question about a recent news event, the model will rely on outdated information and risks generating inaccurate answers.

In tools such as Perplexity, Copilot, or ChatGPT, the so-called “raw” conversational AI models are combined with other system components that allow them to connect to the internet and access information in real time. These assemblies are called “conversational agents.”

We choose models based on their popularity, diversity, and relevance to users. We pay particular attention to making available open-weight models and models of different sizes.

The uniqueness of the data collected on the compar:IA platform is that it covers less-resourced languages and corresponds to real-life user tasks. It reflects human preferences in specific linguistic and cultural contexts, and allows models to be adjusted to be more relevant, accurate, and adapted to user needs, while attempting to address biases or gaps.

compar:IA uses the methodology developed by EcoLogits (GenAI Impact) to provide an estimate of the energy consumed, allowing users to compare the environmental impact of different AI models for the same query. This transparency is essential to encourage the development and adoption of more eco-responsible AI models.

EcoLogits applies the principles of Life Cycle Assessment (LCA) in accordance with ISO 14044, focusing for the moment on the impact of inference (i.e., the use of models to answer queries) and the manufacturing of graphics cards (resource extraction, manufacturing, and transportation).

The model's power consumption is estimated from parameters such as the size of the AI model used, the location of the servers where it is deployed, and the number of output tokens. The global warming potential indicator, expressed in CO2 equivalent, is then derived from this power consumption estimate.
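The shape of this calculation can be sketched roughly as follows. This is an illustrative simplification with made-up coefficients, not EcoLogits' actual methodology, values, or API: the real estimation is more involved (ranges rather than point estimates, GPU manufacturing impacts, and so on).

```python
# Illustrative sketch of a token-based impact estimate.
# The coefficient below is HYPOTHETICAL, chosen only to show the structure:
# energy scales with model size and output length, CO2e with grid intensity.

def estimate_impact(model_params_b: float, output_tokens: int,
                    grid_gco2e_per_kwh: float) -> dict:
    """Rough energy and global-warming-potential estimate for one response."""
    # Assumed linear energy cost per output token per billion parameters.
    KWH_PER_TOKEN_PER_B = 1.0e-6  # hypothetical coefficient

    energy_kwh = model_params_b * output_tokens * KWH_PER_TOKEN_PER_B
    # CO2 equivalent depends on the carbon intensity of the electricity
    # grid where the servers are located (g CO2e per kWh).
    gwp_gco2e = energy_kwh * grid_gco2e_per_kwh
    return {"energy_kwh": energy_kwh, "gwp_gco2e": gwp_gco2e}

# Example: a 70B-parameter model generating 500 tokens on a low-carbon
# grid at 50 g CO2e/kWh.
print(estimate_impact(70, 500, 50))
```

With these (invented) numbers, the same query answered by a larger model or served from a more carbon-intensive grid yields a proportionally higher footprint, which is what makes per-query comparisons between models meaningful.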

It is important to note that methodologies for assessing the environmental impact of AI are still under development.

Yes, the internationalization of compar:IA is underway. We are starting with an expansion to three pilot countries: Lithuania, Sweden, and Denmark. This first phase will allow us to test the approach and adapt the interface to different European linguistic and cultural contexts. Eventually, the circle may expand to more European languages based on feedback from these pilot countries. The objective is to gradually build a true European digital commons for human evaluation of conversational AI, with collaborative governance that remains to be defined between the participating countries.