A Discovery That Raises Eyebrows
Researchers and savvy users have recently made a troubling discovery: the latest versions of ChatGPT appear to draw on Grokipedia, the online encyclopedia launched by Elon Musk's xAI, as one of their information sources. This finding, confirmed by several independent tests, raises fundamental questions about the information supply chain of large language models.
Grokipedia: A Controversial Encyclopedia
Launched as an alternative to Wikipedia, Grokipedia distinguishes itself through its less strict approach to fact-checking and its openness to "alternative" perspectives. While this philosophy may seem attractive to some, it has also opened the door to information of variable quality, or even outright misinformation.
The fact that OpenAI, a company that prides itself on developing safe and reliable AI, uses such a source in its training or real-time search process poses an obvious credibility problem. How can we trust an assistant that relies on sources whose editorial rigor is contested?
The Data Provenance Problem
This episode highlights a crucial and often underestimated issue in AI development: the quality and provenance of training data. Large language models are shaped by the texts they ingest; if those texts contain errors, biases, or false information, the model will inevitably reproduce them.
OpenAI has never been fully transparent about the sources used to train its models. This opacity, justified by commercial and security reasons, makes any independent assessment of ChatGPT's intrinsic reliability difficult.
Users as Unwitting Beta Testers
This situation places users in an uncomfortable position. Every ChatGPT response becomes suspect: is it based on reliable sources, on Grokipedia, or on a mixture of the two? Without a way to trace the provenance of the information, it is impossible to know.
Some experts now recommend systematically verifying ChatGPT's claims, particularly on sensitive or current topics. Sound advice, but it calls into question the very utility of an assistant meant to save us time.
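Until assistants expose provenance natively, that verification falls on the reader. As a rough illustration of what it might look like, here is a minimal Python sketch that flags cited URLs belonging to domains a reader chooses to treat with extra caution; the watchlist, the "grokipedia.com" domain, and the function name are hypothetical, and the sketch assumes the assistant surfaces a list of citation URLs at all.

```python
from urllib.parse import urlparse

# Hypothetical watchlist of domains the reader wants to double-check;
# "grokipedia.com" is assumed here purely for illustration.
WATCHLIST = {"grokipedia.com"}

def flag_suspect_citations(cited_urls):
    """Return the subset of cited URLs whose domain is on the watchlist."""
    suspect = []
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        # Strip a leading "www." so "www.grokipedia.com" still matches.
        if host.startswith("www."):
            host = host[4:]
        if host in WATCHLIST:
            suspect.append(url)
    return suspect

if __name__ == "__main__":
    # Example citations, as an assistant might list them alongside an answer.
    citations = [
        "https://en.wikipedia.org/wiki/Large_language_model",
        "https://www.grokipedia.com/some-article",
    ]
    for url in flag_suspect_citations(citations):
        print("Verify independently:", url)
```

The obvious limitation is that such a check only works when citations are exposed in the first place, which is precisely the transparency that is currently missing.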
Toward AI Source Certification?
This controversy could accelerate discussions around AI model regulation and certification. Proposals are emerging to require companies to disclose their training sources, or at least guarantee a certain level of information quality.
The European Union, with its AI Act, has already laid the groundwork for such a regulatory framework. But there is still a long road ahead before users can place informed confidence in the information provided by their AI assistants.
The Irony of the Situation
There's a biting irony in this situation: xAI, Elon Musk's company, is a direct competitor of OpenAI. That OpenAI's models rely on content produced by its rival's ecosystem illustrates how the web has become a space of complex interdependence, where even the fiercest competitors end up feeding off each other.
