A Troubling Discovery
Independent researchers have just shed light on a reality that few anticipated: the latest versions of ChatGPT integrate information from Grokipedia, the AI-powered encyclopedia from xAI, Elon Musk's company. The finding, confirmed by reproducible tests, upends assumptions about the informational ecosystem of large language models.
The phenomenon was identified during systematic analyses of ChatGPT responses on current events. Distinctive phrasings, factual slants, and above all errors specific to Grokipedia recur in OpenAI's outputs. The statistical overlap is too strong to be mere coincidence.
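The researchers have not published their exact protocol, but a common way to quantify this kind of textual overlap is to count long n-grams shared between a model's answer and a suspected source: five-word sequences rarely coincide by accident. The sketch below is a minimal, hypothetical illustration of that idea in Python; the function names, the 5-gram window, and the placeholder inputs are assumptions, not the researchers' actual method.

```python
import re
from collections import Counter

def ngrams(text: str, n: int = 5) -> Counter:
    """Lowercase the text, tokenize on word characters, count n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(answer: str, source: str, n: int = 5) -> float:
    """Fraction of the answer's n-grams that also occur in the source.
    Long shared n-grams are rare by chance, so a high score suggests
    copying or a shared origin rather than coincidence."""
    a, s = ngrams(answer, n), ngrams(source, n)
    if not a:
        return 0.0
    shared = sum(count for gram, count in a.items() if gram in s)
    return shared / sum(a.values())

# Placeholder inputs: paste in a captured ChatGPT answer and the text
# of the suspected Grokipedia article.
chatgpt_answer = "..."
grokipedia_text = "..."
print(f"5-gram overlap: {overlap_score(chatgpt_answer, grokipedia_text):.2%}")
```

A handful of matching five-word sequences could still be quotation or stock phrasing; the more telling signal, as noted above, is when errors specific to Grokipedia reappear verbatim.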
Implications for Users
This situation poses a major trust problem. When a user queries ChatGPT, they expect to receive neutral information, or at minimum, to understand where the data comes from. Using Grokipedia as a source introduces several risks.
First, Grokipedia is not Wikipedia. Its editorial methodology remains opaque, and its association with xAI and Twitter/X suggests potential biases aligned with its founder's vision. Political, economic, and technological topics could be presented from a particular angle.
Second, this creates an echo chamber effect between AIs. If ChatGPT uses Grokipedia as a source, while Grok, the model behind Grokipedia, may itself be training on ChatGPT-generated content, we enter a feedback loop where errors propagate and amplify.
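To make the feedback-loop concern concrete, here is a deliberately simplified toy model; every number in it is an assumption chosen for illustration, not a measurement. If each generation of content retains most of the errors it inherits and adds a few of its own, the overall error rate climbs toward a plateau far above where it started instead of averaging out.

```python
def error_rate_after(generations: int,
                     base_rate: float = 0.02,
                     retention: float = 0.9,
                     new_rate: float = 0.02) -> float:
    """Toy model: each generation keeps `retention` of the inherited
    error rate and introduces `new_rate` fresh errors. Illustrative only."""
    rate = base_rate
    for _ in range(generations):
        rate = rate * retention + new_rate
    return rate

for g in (1, 5, 10, 20):
    print(f"after {g:2d} generations: {error_rate_after(g):.1%}")

# With these made-up parameters the rate converges toward
# new_rate / (1 - retention) = 20%, ten times the starting 2%.
```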
The Transparency Question
OpenAI built its reputation on the idea of accessible and relatively transparent AI, yet its training sources remain a jealously guarded trade secret. This opacity becomes problematic when it masks implicit editorial choices.
Users deserve to know whether the information they receive comes from reliable, verified sources, or from platforms whose editorial standards are questionable. Without this transparency, AI becomes a black box that potentially amplifies biased narratives while maintaining an appearance of neutrality.
A Dangerous Precedent for the Industry
This discovery should serve as a warning for the entire AI industry. The race to accumulate data pushes companies to use all available sources, without necessarily evaluating their quality or neutrality.
The problem is compounded by the growing scarcity of quality data. The web is progressively filling with AI-generated content, creating a spiral in which models train on their own outputs and informational quality gradually degrades, a phenomenon researchers call model collapse.
What Can Users Do?
In the face of this reality, critical thinking becomes more crucial than ever. Always verify important information against primary sources. Never treat an AI response as established truth, but rather as a starting point for deeper research.
Diversifying tools is also wise. Use multiple AI models, compare their responses, and identify the divergences. These divergences are often the most reliable indicators of underlying biases or factual uncertainties.
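As a practical aid, one rough way to surface such divergences is to collect answers to the same question from several assistants and compare them pairwise; the lowest-similarity pairs are the ones worth checking by hand. The sketch below uses only Python's standard difflib module, and the model names and pasted-in answers are placeholders, not an API integration.

```python
import difflib

def divergences(responses: dict[str, str]) -> list[tuple[str, str, float]]:
    """Pairwise similarity between answers, most divergent pairs first."""
    names = list(responses)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ratio = difflib.SequenceMatcher(None, responses[a], responses[b]).ratio()
            pairs.append((a, b, ratio))
    return sorted(pairs, key=lambda p: p[2])

# Answers you gathered yourself from each tool (placeholders here):
responses = {
    "model_a": "...",
    "model_b": "...",
    "model_c": "...",
}
for a, b, ratio in divergences(responses):
    print(f"{a} vs {b}: similarity {ratio:.2f}")
```

Note that SequenceMatcher compares raw text, so it flags differences in wording as well as in facts; treat it as a triage tool that tells you where to look, not a verdict.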
Toward Source Regulation?
This affair revives the debate on the need to regulate AI model data sources. Some propose mandating transparency on training datasets. Others suggest creating quality labels for information sources used by AIs.
The European Union, with its AI Act, is beginning to lay the groundwork for such requirements. But implementation remains complex in a technological environment that evolves faster than legislators can follow.
In the meantime, this discovery reminds us of a fundamental truth: AIs are not neutral oracles, but systems built by humans, with choices, biases, and interests. Our responsibility is to never forget it.
