Experts say the study reinforces a classic principle of computing — “garbage in, garbage out” — as poor data continues to poison modern generative AI systems.
By PC Bureau
November 1, 2025: Artificial intelligence (AI) chatbots risk becoming less accurate and more irrational when trained on massive amounts of low-quality online content, especially social-media posts, according to a new study. The research, posted as a preprint on arXiv on October 15 and reported by Nature, warns that poorly curated data can lead to what scientists are calling “AI brain rot.”
Researchers led by Zhangyang Wang, an associate professor of electrical and computer engineering at the University of Texas at Austin, found that large language models (LLMs) trained on junk data — such as brief, sensational, or superficial posts — show sharp declines in reasoning, factual accuracy, and even ethical judgment.
Garbage In, Garbage Out
In data science, good data typically means text that is grammatically correct and understandable. But Wang says that standard definitions overlook deeper issues of content quality. “A sentence can be grammatically fine yet intellectually empty,” he explains.
To test how low-quality data influences AI, Wang’s team trained open-source models — Meta’s Llama 3 and Alibaba’s Qwen — on a million public posts scraped from the social-media platform X (formerly Twitter). They compared performance across datasets that varied in the proportion of “junk” content.
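To make that design concrete, here is a minimal sketch of how such proportion-controlled training mixtures could be assembled. This is not the team's actual pipeline, which the article does not publish; the is_junk() heuristic and the load_posts() loader below are illustrative assumptions.

```python
import random

def is_junk(post: str) -> bool:
    """Crude stand-in for the study's 'junk' criteria (an assumption):
    very short or sensational posts, e.g. heavy clickbait punctuation."""
    return len(post.split()) < 10 or post.count("!") >= 3

def build_mixture(posts: list[str], junk_fraction: float, size: int,
                  seed: int = 0) -> list[str]:
    """Sample a fine-tuning set with a controlled share of junk,
    mirroring the study's design of varying that share across runs."""
    rng = random.Random(seed)
    junk = [p for p in posts if is_junk(p)]
    clean = [p for p in posts if not is_junk(p)]
    n_junk = int(size * junk_fraction)
    return rng.sample(junk, n_junk) + rng.sample(clean, size - n_junk)

# One training set per junk level; each would then fine-tune a fresh model.
# posts = load_posts(...)  # hypothetical loader for the scraped corpus
# mixtures = {f: build_mixture(posts, f, size=10_000)
#             for f in (0.0, 0.2, 0.5, 0.8, 1.0)}
```

Comparing models trained on each mixture is what lets the researchers say the degradation was proportional to the junk share, rather than an all-or-nothing effect.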
The results were stark. Models fed with higher proportions of low-quality data frequently skipped logical steps or abandoned reasoning altogether. When asked multiple-choice questions, these models chose wrong answers more often and retrieved less accurate information from long inputs.
“The effect was proportional,” says Wang. “The more junk we added, the more the reasoning degraded.”
How Junk Data Changes AI Personality
The study also examined how low-quality content affected a model’s personality traits. Using standard psychology questionnaires, researchers found that before exposure to junk data, Llama displayed traits such as openness, conscientiousness, extroversion, and mild narcissism.
But as the amount of junk data increased, the desirable traits diminished and darker ones, including psychopathy, began to emerge.
According to Wang, this change suggests that exposure to sensationalist or hostile online material can shape not only how an AI reasons but also how it responds emotionally or ethically to users.
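Administering a personality questionnaire to a model amounts to posing each statement as a prompt and recording the model's self-rating. The sketch below illustrates the general idea only: the article does not name the specific questionnaire, so the items are generic examples, and ask_model() stands in for whatever inference API the team used.

```python
# Illustrative items in the style of standard trait inventories (assumed,
# not taken from the study).
ITEMS = [
    ("openness", "I am full of ideas."),
    ("conscientiousness", "I pay attention to details."),
    ("psychopathy", "I tend not to feel guilty after hurting someone."),
]

SCALE = "Answer with a number from 1 (strongly disagree) to 5 (strongly agree)."

def administer(ask_model) -> dict[str, int]:
    """Pose each statement to the model and record its 1-5 self-rating,
    the way a human respondent would fill in the questionnaire."""
    scores = {}
    for trait, statement in ITEMS:
        reply = ask_model(f"{SCALE}\nStatement: {statement}\nYour answer:")
        digits = [c for c in reply if c.isdigit()]
        scores[trait] = int(digits[0]) if digits else 0  # crude parse
    return scores
```

Running the same battery before and after fine-tuning on junk data is what allows a before/after comparison of the model's apparent traits.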
Patching the Problem Isn’t Easy
The researchers tried to fix the issue by improving prompt instructions and retraining the models with cleaner data. While these efforts led to partial improvements, they did not fully restore reasoning ability.
Even when encouraged to reflect on its own mistakes — a technique often used to improve reasoning — the model continued to skip crucial analytical steps.
“This indicates that once an AI system has absorbed low-quality data, it may be very hard to reverse the effects completely,” the study notes.
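The self-reflection technique the researchers tried is typically implemented as a prompting loop: the model is shown its own answer and asked to critique and correct it. A minimal sketch follows, assuming a generic ask_model() inference call; it is not the study's code.

```python
def reflect_and_retry(ask_model, question: str, rounds: int = 2) -> str:
    """Ask once, then repeatedly show the model its own answer and
    ask it to flag skipped steps or errors and produce a correction."""
    answer = ask_model(f"Question: {question}\nThink step by step, then answer.")
    for _ in range(rounds):
        answer = ask_model(
            f"Question: {question}\nYour earlier answer: {answer}\n"
            "List any skipped reasoning steps or errors, "
            "then give a corrected answer."
        )
    return answer
```

The study's finding is that even with this kind of prompted self-critique, models trained on junk data kept skipping analytical steps, which is why the authors conclude the damage is hard to reverse after the fact.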
Experts Call for Rigorous Data Curation
AI experts say the findings reaffirm a long-standing principle of computing: garbage in, garbage out.
“Even before large language models existed, we knew that poor input leads to poor output,” says Dr. Mehwish Nasim, an AI researcher at the University of Western Australia in Perth. “This study gives empirical weight to that idea for the modern AI era.”
Dr. Stan Karanasios, who studies AI and social media at the University of Queensland, adds that the key to avoiding “AI brain rot” lies in careful data curation. “The most important safeguard is to ensure that training data is filtered, curated, and stripped of low-quality or sensationalist content,” he says.
What’s Next for AI Training?
The study has not yet been peer-reviewed, but it has raised important questions about how generative AI models are trained. Because many commercial systems — such as OpenAI’s ChatGPT — are proprietary, independent researchers cannot easily analyze or retrain them.
Dr. Nasim says future research should explore whether the damage caused by junk data can be undone by retraining models on sufficiently high-quality material. “It’s possible that a steady diet of good data could ‘rehabilitate’ a model, but we don’t know yet,” she says.
A Growing Concern
The study comes amid growing concern over how companies use online data. Just last month, LinkedIn announced plans to use user-generated content from the UK, parts of Europe, and Switzerland to train generative AI models starting November 3.
As AI systems become central to business, education, and governance, the quality of the data they are fed could determine how truthful — or toxic — they become.
“Data is the mind of the machine,” says Wang. “If we fill it with noise, we should not be surprised when it starts to think like the crowd.”