
Archived Article — The Daily Perspective is no longer active. This article was published on 11 March 2026 and is preserved as part of the archive.

Technology

The Internet is Eating Itself: Why AI-Generated Junk Demands Urgent Action

Game developer Brendan Greene warns of a self-reinforcing 'death spiral' as AI trains on its own output, degrading truth itself

Image: PC Gamer
Key Points
  • Brendan Greene warns that AI systems are trapped in a loop, training on low-quality AI-generated content that then becomes their truth
  • Experts estimate up to 90% of online content may be AI-generated by 2026, with quality degrading as models train on synthetic data
  • The fundamental problem: LLMs are non-deterministic and hallucinate, yet their outputs feed back into training datasets unchecked
  • Internet platforms benefit from engagement-driven AI slop, so they have no incentive to implement quality controls
  • Human-created content may become a scarce commodity, but only if we can verify and value it before synthetic noise drowns it out

Strip away the marketing rhetoric and what remains is a crisis of institutional failure dressed up as technological progress. Brendan Greene, the developer behind PUBG who now runs an independent studio in Amsterdam, has just articulated what many technologists have been whispering: the internet is in the grip of a self-reinforcing collapse.

The mechanism is brutally simple. Large language models hallucinate, so they cannot be trusted, yet an estimated 20% of online interactions are now artificial and the share of news generated by LLMs is staggeringly high. That is less an inconvenience than a structural vulnerability. Here is where it becomes dangerous: LLMs ingest this junk, the junk becomes their truth, and the result is a race to the middle of garbage.

This is not hyperbole; it is a documented phenomenon in machine learning research, often called model collapse. When AI models train on data produced by other AI systems rather than by humans, quality deteriorates across generations: each model's output becomes the next model's training data, compounding errors in an effect researchers have likened to taking photos of photos.
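The "photos of photos" analogy is easy to reproduce. Below is a minimal toy simulation in the spirit of the model-collapse literature (a sketch, not code from any cited study): fit a Gaussian to data, sample synthetic data from the fit, refit, and repeat. The estimated spread drifts and tends to shrink, which is exactly the loss of diversity the researchers describe.

```python
import random
import statistics

def fit_gaussian(samples):
    """Fit a one-dimensional Gaussian: return the sample mean and stdev."""
    return statistics.mean(samples), statistics.stdev(samples)

def sample_synthetic(mean, stdev, n):
    """Generate n synthetic data points from the fitted model."""
    return [random.gauss(mean, stdev) for _ in range(n)]

random.seed(42)
data = sample_synthetic(0.0, 1.0, 50)  # generation 0: "human" data

for gen in range(1, 51):
    mean, stdev = fit_gaussian(data)          # train on the current web
    data = sample_synthetic(mean, stdev, 50)  # publish output, then retrain on it
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mean={mean:+.3f}  stdev={stdev:.3f}")

# The fitted stdev follows a downward-drifting random walk: the tails of
# the distribution vanish first, then overall diversity collapses.
```

Each generation inherits the estimation error of the last; nothing in the loop ever pulls the model back toward the original human data.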

Consider the scale of the problem. Mentions of "AI slop" increased ninefold in 2025 compared with 2024; the term was named Word of the Year by both Merriam-Webster and the Australian National Dictionary; and research estimates that 21 to 33% of YouTube's feed may consist of such content. The numbers are difficult to absorb because they upend our basic assumptions about the quality of online information.

The counterargument deserves serious consideration. Some AI companies argue that they do not rely solely on public internet data; human annotation remains critical to model quality; and some observers believe the glut of AI-generated content could turn human-crafted material into a premium product. There is logic here: scarcity creates value. But the argument assumes markets will price authenticity fairly, and the evidence suggests otherwise.

The fundamental incentive structure is broken. Platforms do not penalise synthetic content; they profit from it, because AI slop drives engagement. Mark Zuckerberg has said Meta will add "a whole new category of content which is AI generated or AI summarised content", with no mention of better moderation. When the architecture rewards engagement over accuracy, the system optimises for engagement.

Greene is candid about where legitimate uses exist. On tightly defined, domain-specific datasets, LLMs perform well, and they hallucinate far less when deterministic constraints are applied. This is crucial: the technology itself is not worthless. The failure is regulatory and structural, not fundamental to AI as a concept.
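What "deterministic constraints" can mean in practice is easy to sketch. The toy below uses invented names and scores (not any real model's API): the decoder greedily picks the best-scoring token but only from a masked, domain-specific vocabulary, so the same inputs always yield the same output and nothing outside the schema can ever be emitted.

```python
# Illustrative only: the vocabulary and scores below are invented,
# not taken from any real model or API.
ALLOWED = {"OPEN", "CLOSED", "PENDING"}  # a domain-specific answer schema

def constrained_argmax(token_scores):
    """Greedy decoding over a masked vocabulary.

    Deterministic: the same scores always produce the same answer,
    and nothing outside the schema can ever be emitted.
    """
    candidates = {t: s for t, s in token_scores.items() if t in ALLOWED}
    if not candidates:
        raise ValueError("model proposed nothing inside the allowed schema")
    return max(candidates, key=candidates.get)

# A fluent but invalid token loses to a valid one, however high its score.
scores = {"OPEN": 2.1, "banana": 9.9, "CLOSED": 1.4, "PENDING": 0.3}
print(constrained_argmax(scores))  # -> OPEN
```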

But the trajectory is clear. The bigger problem in 2026 will be attention spread thin across an overwhelming surplus of AI-generated media, with algorithms optimised for attention above all else, an ecosystem in which truth must fight harder than ever against noise. The internet we built to democratise information is becoming a hall of mirrors.

What would actually work? Metadata standards that tag digital content with its origin, whether created by a person or generated by AI, so that future models can differentiate synthetic from authentic material. Human curation requirements for training datasets. Regulation that requires disclosure when content is synthetic. Efforts on the first front are already underway.
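To make the metadata idea concrete, here is a hypothetical origin tag (a sketch only; real provenance standards such as C2PA use certificate-based public-key signatures and a richer manifest). It binds a content hash to a declared origin and signs both, so a downstream crawler can verify the claim before treating the content as human-made.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for illustration; a real standard would use
# public-key certificates rather than a symmetric key.
SIGNING_KEY = b"publisher-signing-key"

def tag_content(body, origin):
    """Bind a content hash to a declared origin ("human", "ai", ...) and sign it."""
    record = {"sha256": hashlib.sha256(body.encode()).hexdigest(), "origin": origin}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_tag(body, record):
    """Check the content still matches the hash and the tag was signed by the key holder."""
    claimed = {"sha256": record["sha256"], "origin": record["origin"]}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (record["sha256"] == hashlib.sha256(body.encode()).hexdigest()
            and hmac.compare_digest(expected, record["signature"]))

article = "Water is wet."
tag = tag_content(article, origin="human")
print(verify_tag(article, tag))               # True
print(verify_tag(article + " Edited.", tag))  # False: content changed after tagging
```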

None of these solutions is technically difficult. All are politically inconvenient for platforms whose business models depend on content velocity.

The hard truth is this: we cannot regulate our way out of a problem we created through the pursuit of profit margins. But we can at least acknowledge what Greene is telling us. The internet's quality crisis is not a market failure we can leave to innovation. It is a choice. Companies are choosing volume over truth. Until the cost of that choice exceeds the benefit, nothing changes.

Daniel Kovac

Daniel Kovac is an AI editorial persona created by The Daily Perspective, providing forensic political analysis with sharp rhetorical questioning and a cross-examination style. As an AI persona, his articles are generated using artificial intelligence with editorial quality controls.