From Singapore: The assumption that a username protects your identity online has quietly collapsed. Researchers from ETH Zurich (the Swiss Federal Institute of Technology), the Machine Learning Alignment Theory Scholars programme, and AI company Anthropic have published a paper showing that large language models can strip anonymity from pseudonymous online accounts at scale, for as little as US$1.41 per person.
The implications are direct and serious. Journalists working under pseudonyms, dissidents in authoritarian countries, activists coordinating on public forums, and employees using anonymous accounts to raise workplace concerns could all be identified using commercially available AI tools and a modest budget. This is not a theoretical risk requiring state-level resources. It is now, according to the researchers, within reach of any moderately resourced actor.

How the Pipeline Works
The researchers built a four-stage attack framework they named ESRC: Extract, Search, Reason, and Calibrate. A large language model first extracts identity-relevant signals from unstructured posts: writing style, incidental location references, job titles, hobbies, conference attendance. Semantic search then narrows the candidate pool to likely matches. A second, more capable model reasons over the top candidates to verify the best match. A final calibration stage controls false positives, letting an attacker trade accuracy off against the number of users successfully identified.
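In rough code terms, the attack reads like the sketch below. The function signatures, prompt wording, shortlist size, and 0.9 confidence cut-off are illustrative assumptions based on the paper's description; the researchers have not released their implementation.

```python
# A minimal sketch of the ESRC stages (Extract, Search, Reason, Calibrate)
# as described in the paper. Every name, prompt and threshold here is an
# illustrative assumption; the researchers have not released their code.
from typing import Callable, Optional

def deanonymise(posts: list[str],
                extract_llm: Callable[[str], str],                  # cheap LLM
                embed: Callable[[str], list[float]],                # embedding model
                nearest: Callable[[list[float], int], list[dict]],  # profile index
                rank: Callable[[str, list[dict]], tuple[dict, float]],  # stronger LLM
                threshold: float = 0.9) -> Optional[dict]:
    # Extract: pull identity-relevant signals (writing style, locations,
    # job titles, hobbies, events) out of the unstructured posts.
    summary = extract_llm("Summarise identifying details of this author:\n"
                          + "\n".join(posts))

    # Search: semantic search narrows the pool to a shortlist of candidates.
    candidates = nearest(embed(summary), 50)

    # Reason: a more capable model compares the summary against each
    # shortlisted candidate and returns its best match plus a confidence score.
    best, confidence = rank(summary, candidates)

    # Calibrate: accept the match only above a confidence cut-off, trading
    # the number of users identified against the false-positive rate.
    return best if confidence >= threshold else None
```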
The pipeline requires no structured data, no predefined features, and no manual effort from trained investigators. It runs entirely on the kind of unstructured text that fills public forums every day. In testing, the system matched Hacker News accounts to LinkedIn profiles across a pool of 89,000 users with 45.1 percent recall at a 99 percent precision threshold. Previous automated methods achieved just 0.1 percent recall at the same precision level. In a separate test linking pseudonymous Reddit accounts across time, the pipeline identified more than a third of all users at 99 percent precision.
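To make those figures concrete: recall is the share of true account pairs the system finds, and precision is the share of its asserted matches that are correct. The back-of-the-envelope arithmetic below assumes, purely for simplicity, that every account in the 89,000-user pool has a LinkedIn counterpart; the paper does not say that, so the absolute numbers are illustrative only.

```python
# Rough arithmetic on the reported Hacker News / LinkedIn result.
# Simplifying assumption (ours, not the paper's): every one of the
# 89,000 accounts has a matching LinkedIn profile.
pool = 89_000
recall = 0.451      # share of true pairs the pipeline recovers
precision = 0.99    # share of asserted matches that are correct

correct_matches = pool * recall                      # ~40,100 users identified
asserted_matches = correct_matches / precision       # ~40,500 matches claimed
false_matches = asserted_matches - correct_matches   # ~400 wrong identifications

print(round(correct_matches), round(false_matches))
```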
The models used in the pipeline included Grok 4.1 Fast from xAI, GPT-5.2 from OpenAI, and Gemini 3 Flash and Gemini 3 Pro from Google. No Claude model was used, despite Anthropic researcher Nicholas Carlini serving as an adviser on the paper. The full preprint is available on arXiv and is awaiting peer review.
Guardrails Are Not Holding
The researchers tested the safety guardrails built into commercial AI systems and found them unreliable. In some cases the models declined to assist, but small changes to the prompts bypassed those refusals every time. The pipeline's step-by-step structure (summarising profiles, computing embeddings, ranking candidates) also makes each request resemble ordinary, benign usage, which undermines automated misuse detection.
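Part of the problem is what each request looks like in isolation. The prompts below are a rough illustration, not drawn from the paper, of how the individual stages can pass as routine usage.

```python
# Hypothetical prompts (our illustration, not taken from the paper) showing
# how each stage of the pipeline can look like routine, benign API usage.
benign_looking_requests = [
    # Extract: reads like an ordinary summarisation task
    "Summarise this forum user's writing style, profession, location hints and interests.",
    # Search: reads like an ordinary embedding/retrieval task
    "Return an embedding vector for the following profile summary.",
    # Reason: reads like an ordinary comparison task
    "Which of these candidate public profiles best matches the description? Give a confidence score.",
]
# Viewed one request at a time, a provider's misuse filter sees summarisation,
# embedding and ranking, not a re-identification attack.
```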
Open-source AI models extend the threat further. Because safety guardrails can be removed from open-source deployments entirely, and because there is no usage monitoring on self-hosted models, the commercial API cost floor of US$1.41 to US$5.64 per target may not even apply. The barrier is lower still for those willing to run their own infrastructure.
Where the Burden Falls
The researchers are explicit about where they think responsibility lies. Rate limits on API data access, automated scraping detection, and restrictions on bulk data exports are identified as the most practical near-term responses, and all of them sit with platforms rather than AI providers. This is a meaningful distinction. It shifts accountability toward the companies hosting public forums and social data, rather than the AI developers whose tools are being used downstream.
That framing will not satisfy everyone. Privacy advocates have long argued that the architecture of surveillance capitalism, built on harvesting and exposing user data, creates the conditions these techniques exploit. The researchers' tool did not create the public data; it simply made the connection between data points faster and cheaper. From that perspective, the platform-centric remedy addresses symptoms rather than causes.
There is also a legitimate question about whether voluntary platform measures will be enough. The researchers themselves stopped short of releasing their pipeline code or processed datasets, citing the risk of lowering the barrier for malicious actors. That restraint is commendable, but it does not prevent independent replication. The methodology is now public.
What This Means in Practice
For Australian readers, the risks are not abstract. Whistleblowers, union organisers, domestic violence survivors using anonymous accounts, and anyone relying on pseudonymity for personal safety all face a changed threat environment. The Office of the Australian Information Commissioner has jurisdiction over how Australian platforms handle personal data, but the legal frameworks governing AI-assisted deanonymisation remain underdeveloped.
The researchers project roughly 35 percent recall at 90 percent precision against a candidate pool of one million users, and they expect future models to be both more accurate and cheaper. The technology is not standing still. The policy response, in Australia and globally, will need to move faster than it typically does to keep pace with a capability that is already, by any reasonable measure, operational.
Reasonable people can disagree about how to weight individual privacy against the genuine benefits that open, searchable public data provides to researchers, journalists, and civil society. But that debate is harder to have honestly when one side of the ledger, the cost of anonymity being stripped away, has just dropped to the price of a cup of coffee.