Here is a question worth sitting with: when you post online under a pseudonym, who exactly do you think you are hiding from? A government agency with subpoena powers and a dedicated team of analysts? A well-resourced private investigator? The answer, according to a striking new piece of research, is now: anyone with a credit card and access to a standard AI tool.
A pre-print paper titled "Large-scale online deanonymization with LLMs", published on arXiv and reported by The Register, presents findings that should unsettle every person who has ever assumed that a Reddit handle or a Hacker News username provides meaningful cover. Researchers led by Simon Lermen, from MATS Research, alongside collaborators from ETH Zurich and Anthropic, demonstrated that large language models can deanonymise internet users by linking pseudonymous posts to real profiles across Hacker News, Reddit, LinkedIn, and interview transcripts.
The fundamental question is not merely technical. It is about the viability of the privacy assumptions that millions of people make every day, and whether governments, including Australia's, are moving fast enough to respond.
What the research actually found
In their core experiment, the researchers collected 338 Hacker News users whose bios linked to a LinkedIn profile, establishing ground-truth identities so the model's predictions could be verified. They built a structured profile from each user's comments and posts, converted it into an anonymised search prompt, and passed that prompt to an AI agent. The agent correctly identified 226 of the 338 targets, a success rate of 67 per cent at 90 per cent precision, with only 25 errant identifications and 86 cases where the model declined to offer a prediction.
Across all experimental settings, LLM-based methods achieved up to 68 per cent recall at 90 per cent precision, compared to near zero for the best non-LLM method. That comparison deserves emphasis. Classical deanonymisation techniques, the kind that have existed for years, were essentially useless in these tests. The AI-powered approach was not a marginal improvement; it was a categorical one.
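For readers who want to see how those headline figures hang together, here is a minimal arithmetic check in Python, using only the counts reported above; the variable names are mine, the numbers are the paper's.

```python
# Counts reported for the core Hacker News experiment; variable names are mine.
total_targets = 338   # users with a verifiable LinkedIn ground-truth identity
correct = 226         # targets the agent identified correctly
wrong = 25            # errant identifications
abstained = 86        # cases where the model declined to predict

attempted = correct + wrong            # predictions the agent actually made
precision = correct / attempted        # 226 / 251, roughly 0.90
recall = correct / total_targets       # 226 / 338, roughly 0.67

print(f"precision: {precision:.0%}, recall: {recall:.0%}")
```

The 90 per cent precision figure, in other words, applies only to the cases where the agent was confident enough to answer at all; the 67 per cent recall is measured against every target in the set.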

Cost is the other dimension that makes this research genuinely alarming. The entire experiment cost approximately $2,000, with individual profiles costing between $1 and $4 to identify. The authors note that this price is falling, not rising. As Simon Lermen put it: "Ask yourself: could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same, and the cost of doing so is only going down."
Why this breaks a foundational privacy assumption
Privacy researchers have long understood, in theory, that anonymity is fragile. Much of the academic work on online privacy over the past 25 years builds on Latanya Sweeney's 2002 research on k-anonymity, including her earlier finding that 87 per cent of the US population can be identified from just three seemingly innocuous data points: a five-digit ZIP code, gender, and date of birth. The vulnerability has been known. What has changed is the cost and effort required to exploit it.
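To see why so few attributes suffice, here is a minimal sketch of the Sweeney-style re-identification logic, using invented records rather than any real data: group an "anonymised" dataset by the three quasi-identifiers, and anyone sitting in a group of one is, in practice, named.

```python
from collections import Counter

# Invented records purely for illustration: no names, only the three
# quasi-identifiers Sweeney studied (five-digit ZIP, gender, date of birth).
records = [
    ("90210", "F", "1984-03-07"),
    ("90210", "F", "1984-03-07"),
    ("10001", "M", "1991-11-23"),
    ("60614", "F", "1979-06-14"),
    ("60614", "M", "1979-06-14"),
]

group_sizes = Counter(records)

for combo, k in group_sizes.items():
    label = "unique -> re-identifiable" if k == 1 else f"k = {k}"
    print(combo, label)
```

Sweeney's 87 per cent figure is, in effect, the share of Americans for whom that group size turned out to be one.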
The broader implication is that "practical obscurity", the idea that scattered, pseudonymous posts are safe because linking them is too labour-intensive, may no longer hold. The researchers argue that LLMs fundamentally change this calculus, enabling fully automated deanonymisation attacks that operate on unstructured text at scale. Where previous approaches required predefined feature schemas, careful data alignment, and manual verification, LLMs can extract identity-relevant signals from arbitrary prose, efficiently search over millions of candidate profiles, and reason about whether two accounts belong to the same person.
There is also a troubling structural feature to the attack that complicates regulatory responses. The pipeline is composed of individually benign steps: summarising text, generating embeddings, ranking candidates, and reasoning over matches. No single component appears inherently malicious, making it difficult to detect or restrict through conventional safeguards.
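The paper itself does not ship code, but the shape of the pipeline it describes is easy to sketch. What follows is my own illustrative reconstruction in Python, not the authors' implementation: summarise, embed, rank, then reason. The functions summarise, embed, llm_judge and deanonymise, and the top_k parameter, are placeholders of my own invention standing in for whatever off-the-shelf models an attacker might use; the point of the sketch is that every call in it is routine.

```python
import numpy as np

# Placeholder hooks. Each stands in for an ordinary, individually benign model
# call; none of these are real APIs from the paper.
def summarise(posts: list[str]) -> str:
    """Stage 1: condense a pseudonymous account's posts into a profile."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Stage 2: map text to a vector for similarity search."""
    raise NotImplementedError

def llm_judge(profile: str, candidate_bio: str) -> float:
    """Stage 4: score (0 to 1) whether two accounts are the same person."""
    raise NotImplementedError

def deanonymise(posts: list[str], candidates: dict[str, str], top_k: int = 20):
    """Link a pseudonymous account to one of many named candidate profiles."""
    profile = summarise(posts)                          # 1. summarise text
    query = embed(profile)                              # 2. generate embeddings

    ranked = sorted(                                    # 3. rank candidates
        candidates.items(),
        key=lambda item: float(np.dot(query, embed(item[1]))),
        reverse=True,
    )

    scored = [(llm_judge(profile, bio), name)           # 4. reason over matches
              for name, bio in ranked[:top_k]]
    confidence, name = max(scored)
    return name if confidence > 0.9 else None           # abstain when unsure
```

The abstention threshold at the end is what produces the precision-versus-recall trade-off reported above: raise it and the system answers less often, but is wrong less often when it does.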
Who could use this, and how
The researchers are candid about the potential misuse cases. Governments could use the technique to target journalists or activists. Corporations could mine public forums to construct hyper-targeted advertising profiles. And criminals could build detailed personal dossiers to make social engineering attacks more convincing. Persistent usernames, writing style, niche interests, and cross-platform references can collectively act as a fingerprint.
The study also finds that increasing model reasoning effort improves deanonymisation performance, implying that as frontier models become more capable, the attack may become even more effective by default. This is the trajectory that should concentrate minds in Canberra.
The counter-argument deserves serious consideration
Some researchers and civil libertarians would argue that this kind of work, by making the threat visible, actually strengthens the case for privacy regulation. They would be right. Transparency research of this kind is exactly what is needed to jolt legislators out of complacency. Exposing a vulnerability in a controlled research setting is categorically different from exploiting it, and the authors took care to construct their study using only publicly verifiable ground-truth identities to avoid actually deanonymising anyone.
Others would say that people who choose to post publicly online have, in some sense, already accepted reduced privacy expectations. That position has surface plausibility but does not survive scrutiny. There is a meaningful difference between posting a comment under a pseudonym in a specialist forum and having your real-world identity, employment history, and posting behaviour assembled into a profile by an automated system you never consented to. One is public participation; the other is surveillance.

Where Australia's law stands, and where it falls short
Australia is, to its credit, in the middle of the most substantial privacy law reform since the Privacy Act 1988 was first enacted. A new statutory right to sue for serious privacy breaches commenced on 10 June 2025, along with substantially stronger requirements for businesses handling personal information. New transparency obligations around automated decision-making are due to take effect in December 2026.
But the gap between those reforms and the threat described in this research is considerable. Australia does not have dedicated or overarching AI legislation. Instead, its regulatory approach relies on a combination of voluntary frameworks and existing non-AI specific laws. In December 2025, the National AI Plan confirmed that, for now, Australia will rely on existing laws and sector regulators, supported by voluntary guidance and a new AI Safety Institute, rather than introducing a standalone AI Act or immediate mandatory guardrails.
That cautious approach has reasonable justifications. Regulatory overreach carries its own costs, and locking in prescriptive rules for a technology moving this fast risks producing laws that are obsolete before they are enforced. The Tech Council of Australia has argued that overly tight automated decision-making regulation could stifle innovation and burden businesses with unnecessary compliance, calling instead for a risk-based approach that focuses on high-impact decisions. That is not a position to be dismissed. The balance between protecting privacy and preserving the conditions for innovation is genuinely difficult.
But voluntary frameworks were not designed to address threats that operate at the cost of a cup of coffee per victim. The Office of the Australian Information Commissioner has signalled a more enforcement-focused posture, but enforcement tools are only as useful as the laws they are built on. If the underlying legal framework was not designed with automated, large-scale deanonymisation in mind, it may simply not fit.
What comes next
A 2025 study by the University of Melbourne and KPMG found that only 30 per cent of Australians believe the benefits of AI outweigh its risks, with just 36 per cent of citizens expressing broader trust in AI systems. Approximately 78 per cent of respondents expressed concern about negative outcomes from AI, and only 30 per cent believe current laws and safeguards are adequate. Those numbers describe a public that is already sceptical, and research like this is unlikely to shift them toward confidence.
Strip away the technical detail and what remains is a straightforward accountability question. The tools to identify people from their online behaviour, at scale, affordably, and without their knowledge, now exist. They are not experimental curiosities; they are capabilities available to any sufficiently motivated actor. Australian law, as it currently stands, was not built for this moment.
That does not mean the answer is to rush a poorly designed regulatory framework into existence. Reasonable people can disagree about the precise shape of a legislative response. But the argument that existing laws and voluntary standards are sufficient should be treated with considerably more scepticism after this week's findings. Privacy is not only a personal value; it is the infrastructure on which press freedom, political dissent, and individual autonomy all depend. The case for taking it seriously, with enforceable rules rather than guidance documents, grows stronger every time a new study confirms how easy it has become to strip it away.