
Archived Article — The Daily Perspective is no longer active. This article was published on 25 March 2026 and is preserved as part of the archive.

Technology

What You're Really Telling Your AI Chatbot

Your casual conversations with ChatGPT and other AI tools could have serious privacy consequences you may not fully understand

Image: ZDNet
Key Points
  • Major AI companies use your chatbot conversations by default to train future models unless you actively opt out
  • Over 60 percent of ChatGPT user data contains personally identifiable information, yet most users are unaware of opt-out settings
  • Data shared with public chatbots can expose sensitive information even if you delete conversations later
  • Using conversation data for training creates risks of re-identification and profiling that current safeguards cannot fully prevent

The casual conversation you just had with ChatGPT, asking for dinner recipes or help with a work email, may not remain private. All six of the major AI companies examined in a recent comparison use users' chat data by default to train their models, and some keep this information in their systems indefinitely. What feels like a confidential exchange between you and a machine is, in fact, data being harvested for commercial purposes.

The scale of this practice is striking. Hundreds of millions of people interact with AI chatbots that collect personal data for training, yet almost no research has examined the privacy practices of these emerging tools. The comparison in question, a Stanford study of six leading developers, found a substantial gap between company promises and actual practice.

How Your Data Gets Used

Anthropic recently changed its terms of service so that conversations with its AI chatbot, Claude, will be used for training its large language model by default, unless you opt out. This pattern is not unique. The mechanisms are particularly concerning because they operate at scale. For multiproduct companies such as Google, Meta, Microsoft, and Amazon, user interactions routinely get merged with information gleaned from other products consumers use on those platforms, including search queries, sales and purchases, and social media engagement.

Beyond simple data collection, the inferences drawn from your conversations pose real risks. Consider a mundane example: ask for heart-friendly recipes, and the algorithm may classify you as a health-vulnerable individual. That classification then ripples through the developer's ecosystem: you start seeing ads for medications, and the information can potentially end up in the hands of an insurance company.

The Consent Problem

AI developers' privacy documentation is often unclear, making it difficult for users to understand their data rights. The gap between what companies claim and what users understand is troubling: a 2024 EU audit found that 63 percent of ChatGPT user data contained personally identifiable information, while only 22 percent of users were aware of opt-out settings.

This represents a fundamental problem in how consent operates in the modern digital economy. Typically written in convoluted legal language, privacy documents are difficult for consumers to read and understand, yet consumers must agree to them if they want to interact with large language models. The burden falls entirely on the user to navigate dense policy documents and locate obscure settings to prevent their data from being used for training.

The Data Retention Question

Even if you delete a conversation, it may not truly be gone. OpenAI retains data on its servers for 30 days for safety monitoring, even if you delete a chat or use the Temporary Chat feature. This creates a window during which your information remains accessible to the company and potentially to staff who review conversations. Chats, including your prompts and responses, may be accessible to OpenAI personnel or contractors, which means your disclosures are not made in confidence.

What often goes unmentioned in marketing materials is that indefinite retention conflicts with data minimisation principles. Most users have no idea how long their data is kept or the ways in which it is used; without meaningful consent mechanisms or transparency about retention and sharing, personal information may be stored far longer than users would consider acceptable.

Institutional Accountability and Reform

The regulatory landscape remains fragmented, particularly outside Europe. In the United States, privacy protections for personal data collected by or shared with LLM developers are complicated by a patchwork of state-level laws and a lack of federal regulation. This absence of coherent federal standards has created an environment where companies set their own rules with minimal oversight.

Researchers have proposed concrete reforms. Policymakers and developers should address data privacy challenges posed by LLM-powered chatbots through comprehensive federal privacy regulation, affirmative opt-in for model training, and filtering personal information from chat inputs by default. These are not radical proposals. They represent standard data-protection principles that would place the burden on companies rather than individuals to justify data use.
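
To make the last of those proposals concrete, here is a minimal sketch of what filtering personal information from chat inputs by default might look like on the client side. This is an illustrative assumption, not any company's actual implementation, and the patterns shown would catch only the most obvious identifiers:

    import re

    # Hypothetical client-side redaction: strip obvious PII patterns from
    # a prompt before it reaches a chatbot. Real PII detection is much
    # harder than this; the patterns below are illustrative only.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(prompt: str) -> str:
        """Replace matched PII with a placeholder tag before submission."""
        for label, pattern in PATTERNS.items():
            prompt = pattern.sub(f"[{label} REDACTED]", prompt)
        return prompt

    print(redact("Email me at jane.doe@example.com or call +1 555 010 4477."))
    # Prints: Email me at [EMAIL REDACTED] or call [PHONE REDACTED].

The design point is where the filter runs: on the user's device, before transmission, so the developer never receives the raw identifiers in the first place.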

Practical Protections Exist, But Are Fragmented

If you choose to use chatbots, several practical steps can reduce exposure. In ChatGPT, navigate to Settings, then Data Controls, then "Improve the model for everyone," and toggle the setting off to exclude your conversations from future training cycles. For especially sensitive tasks, OpenAI offers temporary chats that are not used for training the model. Enterprise and business subscriptions typically offer stronger protections, including data isolation.
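
For developers rather than consumer-app users, a different default applies: OpenAI has stated that data sent through its API is not used for training unless a customer opts in. A minimal sketch, assuming the official openai Python package, an example model name, and an OPENAI_API_KEY environment variable:

    import os
    from openai import OpenAI

    # Per OpenAI's stated policy, API traffic (unlike consumer ChatGPT
    # conversations) is excluded from model training by default, with no
    # toggle required. Assumes OPENAI_API_KEY is set in the environment.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "user", "content": "Suggest a heart-friendly dinner recipe."}
        ],
    )
    print(response.choices[0].message.content)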

However, these controls place responsibility entirely on the user to remember settings and modify defaults. Nothing on the internet can be 100 percent secure or private. The risk is not merely hypothetical; organisations using general-purpose chatbots for business purposes face significant exposure. Concentric AI found that GenAI tools such as Microsoft Copilot exposed around three million sensitive records per organisation during the first half of 2025.

The Broader Calculus

The strategic calculus here involves several competing considerations. AI developers argue that training data is essential for improving their models and that de-identification reduces privacy risks. But the evidence suggests de-identification is incomplete. Scrubbing personal data from a dataset of this size is a statistical impossibility. When data is fed into a vector database, the system stores not just the text but the relationships between words; if you paste proprietary code, the model learns the logic of how that code works, and the problem-solving patterns it has already absorbed cannot simply be stripped back out.
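
A rough illustration shows what "storing the relationships between words" means in practice. The sketch below uses the open-source sentence-transformers library; the model name and sentences are arbitrary examples, chosen only to echo the recipe scenario above:

    from sentence_transformers import SentenceTransformer, util

    # Embeddings map text to vectors whose geometry encodes semantic
    # relationships; deleting the original text does not delete what a
    # system built on those vectors has already captured.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = [
        "Please suggest some heart-friendly dinner recipes.",
        "The user may have a cardiovascular condition.",
        "What is the capital of France?",
    ]
    embeddings = model.encode(sentences)

    # Cosine similarity: the recipe request sits measurably closer to the
    # health inference than to the unrelated question.
    print(util.cos_sim(embeddings[0], embeddings[1]))  # comparatively high
    print(util.cos_sim(embeddings[0], embeddings[2]))  # comparatively low

The numbers themselves matter less than the structure: once a request has been positioned near "cardiovascular condition" in that vector space, deleting the original chat transcript does not reverse the association.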

From the perspective of individual liberty and fiscal responsibility, the current arrangement is problematic. Users are not compensated for data use, they lack meaningful control over collection practices, and companies operate under rules of their own making. From the perspective of institutional accountability, the lack of federal privacy standards creates a regulatory vacuum where corporate interests determine the rules.

What is often overlooked in the public discourse is that this arrangement is not inevitable. Jurisdictions such as the European Union have established clearer standards through the General Data Protection Regulation. The United States could establish similar protections. The question is whether policymakers will act before the accumulation of biometric, financial, and health information in AI systems becomes irreversible.

Until such standards are established, users who value privacy should approach casual chatbot conversations with the assumption that nothing is confidential. Your secrets are becoming training data. The choice to prevent that requires navigating settings most users never discover.

Priya Narayanan

Priya Narayanan is an AI editorial persona created by The Daily Perspective, analysing the Indo-Pacific, geopolitics, and multilateral institutions with scholarly precision. Articles under this persona are generated using artificial intelligence with editorial quality controls.