
Archived Article: The Daily Perspective is no longer active. This article was published on 16 March 2026 and is preserved as part of the archive.

Technology

The Copyright Reckoning: Reference Giants Challenge OpenAI's Training Methods

Encyclopedia Britannica and Merriam-Webster's landmark lawsuit raises fundamental questions about how AI companies should compensate content creators

Image: The Verge
Key Points
  • Encyclopedia Britannica and Merriam-Webster sued OpenAI in Manhattan federal court for using nearly 100,000 articles to train ChatGPT without authorisation
  • The lawsuit alleges ChatGPT generates near-verbatim copies of Britannica content, diverting users and revenue from the publishers' websites
  • OpenAI defends its practices as fair use; courts have yet to settle whether training AI on copyrighted material constitutes infringement
  • The case arrives amid 91 pending copyright lawsuits against AI companies and signals a wider reckoning over intellectual property in the AI age

Encyclopedia Britannica and its Merriam-Webster subsidiary have filed a lawsuit against OpenAI in federal court in Manhattan, accusing the company of using their reference materials without permission to train artificial intelligence systems including ChatGPT. The complaint, filed Friday, marks the latest and perhaps most symbolically significant challenge to OpenAI's training practices. That the guardians of 250 years of accumulated reference material are now in court speaks to a fundamental tension in the AI industry: who owns knowledge, and who has the right to profit from it.

According to the filing, OpenAI reproduced nearly 100,000 Britannica articles during the training process. The publishers allege that ChatGPT was trained on and continues to reproduce their copyrighted content without authorisation, to their material detriment. The complaint further claims that by presenting AI-generated responses, which may contain inaccuracies or hallucinations, alongside Britannica's and Merriam-Webster's famous trademarks and brand identities, OpenAI misleads users into believing that Britannica or Merriam-Webster has endorsed or is the source of those responses.

The commercial injury alleged goes beyond copyright. Britannica's business today is primarily digital, built on subscriptions and advertising revenue that depend on web traffic. When ChatGPT answers a user's question about, say, the causes of the French Revolution or the properties of a chemical element using content sourced from Britannica's articles, those users have less reason to visit Britannica's website. This logic mirrors complaints from news publishers: if the AI can answer the question, why click through to the original source?

The case hinges on a legal question that US courts have not yet resolved. OpenAI and other AI developers have maintained that training models on large collections of publicly available text qualifies as fair use, arguing that the technology transforms existing material into new outputs rather than reproducing it directly. An OpenAI spokesperson on Monday said ChatGPT's language models "are trained on publicly available data and grounded in fair use." But Britannica disputes this framing fundamentally. Encyclopedia Britannica asserts in the complaint that OpenAI's "misuse of plaintiffs' copyrighted works is also not transformative," arguing that "ChatGPT copies the expression, meaning and message of copyrighted content, including that of plaintiffs, and repackages it to the consumer."

The Britannica case does not exist in isolation. It adds to a growing wave of copyright disputes between publishers and artificial intelligence developers over how training data is collected and used. Authors, news organisations, and other content owners have filed similar claims in recent months, arguing that AI companies built their systems on copyrighted material without obtaining permission. Last year Britannica filed a separate lawsuit against the startup Perplexity AI, which is still pending. More broadly, OpenAI is already the subject of a large multidistrict litigation in the Southern District of New York, currently overseen by Judge Sidney Stein, that consolidates more than a dozen copyright lawsuits brought by news publishers including the New York Times.

The licensing landscape complicates matters. Encyclopedia Britannica claims it reached out to OpenAI to discuss potential licensing opportunities, including an initial discussion in November 2024 that went nowhere. According to the complaint, an OpenAI representative rebuffed the publishers' licensing outreach after that discussion, and OpenAI never seriously pursued a licence for their content. Instead, the complaint alleges, OpenAI continued to copy the publishers' content without compensation even as it entered into licensing deals with other, similar publishers.

Reasonable people disagree on whether this represents theft of intellectual property or a necessary component of AI development. The tension reflects deeper questions: Should AI training count as fair use, or should creators and publishers have the right to demand payment? If AI cannot be trained without licensing vast libraries of content, does that make the technology economically unviable, or simply more honest? No strong legal precedent yet establishes whether using copyrighted content to train an LLM is copyright infringement. The courts will decide, and the outcome will reshape how the AI industry operates.

Oliver Pemberton

Oliver Pemberton is an AI editorial persona created by The Daily Perspective. The persona covers European politics, the UK economy, and transatlantic affairs with the dual perspective of an Australian abroad. Articles published under this persona are generated using artificial intelligence with editorial quality controls.