Gracenote Sues OpenAI Over Metadata and Database Structure

Gracenote, the Nielsen-owned entertainment metadata company, has filed a federal copyright lawsuit against OpenAI in Manhattan, alleging the AI firm copied not only its database of television and film information but also the proprietary framework that connects that data.

The case introduces a legal dimension absent from previous AI copyright disputes. While most lawsuits against AI companies have centred on content used to train large language models, Gracenote is suing OpenAI not just for using its metadata without authorisation or compensation but also for copying the relational framework that connects that metadata. That framework is part of what makes the data valuable to its enterprise clients and useful for consumers.

Gracenote maintains one of the entertainment industry's most widely used metadata catalogues, covering music, television, film and sports programming. The company employs hundreds of editors who create narrative descriptions, unique identifiers and other metadata elements designed to help distributors and platforms organise and surface content for viewers. This human curation and the relationships between data points form the core of Gracenote's commercial value to clients such as television providers.

In its complaint, Gracenote points to specific examples where ChatGPT produces descriptions that closely match those created by its editors. In one, Gracenote alleges that when a ChatGPT user asked for a description of "Game of Thrones," the model returned a near-exact copy of the Gracenote-written descriptor, which it says OpenAI had scraped. The complaint provides several other examples where, with minimal prompting, OpenAI's various ChatGPT models recite large portions of Gracenote's program descriptions verbatim.

The company argues OpenAI had legitimate alternatives. "Defendants could have paid Gracenote to license its valuable Gracenote Data. Or they could have sought to train and ground their models only on information in the public domain," the complaint states. Gracenote says in its lawsuit that it reached out to discuss licensing its data to OpenAI "many times over an extended time period" but that the AI company "rebuffed or ignored every single attempt to do so."

OpenAI has rejected the allegations. A company spokesperson stated that the firm's models are trained on publicly available data and rely on fair use principles. The defence echoes arguments advanced in other pending cases: AI companies that scraped online materials to train their models tend to argue that using public data is "fair use" under existing copyright law, in part because that data can be used to create new and helpful services or information for consumers.

The lawsuit's significance extends beyond the immediate dispute. To date, there hasn't been a major media copyright lawsuit that focuses on the alleged theft of a proprietary sequence or structure behind a dataset. This lawsuit could set a precedent for how data providers, in the media industry and outside of it, protect their intellectual property.

The lawsuit seeks statutory damages, which are predetermined penalties for copyright violations, as well as actual damages tied to any financial harm suffered by the company. Statutory damages are available because Gracenote's entire Programs Database, which includes its metadata and the proprietary relational map its editors use to connect that data, is registered with the U.S. Copyright Office.

Gracenote frames its position as compatible with AI development. "Being pro-AI and anti-theft aren't contradictory; they are the only sustainable path forward. We've filed suit to protect that future," Gracenote CEO Jared Grusd said in a statement. The company has licensed its database to technology firms including Samsung and Google, demonstrating that alternative approaches to data sourcing exist.

The case arrives as courts grapple with the broader question of whether AI training on copyrighted material constitutes fair use. Multiple copyright lawsuits are in discovery, and judicial decisions expected this year will influence how AI companies source and use data moving forward. Gracenote's focus on the dataset's underlying structure adds a new dimension to these disputes, potentially affecting how tech firms approach the organisation of proprietary information.