This could be just the first of many future class-action lawsuits exposes unauthorized book copying and storage for GenAI training.
In a major turning point for digital copyright law, book authors have secured a first-of-its-kind US$1.5bn settlement in their class-action lawsuit targeting the use of pirated books to train AI systems.
Pending judicial approval in San Francisco, the payout — by far the largest copyright recovery in US history — will award approximately US$3,000 per work for around 500,000 books that were deemed to have been inappropriately acquired and incorporated into machine learning datasets.
The historic agreement arrives amid a wave of lawsuits against technology firms, and is seen by legal experts as a landmark that could significantly affect the sourcing and licensing practices underpinning advanced AI development.
The dispute began after writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson discovered that their books, along with countless others, had been downloaded from shadow libraries, then used to build AI chatbots without the authors’ permission.
Investigations have revealed that these works had been amassed into a “central library” containing over seven million digitized books — most of which had been illegally copied. Although a federal judge has found that training AI on legally purchased literature may qualify as fair use, storing and mining vast troves of pirated works crossed clear legal boundaries, and prompted the need for redress.
Potential damages could have soared into the hundreds of billions had the matter gone to trial, motivating the defendant, Anthropic, to agree to settle, destroy illicitly obtained files, and compensate writers.
While the deal resolves the current infringement claims, it stops short of licensing future uses — authors retain the right to challenge AI firms if their works are reused without authorization.
As judicial scrutiny intensifies and similar cases proceed against OpenAI, Meta, Microsoft, and others, the settlement is widely expected to catalyze broader reforms, new licensing arrangements, and decisive guidance for the ethical training of AI models.