A secret initiative code-named Project Panama saw artificial intelligence company Anthropic spend tens of millions of dollars buying, destroying and scanning physical books. The operation aimed to build a massive digital library for training its Claude chatbot. Court documents obtained by The Washington Post revealed the scope of this previously undisclosed endeavor.
The project involved using industrial machinery to remove book spines and scan pages before recycling the remains. Federal court filings from a copyright lawsuit against Anthropic show the company described it internally as an effort to “destructively scan all the books in the world.” The documents included a telling instruction about the operation’s secrecy: “We don’t want anyone to know we are working on this.”
A federal judge authorized the release of over 4,000 pages of documents. These materials stem from a lawsuit filed by writers accusing Anthropic of copyright infringement. The company agreed to pay $1.5 billion to settle the litigation without admitting liability.

Internal Emails Reveal AI Training Strategy
The documents date from 2024 and detail how Anthropic built its training data infrastructure. In January 2023, one of Anthropic’s co-founders wrote in an internal document that training models on books could teach them to “write well.” He contrasted this with what he called the lower-quality language found on the internet.
Internal Meta emails revealed in separate court proceedings described access to large digital book collections as “essential” for competing in the AI market. Companies faced a major obstacle, though: obtaining licenses directly from authors and publishers proved complex and expensive.
According to the lawsuits, several companies opted to acquire books on a massive scale without authorization from rights holders. Questionable practices included downloading from pirated digital libraries such as LibGen and Pirate Library Mirror.

The documents show that Ben Mann, an Anthropic co-founder, personally downloaded an extensive book collection from LibGen over 11 days in June 2021. A year later, he shared news of the launch of Pirate Library Mirror with colleagues. That platform openly claimed to violate copyright laws in numerous countries.
Anthropic maintains it never used those collections to train a complete commercial model that generated revenue. The company also states Pirate Library Mirror was not used to train an integrated system.
When the company decided to abandon its reliance on pirated digital libraries and create its own repository, it launched Project Panama. To lead the effort, Anthropic hired Tom Turvey, a Silicon Valley veteran who had participated in creating Google Books two decades ago.

Industrial Scale Book Destruction Operation
The company evaluated several options including bulk purchases from second-hand bookstores and potential agreements with American public libraries. Anthropic eventually acquired millions of books through specialized distributors like Better World Books and World of Books.
Although the court documents do not disclose exact figures, one business proposal included in the record shows the company aimed to digitize between 500,000 and 2 million books in just six months. The process involved using a “hydraulic cutting machine” to carefully separate the pages from each copy.
Pages were then processed through high-speed industrial scanners. Once digitization was complete, recycling companies removed the physical remains. The court documents do not explain why Anthropic chose the name Panama for the project.

The revelation comes amid a wave of lawsuits against artificial intelligence companies filed by writers, artists, photographers and media outlets. Google, Meta, Microsoft and OpenAI all face similar claims over using protected works to train AI systems.
Several federal judges in the United States have issued preliminary rulings favorable to technology companies regarding model training. In June, Judge William Alsup concluded that using books to train artificial intelligence could be considered “transformative” use. He compared the process to how a student learns to write by studying existing works.
The judge did, however, distinguish between how the books were used and how they were obtained. While he considered the training itself legal, the copyright infringement questions raised by the acquisition methods remain unresolved by the courts.
The Project Panama book scanning operation represents one of the most aggressive known efforts by an AI company to build training data from physical sources. Industry observers note the approach raises significant questions about intellectual property rights in the age of artificial intelligence.

