
Despite securing a deal with Universal Music Group (UMG) earlier this month to cement its place as a "responsible AI" partner, chip-making giant NVIDIA has been accused of training its AI on data scraped by Anna's Archive, the notorious pirate website.
On Friday, a class action lawsuit that several authors filed against NVIDIA in 2024, alleging that the company's AI models were illegally trained on their works, was amended to vastly expand the scope of the litigation. The amended complaint now covers more books, authors, and allegedly infringing AI models, and it adds claims involving the controversial "shadow library" Anna's Archive.
“Desperate for books, NVIDIA contacted Anna’s Archive—the largest and most brazen of the remaining shadow libraries—about acquiring its millions of pirated materials and ‘including Anna’s Archive in pre-training data for our LLMs’,” the filing reads. “Because Anna’s Archive charged tens of thousands of dollars for ‘high-speed access’ to its pirated collections […] NVIDIA sought to find out what ‘high-speed access’ to the data would look like.”
“Within a week of contacting Anna’s Archive, and days after being warned by Anna’s Archive of the illegal nature of their collections, NVIDIA management gave ‘the green light’ to proceed with the piracy. Anna’s Archive offered NVIDIA millions of pirated copyrighted books,” the complaint continues, stating that Anna’s Archive promised to provide the company with around 500 terabytes of pirated data.
All of this comes just weeks after UMG announced a partnership with NVIDIA to "pioneer responsible AI for music discovery, creation, and engagement."
Presumably, UMG was unaware that NVIDIA may not have sourced its training data ethically. Now, however, the label finds itself between a rock and a hard place if it wants to ensure its offerings aren't trained on or derived from pirated works. Whether UMG will call off its collaboration with the chip maker remains to be seen.