
Photo Credit: Kevin Horvat
Update (12/22): After this piece was published, a Spotify spokesperson reached out with a statement confirming that the responsible user accounts had been “identified and disabled,” with “new safeguards” implemented as well.
“Spotify has identified and disabled the nefarious user accounts that engaged in unlawful scraping,” the spokesperson said. “We’ve implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior. Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights.”
Below is our original coverage.
The allegedly responsible hackers, part of a self-described “non-profit project” called Anna’s Archive, disclosed the data heist themselves in a blog post. That lengthy post, drawing on the scraped metadata, covers hard stats on duration, stream volume, popularity, genre, release date, and more.
Regarding straight audio, Anna’s Archive indicated that it’d “archived around 86 million music files, representing around 99.6% of listens” and clocking in at “a little under 300TB in total size.”
“A while ago, we discovered a way to scrape Spotify at scale… For now this is a torrents-only archive aimed at preservation, but if there is enough interest, we could add downloading of individual files to Anna’s Archive,” the hackers wrote.
(Technically, Anna’s Archive claims that it doesn’t “host any copyrighted materials,” instead purportedly indexing “metadata that is already publicly available.” Direct hosting or not, some of the project’s supporters are lamenting the Spotify circumvention – and the possibility that it’ll “ruin the actual important literary archive” by encouraging aggressive litigation.)
“The data is circulating on P2P networks, and there is no putting this back in Pandora’s box,” Zimmerman wrote. “Anyone can now, in theory, create their own personal free version of Spotify (all music up to 2025) with enough storage and a personal media streaming server like Plex. The only real barriers are copyright law and fear of enforcement.”
“It is well understood that LLMs thrive on high-quality data,” one section of the Anna’s Archive site reads. “We have the largest collection of books, papers, magazines, etc in the world, which are some of the highest quality text sources.”
According to the same site, Anna’s Archive promptly released the metadata, with the roughly 300 terabytes of audio files “releasing in order of popularity.”
In other words, the full extent of the episode’s fallout remains to be seen. And as initially mentioned, Spotify confirmed the “unauthorized access” (but not where things go from here) in a detail-light statement.
“An investigation into unauthorized access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform’s audio files. We are actively investigating the incident,” the Spotify spokesperson said.