
Discogs Data micro-Redux

So I’ve said in the past that I wanted to investigate the Discogs.com data dumps as a side project. I’ve basically failed, other than to keep collecting the data dumps continuously. To date, I believe there are 459 data files, totaling 380+ GB of compressed data. My finger-in-the-wind estimate is that the uncompressed total is around 4 TB.

I’m just going to keep taking baby steps to build momentum on this. The available dataset now spans over a decade, which makes it interesting in its own right, almost independent of what’s in the dumps. But it’s challenging because of dirty data from the early years and the sheer scale of the recent dumps. Just generating what I think is an accurate listing and count of the dump files was surprisingly difficult.
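For a sense of what the listing-and-count step might look like, here is a minimal sketch, not the method actually used for the numbers above. It assumes the dumps sit in a local directory tree as gzip-compressed files; the function name `summarize_dumps` and the `*.gz` pattern are hypothetical choices for illustration.

```python
from pathlib import Path


def summarize_dumps(dump_dir, pattern="*.gz"):
    """Return (file_count, total_bytes) for files matching pattern under dump_dir.

    Assumption: every dump is a regular file whose name matches `pattern`;
    anything else in the tree (notes, checksums, partial downloads with
    other extensions) is ignored.
    """
    # rglob walks subdirectories too, in case dumps are grouped by year.
    files = sorted(p for p in Path(dump_dir).rglob(pattern) if p.is_file())
    # st_size is the on-disk (compressed) size in bytes.
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

Calling `summarize_dumps("some/dump/dir")` returns the file count and the compressed byte total, and dividing the total by `1e9` gives decimal gigabytes. A count like this only covers what is on disk; it says nothing about dumps that were missed or downloaded incompletely, which is probably where most of the difficulty lies.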

Aspiring to be to Discogs data what Mark Litwintschik is to the 1.1 Billion Taxi Rides data.

© C. Ross Jam. Built using Pelican. Theme by Giulio Fidente on github.