This may become a recurring feature of the blog, hence the abbreviation and numbering. I’ve managed to catch up and download the entire archive of discogs.com data dumps to my personal laptop. As of this writing, it’s about 153 GB of mostly compressed XML data, of varying quality, going all the way back to 2008.
I don’t really have much of a plan other than to explore an interesting longitudinal data set. One thing I’m hoping to do is come up with a modernish set of tools to process the data, including normalizing it and transforming it to other formats. The other goal is to push it all up into Google Cloud Platform and see what working with data in that environment is like. I’m also planning to make the code and generated data open, since Discogs provides the dumps under an extremely liberal license.
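To give a flavor of the "transforming to other formats" part, here is a minimal sketch in Python that streams a gzipped XML file and writes one JSON object per record. It assumes each dump is a single root element whose direct children are the records; that layout, the function names, and the output shape are all my own placeholders for illustration, not anything Discogs specifies.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element into a plain, JSON-friendly dict."""
    node = {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": {},
    }
    for child in elem:
        # Children are grouped by tag; repeated tags accumulate in a list.
        node["children"].setdefault(child.tag, []).append(element_to_dict(child))
    return node


def dump_to_jsonl(xml_gz_path, jsonl_path):
    """Stream a gzipped XML file and write one JSON line per record.

    Assumption: records are the direct children of the root element.
    Returns the number of records written.
    """
    count = 0
    depth = 0
    with gzip.open(xml_gz_path, "rb") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for event, elem in ET.iterparse(src, events=("start", "end")):
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth == 1:
                # A direct child of the root just closed: that is one record.
                dst.write(json.dumps(element_to_dict(elem), ensure_ascii=False))
                dst.write("\n")
                count += 1
                # Drop the record's contents so memory stays roughly flat;
                # the root still keeps the emptied element itself.
                elem.clear()
    return count
```

Calling something like `dump_to_jsonl("releases.xml.gz", "releases.jsonl")` (hypothetical file names) would then produce a line-oriented file that is easier to load into other tools. Streaming with `iterparse` rather than parsing the whole tree matters here because individual dumps are far too large to hold in memory comfortably.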