I’ve decided to try and pick up a “datadidact” habit by regularly working with a large dataset. Even if it’s only lowly, basic characterization, this should force me to hone various skills and brush up on some basic knowledge.
I’ve spoken before of the Discogs.com dataset. Their repository appears to be a treasure trove, and it’s completely unrelated to anything at work to boot. I thought siccing wget on http://www.discogs.com/data/ would be a no-brainer and a quick start. Except it’s blocked to crawlers by their robots.txt. ’Twould be nice if the HTTP response for the data URL were more informative than a 500 error, but I can understand where Discogs is coming from.
However, I WILL NOT BE DENIED! Just have to do it tediously by hand through my browser. So be it. The longitudinal analysis possibilities are too intriguing.
Already have some initial data in hand. Not looking forward to dealing with 11 GB XML files, though. First item might be to convert the data into a record/line-oriented format.
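For the curious, here is a rough sketch of what that conversion might look like in Python, streaming the XML so the whole file never has to sit in memory. The function name `xml_to_jsonl` and the record tag `release` are my own placeholders, not anything taken from the actual dump layout, and the flattening is deliberately naive: only attributes and simple text children survive, and repeated child tags overwrite each other.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_jsonl(src, dst, record_tag):
    """Stream records out of a large XML file, one JSON object per line.

    src: path or binary file object holding the XML.
    dst: writable text file object.
    record_tag: element name wrapping one record (an assumption about the data).
    Returns the number of records written.
    """
    count = 0
    # Request "start" events as well so the first event hands us the root.
    context = ET.iterparse(src, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            record = dict(elem.attrib)
            for child in elem:
                # Naive flattening: keep text of direct children only.
                record[child.tag] = (child.text or "").strip()
            dst.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
            # Discard finished records so memory use stays roughly flat.
            root.clear()
    return count
```

Usage would be something like `xml_to_jsonl("dump.xml", open("dump.jsonl", "w"), "release")`, with both filenames hypothetical. Once every record is on its own line, the usual line-oriented tools (grep, wc, sort, and friends) can take over.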
I got stuck in my car for the commute this afternoon, and wound up catching a few segments with the local sports radio yakker. For one chunk, they had this guy Richard Deitsch talking about some
Gary “The Glove” Payton was recently elected to the Naismith Basketball Hall of Fame. I saw a clip right after the announcement and Payton said a couple of interesting things. First, he said that of the current crop of players, he thought John Wall had the most potential to equal The Glove defensively. Yow!
Considering how much
Link parkin’: