The Scientific Practice of Large Scale Data Analytics
At work I’m seeing too many people getting a Heap O’ Data (TM) and then not being systematic about how they manage, process, and analyze those precious bits. Yours truly is a culprit, but I’m trying to get better. Even a basic primer on how to document your data sets would be helpful.
I have to imagine the Business Intelligence and Data Warehouse guys have some literature on this, as do the DevOps, Scientific Computing, and Quantitative Finance communities. That’s probably where I need to start diggin’. Really, this process can’t be as haphazard as what I’m seeing on a daily basis.
Feels like a good opportunity for O’Reilly Media.