In working with some big piles of data at work, I’m learning some lessons that veterans like Matthew Hurst have already internalized:
Data agility is achieved when the system is designed to be independent of certain qualities of the input data, or when there are well-defined processes, tests, and analytical tools that radically reduce the time from identifying a new data source to shipping it in production.
System agility is not data agility, and achieving data agility requires an upfront investment in tools built specifically for that purpose.
I came to a similar conclusion about data agility when I realized that much of my recent programming was well nigh trivial. How to organize bits of simple code into a robust, manageable pipeline lit up the old brain cells, though. Data gardening and grooming is also a key part of that pipeline.
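To make the idea concrete, here is a minimal sketch of that kind of pipeline: each cleaning step is a trivial function, and a small driver composes them so any step can reject a record. All names here (the fields, the steps, `run_pipeline`) are hypothetical, not from any specific system.

```python
# Sketch: compose trivial grooming steps into one manageable pipeline.
# Every name below is illustrative, not from a real codebase.
from typing import Callable, Iterable, List, Optional

Record = dict

def strip_whitespace(rec: Record) -> Optional[Record]:
    # Groom stray whitespace out of every string field.
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}

def drop_empty_name(rec: Record) -> Optional[Record]:
    # Garden out records with no usable "name" field.
    return rec if rec.get("name") else None

def run_pipeline(records: Iterable[Record],
                 steps: List[Callable[[Record], Optional[Record]]]) -> List[Record]:
    # Apply each step in order; a step returning None drops the record.
    out = []
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break
        if rec is not None:
            out.append(rec)
    return out

raw = [{"name": "  Ada "}, {"name": ""}, {"name": "Hurst"}]
clean = run_pipeline(raw, [strip_whitespace, drop_empty_name])
# clean == [{"name": "Ada"}, {"name": "Hurst"}]
```

The payoff is that each step stays simple enough to test in isolation, and adding a new data source mostly means adding or reordering steps rather than rewriting the pipeline.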