I was already vaguely aware of Pete Warden’s Data Science Toolkit, but Ajay Ohri gives a nice overview of what’s actually inside:
The Data Science Toolkit is a collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces. Available as a self-contained Vagrant VM or EC2 AMI that you can deploy yourself.
The Data Science Toolkit is essentially a specialized Linux distribution, with a lot of useful data software pre-installed and exposing a simple interface. Developer documentation is quite nicely done.
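To give a feel for what a "simple REST/JSON interface" means in practice, here is a minimal Python sketch of calling such a server using only the standard library. Several things here are my assumptions rather than anything stated above: the base URL (a locally running VM on port 8080), the `<base>/<endpoint>/<argument>` URL layout, and the endpoint names `ip2coordinates` and `street2coordinates`, which I am recalling from the toolkit's documentation. The helper names `build_url` and `call_dstk` are hypothetical. Check the developer docs before relying on any of it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: address of a DSTK server you deployed yourself (VM or AMI).
DSTK_BASE_URL = "http://localhost:8080"


def build_url(endpoint, argument, base_url=DSTK_BASE_URL):
    """Build a GET URL of the assumed form <base>/<endpoint>/<encoded argument>."""
    return f"{base_url.rstrip('/')}/{endpoint}/{quote(argument, safe='')}"


def call_dstk(endpoint, argument, base_url=DSTK_BASE_URL, timeout=10):
    """Send the request and decode the JSON response body."""
    url = build_url(endpoint, argument, base_url)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL, so this runs without a server.
    print(build_url("ip2coordinates", "8.8.8.8"))
    # With a server running, something like this would fetch parsed JSON:
    # result = call_dstk("ip2coordinates", "8.8.8.8")
```

The toolkit also ships its own Python client, which is probably the better choice for real work; the point of the sketch is only that the API is plain HTTP plus JSON, so any language with an HTTP library can talk to it.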