Ben Lorica posts some observations from the recent PyData conference:
It’s getting easier to use the Python data stack:
There are tools that facilitate the dissemination and sharing of code and programming environments. IPython notebooks allow Python code and markup in the same document. Notebooks are used to record and share complex workflows, and feature heavily in (conference) tutorials. As the data stack grows, one of the major pain points is getting all the packages to work properly together (version compatibility is a common issue). In particular, setting up environments where all the pieces work together can be a pain. There are now a few solutions that address this issue: Anaconda and the cloud-based Wakari from Continuum Analytics, and the cloud computing platform PiCloud.
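For readers who haven't tried it: Anaconda ships a `conda` command that builds isolated, version-pinned environments, which is exactly the pain point Lorica describes. A minimal sketch (the environment name and package list are my own illustration, not from his post):

```
# Create an isolated environment with a pinned core stack
# (names and versions here are purely illustrative).
conda create -n pydata python=2.7 numpy scipy pandas ipython

# Switch into it; the system Python is left untouched.
source activate pydata
```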
One of the things I find most exciting is how well Python has been incorporated into the Hadoop ecosystem and its various spinoffs. While Java will always be at the top of the depth chart, since it's the implementation language and Java is so enterprisey, Python is a reasonably first-class citizen. I've gotten a decent way building Hadoop processing pipelines with essentially no Java programming.
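The usual trick is Hadoop Streaming, which lets any executable that reads stdin and writes stdout act as a mapper or reducer. A minimal word-count sketch to give the flavor (this example is mine, not from Lorica's post):

```python
#!/usr/bin/env python
# mapper.py -- read lines from stdin, emit tab-separated (word, 1) pairs.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)
```

```python
#!/usr/bin/env python
# reducer.py -- sum counts per word; Hadoop sorts mapper output by key,
# so all lines for a given word arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, n = line.rstrip("\n").split("\t", 1)
    if word != current_word and current_word is not None:
        print("%s\t%d" % (current_word, count))
        count = 0
    current_word = word
    count += int(n)
if current_word is not None:
    print("%s\t%d" % (current_word, count))
```

You submit both scripts against the streaming jar (the jar's path varies by distribution) with `-mapper mapper.py -reducer reducer.py`, and no Java appears anywhere. Higher-level wrappers like mrjob smooth this over further.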