Common Crawl 2012

Mmmmmmm, fresh, hot data! With instructions to boot:

I am very happy to announce that Common Crawl has released 2012 crawl data as well as a number of significant enhancements to our example library and help pages.

Along with this release, we’ve published an Amazon Machine Image (AMI) to help both new and experienced users get up and running quickly. The AMI includes a copy of our Common Crawl User Library, our Common Crawl Example Library, and launch scripts to show users how to analyze the Common Crawl corpus using either a local Hadoop cluster or Amazon Elastic MapReduce.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.