Can’t believe I’ve never posted about GH Archive
Open-source developers all over the world are working on millions of projects: writing code & documentation, fixing & submitting bugs, and so forth. GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
There’s a solid 10+ years of freely available, GitHub-spewed JSON to practice data spelunking, system benchmarking, and query hacking against.
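For a taste of what spelunking looks like, here is a minimal sketch. It assumes GH Archive’s layout of one gzipped, newline-delimited JSON file per hour at URLs like `https://data.gharchive.org/2015-01-01-15.json.gz`, where the hour is not zero-padded and each line is one event carrying a `type` field such as `PushEvent`. The helpers `archive_url`, `fetch_hour`, and `count_event_types` are hypothetical names for illustration, not part of any GH Archive tooling.

```python
import gzip
import io
import json
from collections import Counter
from datetime import datetime
from urllib.request import urlopen

# Assumed GH Archive URL pattern: one gzipped NDJSON file per hour.
ARCHIVE_URL = "https://data.gharchive.org/{stamp}.json.gz"


def archive_url(hour: datetime) -> str:
    """Hypothetical helper: build the URL for one hour of archive data."""
    # Hours are not zero-padded, e.g. 2015-01-01-3 rather than 2015-01-01-03.
    stamp = f"{hour:%Y-%m-%d}-{hour.hour}"
    return ARCHIVE_URL.format(stamp=stamp)


def fetch_hour(hour: datetime) -> bytes:
    """Hypothetical helper: download the raw gzipped bytes for one hour."""
    with urlopen(archive_url(hour)) as resp:
        return resp.read()


def count_event_types(raw: bytes) -> Counter:
    """Hypothetical helper: tally events by their "type" field."""
    counts: Counter = Counter()
    # Decompress in memory and read the archive as text, one event per line.
    with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines defensively
                counts[json.loads(line)["type"]] += 1
    return counts
```

Something like `count_event_types(fetch_hour(datetime(2015, 1, 1, 15))).most_common(5)` would then show which event types dominated that hour. Keeping the download separate from the parsing makes it easy to swap in locally cached files when benchmarking.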
Update: The CNCF DevStats project puts that data to interesting use by applying it to actual CNCF projects. So meta!
This is a toolset to visualize GitHub archives using HA Postgres databases and Grafana dashboards. Everything is open source so that it can be used by other CNCF and non-CNCF open source projects. The only requirement is that the project must be hosted on one or more public GitHub repositories. The project is deployed on Equinix bare-metal Kubernetes nodes using a Helm chart. It uses many more CNCF projects under the hood.
At least Google couldn’t find any such post on my site. Maybe that’s a hint to implement some real local search. 🤣