One of the reasons I’ve become interested in deep analytical dives of podcasts is Software Engineering (SE) Radio. It’s not quite at the top of my podcast fave list because I don’t feel an urge to listen to it regularly. However, the content is typically excellent and I like the no-nonsense interview style. A little topic introduction, a little guest bio, and then right into the technical discussion. No ads either.
The analytic draw is the historical depth of the content. SE Radio launched back in 2006, 16 years ago. There are 511 episodes as of this writing. It’s provided under a Creative Commons 2.5 License. There are so many summarization techniques to apply to such a corpus, and so many questions to ask of it. Think of all the tech topics that have come up over the years. Not to mention that there are some exceedingly distinguished guests making appearances. How have the themes and personalities ebbed and flowed over the years? What programming languages have ascended and waned? Are there any enduring repeat offenders? How does this all connect and intersect with the wider tech topic podcast ecosystem?
And that’s just a single long-lived podcast! Between The Python Podcast.__Init__ (363) and the Data Engineering Podcast (288), Tobias Macey has delivered 651 episodes, primarily by himself. For almost any vertical topic you can imagine, there are probably at least a decent handful of well-rooted podcasts worth deep diving on.
The only showstopper might be the legality of building publicly available systems that derive from analytics of the podcast episodes, and I’ll need to get a good lawyer to interpret the license. On the one hand, I feel like this might be like podcast search indexing, which would seem not to be a derivative work. On the other, you are building on the content sooo…
Now whether all of the episodes can actually be collected is a technical challenge for another day.