home ¦ Archives ¦ Atom ¦ RSS

Python, Gensim, LDA

Link parkin’: Gensim is a Python framework for vector space modeling. This means taking a corpus of text documents, where document means any bag of symbols from a fixed vocabulary, turning the documents into a vector representation and then discovering latent structure within the corpus. Good for unsupervised document analysis.

I’ve been wondering for a long time about the specific implementation of a vector space model algorithm known as Latent Dirichlet Allocation or LDA. The only LDA implementations I’ve seen previously are in C++ and Java and I can’t seem to grok how they translate the LDA math into code. Maybe the Python version in Gensim will be a bit more illuminating.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.