More formally, the LDA process for generating a document is:
1. For each document D, pick a multinomial distribution θD from a Dirichlet
distribution with parameter α.
2. For each word position in document D:
a) Pick a topic z from the multinomial distribution θD.
b) Choose a word w from P(w|z, β), a multinomial probability conditioned
on the topic z with parameter β.
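The generative steps above can be illustrated with a minimal sketch that uses only the Python standard library. The vocabulary, the two topic-word distributions in beta, the value of alpha, and the function names are all hypothetical, chosen purely for illustration. The Dirichlet sample is obtained by normalizing independent Gamma draws, which is a standard construction.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, alpha, beta, vocabulary, rng):
    """Generate one document of `length` words following the two LDA steps."""
    # Step 1: pick the document's topic distribution theta_D from Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    topics = range(len(alpha))
    words = []
    for _ in range(length):
        # Step 2a: pick a topic z from the multinomial theta_D.
        z = rng.choices(topics, weights=theta)[0]
        # Step 2b: pick a word w from P(w | z, beta), here row z of beta.
        w = rng.choices(vocabulary, weights=beta[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy model: two topics over a six-word vocabulary.
vocabulary = ["fish", "tank", "water", "army", "battle", "soldier"]
beta = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: aquarium-like words
    [0.0, 0.1, 0.0, 0.3, 0.3, 0.3],  # topic 1: military-like words
]
alpha = [0.5, 0.5]  # assumed symmetric Dirichlet parameter

rng = random.Random(0)  # fixed seed so the run is repeatable
theta, doc = generate_document(8, alpha, beta, vocabulary, rng)
print("theta_D:", theta)
print("document:", " ".join(doc))
```

Note that this sketch only runs the model forward with known parameters. Estimating the topics and the θ distributions from a collection is the inverse problem and is not shown here.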
A variety of techniques are available for learning the topic models and the θ
distributions using the collection of documents as the training data, but all of
these methods tend to be quite slow. Once we have these distributions, we can
produce language model probabilities for the words in documents: