When text is modeled as a finite sequence of words, where at each point in
the sequence there are t different possible words, this corresponds to assuming
a multinomial distribution over words. Although there are alternatives, multinomial
language models are the most common in information retrieval. One of the
limitations of multinomial models that has been pointed out is that they do not
describe text burstiness well, which is the observation that once a word is “pulled
out of the bucket,” it tends to be pulled out repeatedly.