The pLSI model
itself has a problem in that its generative semantics are not
well-defined; thus there is no natural way to predict a
previously unseen document, and the number of parameters of
pLSI grows linearly with the number of training documents,
which makes the model susceptible to overfitting [12].