REPRESENTING THE INDICES
Log frequencies. The raw frequencies can be transformed using the log function. This transformation "dampens" the raw frequencies and their effect on the results of subsequent analysis.
f(wf) = 1 + log(wf) for wf > 0
In the formula, wf is the raw word (or term) frequency and f(wf) is the result of the log transformation.
This transformation is applied to all of the raw frequencies in the TDM where the frequency is greater than zero.
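As an illustration, the log transformation can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the toy matrix, the function name log_frequency, and the choice of the natural logarithm are assumptions made here (the text does not state the base), and zero counts are assumed to stay at zero.

```python
import math

def log_frequency(wf):
    """Return 1 + log(wf) for wf > 0.

    Assumptions: natural log is used, and zero counts are left at 0.
    """
    return 1.0 + math.log(wf) if wf > 0 else 0.0

# Hypothetical toy TDM: rows are terms, columns are documents.
tdm = [
    [1, 0, 10],
    [0, 100, 2],
]

# Apply the transformation to every cell of the matrix.
log_tdm = [[log_frequency(wf) for wf in row] for row in tdm]

for row in log_tdm:
    print([round(value, 2) for value in row])
```

Under these assumptions, a raw count of 1 maps to 1.0, while a raw count of 100 maps to roughly 5.61, so very frequent terms no longer dominate by two orders of magnitude.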
Binary frequencies. Likewise, an even simpler transformation can be used to indicate whether a term is used in a document.
f(wf) = 1 for wf > 0
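The binary transformation can be sketched the same way. Again, the toy matrix and the function name binary_frequency are hypothetical, and cells with a raw frequency of zero are assumed to remain zero.

```python
def binary_frequency(wf):
    """Return 1 if the term occurs at all (wf > 0), otherwise 0 (assumption)."""
    return 1 if wf > 0 else 0

# Hypothetical toy TDM: rows are terms, columns are documents.
tdm = [
    [1, 0, 10],
    [0, 100, 2],
]

# Each cell now records only presence or absence of the term.
binary_tdm = [[binary_frequency(wf) for wf in row] for row in tdm]

for row in binary_tdm:
    print(row)
```

The resulting matrix keeps no information about how often a term occurs, only whether it occurs in a given document.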