where q_j is the initial weight of query term j, Rel is the set of identified relevant
documents, Nonrel is the set of non-relevant documents, |.| gives the size of a
set, d_ij is the weight of the jth term in document i, and α, β, and γ are parameters
that control the effect of each component. Previous studies have shown that
the set of non-relevant documents is best approximated by all unseen documents
(i.e., all documents not identified as relevant), and that reasonable values for the
parameters are 8, 16, and 4 for α, β, and γ, respectively.
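
As a small illustration of how the three components combine for a single term, the sketch below assumes the formula takes the usual Rocchio form: α times the initial weight, plus β times the average weight in Rel, minus γ times the average weight in Nonrel. The function name `rocchio_weight` and the example weights are hypothetical; the default parameter values are the ones quoted above.

```python
def rocchio_weight(q_j, rel_weights, nonrel_weights,
                   alpha=8.0, beta=16.0, gamma=4.0):
    """Modified weight for one query term j (illustrative sketch).

    q_j            -- initial weight of term j in the query
    rel_weights    -- weights of term j, one per document in Rel
    nonrel_weights -- weights of term j, one per document in Nonrel
    """
    # Average weight of the term over each set; an empty set contributes 0.
    rel_avg = sum(rel_weights) / len(rel_weights) if rel_weights else 0.0
    nonrel_avg = (sum(nonrel_weights) / len(nonrel_weights)
                  if nonrel_weights else 0.0)
    return alpha * q_j + beta * rel_avg - gamma * nonrel_avg


# Toy values: average 0.375 in Rel, 0.125 in Nonrel.
# 8 * 1.0 + 16 * 0.375 - 4 * 0.125 = 13.5
print(rocchio_weight(1.0, [0.5, 0.25], [0.25, 0.0, 0.0, 0.25]))  # 13.5
```

With these values the relevant-document component (6.0) outweighs the non-relevant component (0.5), so the weight of the term rises from its scaled initial value of 8.0 to 13.5.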
This formula modifies the query term weights by adding a component based
on the average weight in the relevant documents and subtracting a component
based on the average weight in the non-relevant documents. Query terms whose
modified weights are negative are dropped. The modification usually produces a longer, expanded query
because terms that occur frequently in the relevant documents but not in the original
query will be added (i.e., they will have positive weights in the modified
query). To restrict the amount of expansion, typically only a certain number
(say, 50) of the terms with the highest average weights in the relevant documents
will be added to the query.
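
The whole procedure, including the limit on expansion, can be sketched as follows. This is a minimal illustration and not a reference implementation: it assumes queries and documents are represented as dictionaries mapping terms to weights, that the update has the usual Rocchio form (α times the initial weight, plus β times the average weight in the relevant documents, minus γ times the average weight in the non-relevant documents), and that terms whose modified weight is zero are dropped along with the negative ones. The names `expand_query` and `max_new_terms` and all example weights are hypothetical.

```python
from collections import defaultdict


def expand_query(query, rel_docs, nonrel_docs,
                 alpha=8.0, beta=16.0, gamma=4.0, max_new_terms=50):
    """Return the modified query as a dict of term -> weight (sketch).

    query       -- dict of term -> initial weight
    rel_docs    -- list of dicts (term -> weight), the relevant documents
    nonrel_docs -- list of dicts (term -> weight), the non-relevant documents
    """
    def average_weights(docs):
        # Average weight of each term over the set; absent terms count as 0.
        totals = defaultdict(float)
        for doc in docs:
            for term, weight in doc.items():
                totals[term] += weight
        return {t: s / len(docs) for t, s in totals.items()} if docs else {}

    rel_avg = average_weights(rel_docs)
    nonrel_avg = average_weights(nonrel_docs)

    # Candidate expansion terms: not in the original query, ranked by
    # average weight in the relevant documents (ties broken alphabetically).
    candidates = sorted((t for t in rel_avg if t not in query),
                        key=lambda t: (-rel_avg[t], t))
    terms = set(query) | set(candidates[:max_new_terms])

    modified = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_avg.get(term, 0.0)
                  - gamma * nonrel_avg.get(term, 0.0))
        if weight > 0:  # drop negative (and, by assumption, zero) weights
            modified[term] = weight
    return modified


# Toy example with made-up weights, allowing only one expansion term.
query = {"tropical": 1.0, "fish": 1.0}
rel_docs = [{"fish": 0.5, "aquarium": 0.5, "tank": 0.25},
            {"fish": 0.5, "aquarium": 0.25}]
nonrel_docs = [{"tropical": 5.0, "storm": 1.0}, {"fish": 0.5}]
print(expand_query(query, rel_docs, nonrel_docs, max_new_terms=1))
```

In this toy run "aquarium" is added with weight 6.0 because it has the highest average weight in the relevant documents, "tank" is excluded by the expansion limit, "fish" is reweighted to 15.0, and "tropical" is dropped because its modified weight (8 - 10 = -2) is negative.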