Abstract—This paper introduces a new weighting scheme in
information retrieval. It also proposes using the document
centroid as a threshold for normalizing documents in a document
collection. Document centroid normalization helps to achieve
more effective information retrieval as it enables good
discrimination between documents. In the context of a machine
learning application, namely unsupervised document indexing
and retrieval, we compare the effectiveness of the proposed
weighting scheme with that of ‘Term Frequency – Inverse Document
Frequency’ (TF-IDF), which is commonly used and considered
one of the best existing weighting schemes. The paper shows
how the document centroid is used to remove less significant
weights from documents and how this helps to achieve better
retrieval effectiveness. Most of the existing weighting schemes in
information retrieval research assume that the whole document
collection is static. The results presented in this paper show that
the proposed weighting scheme can produce higher retrieval
effectiveness than the TF-IDF weighting scheme in
both static and dynamic document collections. The results also
show how retrieval effectiveness varies between static and
dynamic document collections when a specific weighting scheme is
used. This type of comparison has not been
presented in the literature before.
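
To make the idea of centroid-based pruning concrete, the following minimal Python sketch computes baseline TF-IDF weights and then zeroes every weight that falls below the collection centroid for that term. It is an illustration under stated assumptions, not the proposed weighting scheme: the abstract does not give the scheme's formula or the exact thresholding rule, so the raw-count TF, the log(N/df) IDF, the centroid taken as the per-term mean weight, and the helper names tfidf_vectors, centroid and prune_below_centroid are all hypothetical choices made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) using raw-count TF and log(N / df) IDF."""
    n = len(docs)
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = [[c[t] * math.log(n / df[t]) for t in vocab] for c in counts]
    return vocab, vectors


def centroid(vectors):
    """Assumed centroid: the mean weight of each term over all documents."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prune_below_centroid(vectors, cen):
    """Assumed rule: keep a weight only if it reaches the centroid value."""
    return [[w if w >= cen[j] else 0.0 for j, w in enumerate(vec)]
            for vec in vectors]


# Toy collection (made up for illustration only).
docs = [
    "apple apple apple apple banana",
    "apple cherry",
    "banana banana banana banana cherry",
    "date",
]
vocab, weights = tfidf_vectors(docs)
pruned = prune_below_centroid(weights, centroid(weights))

for doc, vec in zip(docs, pruned):
    kept = [t for t, w in zip(vocab, vec) if w > 0]
    print(f"{doc!r}: kept terms {kept}")
```

In this toy collection, the single occurrence of "banana" in the first document and of "apple" in the second fall below the per-term mean and are removed, while the dominant terms of each document survive. For a dynamic collection, the centroid would have to be recomputed (or updated incrementally) as documents arrive; how the paper handles that is not stated in the abstract.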