The above sequence of Chinese characters, if
read as one word, means ‘a monk’; read as a sequence
of two words, it means ‘and’ and ‘still’. The number of index
terms that must be inverted is the principal issue
when indexing text in any language. It has been shown that with
good processing technologies for stemming and case
folding, it is possible to reduce the number of words
to be indexed. In a morphologically richer language, the
reduction can be substantial. However, part-of-speech
taggers, stemming algorithms, and similar tools have yet
to be fully developed for many of the world's languages.
There are also tokens that carry a specific meaning in
certain domains (e.g., C++, IR 8, B52) and that a generic
tokenizer may split apart.
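The segmentation ambiguity in the Chinese example can be made concrete with a dictionary-based segmenter that enumerates every split of a string into known words. This is a minimal sketch under stated assumptions: the sequence is taken to be 和尚 (a standard example with exactly these readings), and `LEXICON` is a hypothetical three-entry dictionary, not a real segmentation resource.

```python
# Hypothetical toy lexicon (assumption); a real segmenter uses a large dictionary.
LEXICON = {"和尚": "monk", "和": "and", "尚": "still"}


def segmentations(text: str, lexicon: dict[str, str]) -> list[list[str]]:
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    # Try each prefix that is a known word, then segment the remainder recursively.
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


for seg in segmentations("和尚", LEXICON):
    print(" / ".join(seg), "->", ", ".join(LEXICON[w] for w in seg))
```

Both readings are valid against the lexicon, so an indexer must choose between the single term 和尚 and the two terms 和 and 尚; the dictionary alone cannot decide.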
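The effect of case folding and stemming on the number of index terms can be illustrated by counting the distinct terms a small collection produces with and without normalization. This is a sketch only: `toy_stem` is a hypothetical suffix stripper written for illustration, not the Porter algorithm or any published stemmer, and the three sample documents are made up.

```python
import re

# Hypothetical suffix list for a toy English stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary(docs: list[str], fold_case: bool = False, stem: bool = False) -> set[str]:
    """Distinct index terms produced from `docs` under the chosen normalization."""
    terms = set()
    for doc in docs:
        for token in re.findall(r"[A-Za-z]+", doc):
            if fold_case:
                token = token.lower()
            if stem:
                token = toy_stem(token)
            terms.add(token)
    return terms


docs = ["Indexing documents", "The index indexes Documents", "indexed document"]
print(len(vocabulary(docs)))                                # no normalization
print(len(vocabulary(docs, fold_case=True)))                # case folding only
print(len(vocabulary(docs, fold_case=True, stem=True)))     # folding + stemming
```

On this sample the vocabulary shrinks from 8 terms to 7 with case folding and to 3 (`index`, `document`, `the`) with folding plus the toy stemmer; a language with more inflectional forms per word would give the normalization step more to collapse.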
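Domain-specific tokens such as C++ and IR 8 show why a generic tokenizer is not enough: splitting on non-word characters turns C++ into C and IR 8 into two separate tokens. One simple remedy, sketched here under assumptions, is to match a list of protected domain terms before falling back to ordinary word tokens. `PROTECTED`, `naive_tokens`, and `domain_tokens` are hypothetical names, and the protected list would in practice come from a domain vocabulary.

```python
import re

# Hypothetical domain lexicon (assumption): tokens that must stay intact.
PROTECTED = ["C++", "IR 8", "B52"]


def naive_tokens(text: str) -> list[str]:
    """Generic tokenizer: keep runs of word characters, drop everything else."""
    return re.findall(r"\w+", text)


def domain_tokens(text: str, protected: list[str]) -> list[str]:
    """Match protected terms first (longest first), then plain word runs."""
    # Regex alternation tries branches left to right, so longer terms go first.
    alternatives = [re.escape(t) for t in sorted(protected, key=len, reverse=True)]
    pattern = re.compile("|".join(alternatives + [r"\w+"]))
    return pattern.findall(text)


text = "C++ code, IR 8 rice and the B52"
print(naive_tokens(text))
print(domain_tokens(text, PROTECTED))
```

This sketch does not check word boundaries around protected terms (for example, it would match the `IR 8` prefix inside `IR 80`); a production tokenizer would need to handle such cases explicitly.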