After this, we count the frequency of each word, and every word whose frequency exceeds a threshold (computed from a formula based on the file size) is selected as an index term. The collection of all such terms forms the index table (the document representative) for that document.
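
The selection step can be sketched roughly as follows. This is a minimal illustration, not the exact procedure: the threshold formula is not given in the text, so `frequency_threshold` is a hypothetical stand-in that scales with the document's word count, and the names `ratio`, `minimum`, and `build_index_table` are assumptions introduced only for this example.

```python
from collections import Counter


def frequency_threshold(total_words, ratio=0.02, minimum=2):
    """Hypothetical threshold: grows with document size, never below `minimum`.

    The actual formula based on file size is not specified in the text;
    this is only a placeholder for it.
    """
    return max(minimum, int(total_words * ratio))


def build_index_table(words):
    """Return {term: frequency} for words whose frequency exceeds the threshold."""
    counts = Counter(words)                       # frequency of each word
    threshold = frequency_threshold(len(words))   # size-dependent cut-off
    return {word: freq for word, freq in counts.items() if freq > threshold}


# Example: a tiny preprocessed document (already tokenized)
words = ["index"] * 5 + ["term"] * 3 + ["rare", "once"]
index_table = build_index_table(words)
```

Here the document size is taken to be the number of words after preprocessing; a byte-based file size could be substituted in `frequency_threshold` without changing the rest of the sketch.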