on word counts, and the presence of a large number of words unrelated to the main topic can be a problem.
For this reason, techniques have been developed to detect the content blocks in a web page and either ignore the other material or reduce its importance in the indexing process.
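
As an illustration of the idea, here is a minimal sketch of one simple heuristic for finding the main content block. It is an assumption made for this example, not necessarily the technique the text has in mind. The page is treated as a sequence of tag tokens and word tokens, and the sketch picks the contiguous span that contains the most words and the fewest tags, which is the same as maximizing words inside the span plus tags outside it. The function names `tokenize`, `main_content_span`, and `extract_main_text` are hypothetical, and the regular-expression tokenizer is deliberately crude: it does not handle scripts, comments, or malformed markup.

```python
import re

# A tag is anything between '<' and '>'; a word is any other run of
# non-whitespace characters. This is a rough approximation of HTML parsing.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html):
    """Split an HTML string into tag tokens and word tokens."""
    return TOKEN_RE.findall(html)


def main_content_span(tokens):
    """Return (i, j), the inclusive token span with the highest score.

    Each word token scores +1 and each tag token scores -1, so the best
    span is the maximum-sum contiguous run (found with Kadane's algorithm).
    """
    best_score = None
    best_span = (0, -1)  # empty span if there are no tokens
    run_score = 0
    run_start = 0
    for n, tok in enumerate(tokens):
        score = -1 if tok.startswith("<") else 1
        if run_score <= 0:
            # A non-positive running total cannot help; restart here.
            run_score = score
            run_start = n
        else:
            run_score += score
        if best_score is None or run_score > best_score:
            best_score = run_score
            best_span = (run_start, n)
    return best_span


def extract_main_text(html):
    """Return the words inside the highest-scoring span, joined by spaces."""
    tokens = tokenize(html)
    i, j = main_content_span(tokens)
    return " ".join(t for t in tokens[i:j + 1] if not t.startswith("<"))


page = (
    "<html><body><div><a>Home</a><a>About</a></div>"
    "<p>Tropical fish include fish found in tropical environments "
    "around the world</p>"
    "<div><a>Contact</a></div></body></html>"
)
print(extract_main_text(page))
```

On this small page the navigation links ("Home", "About", "Contact") are surrounded by many tags and fall outside the selected span, so only the paragraph text would be passed on to indexing. An indexer could instead keep the outside words and give them a lower weight, matching the second option described above.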