on word counts,
and the presence of a large number of words unrelated to the main topic can be a problem.
For this reason,
techniques have been developed to detect the content blocks in a web page and either ignore the other material or reduce its importance in the indexing process.
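
One simple way to illustrate content block detection is to treat a page as a sequence of tag tokens and text tokens, then look for the contiguous span in which text most outnumbers tags. The sketch below is a minimal illustration of that idea under stated assumptions, not a definitive method: the regex tokenizer is deliberately crude, the scoring of +1 per word and -1 per tag is an assumed heuristic, and the names `tokenize` and `main_content_words` are hypothetical.

```python
import re

# Assumption: a token is either a complete tag "<...>" or a run of
# non-whitespace characters containing no "<". Real HTML (scripts,
# comments, malformed markup) needs a proper parser.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def tokenize(html):
    """Split HTML into tag tokens and word tokens, in document order."""
    return TOKEN_RE.findall(html)


def main_content_words(html):
    """Return the words in the span where text most outnumbers tags.

    Each word scores +1 and each tag scores -1; the best span is the
    maximum-sum contiguous run of tokens (Kadane's algorithm).
    """
    tokens = tokenize(html)
    best_sum, best_start, best_end = 0, 0, 0  # empty span by default
    cur_sum, cur_start = 0, 0
    for idx, tok in enumerate(tokens):
        score = -1 if tok.startswith("<") else 1
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart here.
            cur_sum, cur_start = 0, idx
        cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, idx + 1
    # Keep only the words from the selected span.
    return [t for t in tokens[best_start:best_end] if not t.startswith("<")]
```

An indexer could count only the words returned by `main_content_words`, or count every word but give those outside the selected span a lower weight. A single span is a strong simplification; pages with several separate content blocks would need a different approach.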