The first algorithm we discussed,
based on the distribution of tags, is quite effective for web pages with a single content block.
Algorithms that use the DOM structure and visual analysis can deal with pages that may have several content blocks.
In the case where there are several content blocks, the relative importance of each block can be used by the indexing process to produce a more effective representation.
One approach to judging the importance of the blocks in a web page is to train a classifier that will assign an importance category based on visual and content features (R. Song et al., 2004).