3.8 Removing Noise
Many web pages contain text, links, and pictures that are not directly related to the main content of the page. For example, Figure 3.16 shows a web page containing a news story.
The main content of the page (the story) is outlined in black.
This content block takes up less than 20% of the display area of the page. The rest is made up of banners, advertisements, images, general navigation links, services (such as search and alerts), and miscellaneous information such as copyright notices.
From the perspective of the search engine, this additional material in the web page is mostly noise that could negatively affect the ranking of the page.
A major component of the representation of a page used in a search engine is based