Commercial web search engines incorporate hundreds of features (types of evidence)
in their ranking algorithms, many derived from the huge collection of user
interaction data in the query logs. These can be broadly categorized into features
relating to page content, page metadata, anchor text, links (e.g., PageRank), and
user behavior. Although anchor text is derived from the links in a page, it is used
differently from features that come from an analysis of the link structure
of pages, and so it is placed in a separate category. Page metadata refers to information
about a page that is not part of the content of the page, such as its “age,” how
often it is updated, the URL of the page, the domain name of its site, and the
amount of text content in the page relative to other material, such as images and
advertisements.
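
As a rough illustration of the categories above, the following Python sketch groups some example features by category and computes a few of the page metadata features mentioned in the text. The feature names, the PageInfo record, and the metadata_features function are hypothetical names chosen for this example; they are not taken from any actual search engine, and real systems use many more features than are shown here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical feature names, grouped into the five categories named in the text.
FEATURE_CATEGORIES = {
    "content": ["body_term_match"],
    "metadata": ["age_days", "url_depth", "domain", "text_fraction"],
    "anchor_text": ["anchor_term_match"],
    "links": ["pagerank"],
    "user_behavior": ["click_rate"],
}


@dataclass
class PageInfo:
    """Assumed per-page record; the field names are illustrative only."""
    url: str
    age_days: int     # time since the page was first seen
    text_bytes: int   # amount of text content
    other_bytes: int  # images, advertisements, and other material


def metadata_features(page: PageInfo) -> dict:
    """Compute a few metadata features that do not depend on the page's words."""
    parsed = urlparse(page.url)
    # Count the non-empty path segments as a simple measure of URL depth.
    path_segments = [s for s in parsed.path.split("/") if s]
    total = page.text_bytes + page.other_bytes
    return {
        "age_days": page.age_days,
        "url_depth": len(path_segments),
        "domain": parsed.netloc,
        # Fraction of the page that is text rather than other material.
        "text_fraction": page.text_bytes / total if total else 0.0,
    }


page = PageInfo("http://example.com/a/b/index.html", 30, 3000, 1000)
features = metadata_features(page)
```

For this example page, the URL has three path segments and text makes up three quarters of the page. A ranking algorithm would combine values like these with features from the other categories; how they are weighted is outside the scope of this sketch.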