7.8 WEB CONTENT MINING AND WEB STRUCTURE MINING
Web content mining refers to the extraction of useful information from the content of Web pages. A document may be extracted into some machine-readable format so that automated techniques can derive information about the Web pages.
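As an illustration of turning a page into a machine-readable form, the following minimal sketch uses Python's standard `html.parser` module to pull the title and visible text out of an HTML string. The class name `PageContentExtractor`, the helper `extract_content`, and the sample page are assumptions made for this example only; a practical content miner would also need to handle malformed markup, character encodings, and boilerplate such as navigation menus.

```python
from html.parser import HTMLParser


class PageContentExtractor(HTMLParser):
    """Collects the title and the visible text of one HTML page (illustrative)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore code and style rules; they are not page content.
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def extract_content(html):
    """Return a small machine-readable record for one page."""
    parser = PageContentExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": " ".join(parser.text_parts)}


# Hypothetical sample page used only to exercise the sketch.
sample_html = """<html><head><title>Data Mining</title>
<style>p { color: red; }</style></head>
<body><h1>Web Mining</h1><p>Content, structure and usage.</p></body></html>"""

record = extract_content(sample_html)
print(record)
```

The resulting dictionary can then be passed to conventional text mining techniques such as keyword extraction or classification.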
The collective endorsement of a given page by different developers on the Web, expressed through the hyperlinks that point to it, may indicate the importance of the page and may naturally lead to the discovery of authoritative Web pages (Miller, 2005).
The structure of Web hyperlinks has led to another important category of Web pages called hubs. A hub is one or more Web pages that provide a collection of links to authoritative pages.
Web structure mining is the process of extracting useful information from the links embedded in Web documents. It is used to identify authoritative pages and hubs, which are the cornerstones of the contemporary page-ranking algorithms that are central to popular search engines such as Google and Yahoo!
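The mutual reinforcement between hubs and authorities can be sketched in a few lines of Python. The example below follows the general idea of the HITS (hubs and authorities) algorithm: a page's authority score is the sum of the hub scores of the pages linking to it, and a page's hub score is the sum of the authority scores of the pages it links to. The function names, the iteration count, and the tiny link graph are assumptions chosen for illustration; this is not a description of how any particular search engine ranks pages.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Compute hub and authority scores for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link here.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ())) for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical five-page Web graph used only to exercise the sketch.
links = {
    "portal": ["a", "b", "c"],
    "blog": ["a", "b"],
    "a": [],
    "b": [],
    "c": ["a"],
}

hub, auth = hits(links)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print("best hub:", best_hub)
print("best authority:", best_authority)
```

In this small graph the page with the most links to well-endorsed pages emerges as the strongest hub, and the page that receives the most endorsements emerges as the strongest authority.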