There are a number of steps involved in building such a system, each of
which involves some form of classification. First, when crawling and indexing
sites, the system has to automatically classify whether a web page contains
a review or is a blog posting expressing an opinion about a product. The task
of identifying opinionated text, as opposed to factual text, is called opinion detection.
After a collection of reviews and blog postings has been populated, another
classifier must be used to extract product names and their corresponding
reviews. This is the information extraction task. For each review identified for a
given product, yet another classifier must be used to determine the sentiment of
the page. Typically, the sentiment of a page is either “negative” or “positive”, although
the classifier may choose to assign a numeric score as well, such as “two
stars” or “four stars”. Finally, all of the data, including the sentiment, must be aggregated
and presented to the user in some meaningful way. Figure 9.9 shows part
of an automatically generated product review from a web service. This sentiment-based
summary of various aspects of the product, such as “ease of use”, “size”, and
“software”, is generated from individual user reviews.
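
The four stages described above (opinion detection, information extraction, sentiment classification, and aggregation) can be sketched as a small pipeline. This is only a toy illustration under strong assumptions: the word lists, the product names, and all function names are hypothetical, and each stage is a simple keyword rule standing in for what would in practice be a trained classifier.

```python
from collections import defaultdict

# Hypothetical word lists; a real system would learn these from labeled data.
OPINION_WORDS = {"love", "hate", "great", "terrible", "good", "bad"}
POSITIVE_WORDS = {"love", "great", "good"}
NEGATIVE_WORDS = {"hate", "terrible", "bad"}
KNOWN_PRODUCTS = {"acme camera", "zeta phone"}  # hypothetical product names


def tokens(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,!?\"'") for w in text.lower().split()]


def is_opinionated(text):
    """Opinion detection: does the page contain any opinion word?"""
    return any(w in OPINION_WORDS for w in tokens(text))


def extract_product(text):
    """Information extraction: return a known product mentioned in the text, or None."""
    lowered = text.lower()
    for product in sorted(KNOWN_PRODUCTS):
        if product in lowered:
            return product
    return None


def sentiment(text):
    """Sentiment classification: compare positive and negative word counts."""
    words = tokens(text)
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if pos >= neg else "negative"


def summarize(pages):
    """Aggregation: count positive and negative reviews per product."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for page in pages:
        if not is_opinionated(page):
            continue  # factual page, not a review
        product = extract_product(page)
        if product is None:
            continue  # opinionated, but about no known product
        summary[product][sentiment(page)] += 1
    return dict(summary)


pages = [
    "I love my Acme Camera, the pictures are great!",
    "The Acme Camera has a terrible battery.",
    "The Zeta Phone weighs 150 grams.",
    "Zeta Phone is bad. I hate it.",
]
print(summarize(pages))
```

In this sketch the third page is discarded by the opinion detection step because it is purely factual, so only three pages reach the sentiment stage. A per-aspect summary like the one in Figure 9.9 would additionally require extracting aspect terms such as “ease of use” from each review, which this sketch does not attempt.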