As a first step towards understanding the impact of the
style of the reviews on helpfulness and product sales, we
rely on the existing literature on subjectivity estimation from
computational linguistics [41]. Specifically, Pang and Lee [41]
described a technique that identifies which sentences in a
text convey objective information, and which of them contain
subjective elements. Pang and Lee applied their technique
to a movie review data set, treating the movie plot as
objective information and the information that appeared in
the reviews as subjective. In our
scenario, we follow the same paradigm: we consider as objective
the information that also appears in the product description,
and as subjective everything else.
Using this definition, we then generated a training set with
two classes of documents:
• A set of “objective” documents that contains the product
description of each product in our data set.
• A set of “subjective” documents that contains randomly
retrieved reviews.
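
The construction of the two document classes can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name build_training_set, the parameter n_subjective, the fixed random seed, and the example texts are all assumptions introduced here, since the text does not state how many reviews were retrieved or how the sampling was done.

```python
import random


def build_training_set(product_descriptions, reviews, n_subjective, seed=0):
    """Return a list of (text, label) pairs for the two document classes.

    product_descriptions: one description string per product ("objective").
    reviews: pool of review strings to sample from ("subjective").
    n_subjective: number of reviews to draw at random. This parameter is
        hypothetical; the source does not specify the sample size.
    seed: fixed seed so the random sample is reproducible (an assumption).
    """
    rng = random.Random(seed)

    # Every product description is labeled as an objective document.
    objective = [(text, "objective") for text in product_descriptions]

    # Reviews are drawn at random, without replacement, as subjective documents.
    k = min(n_subjective, len(reviews))
    subjective = [(text, "subjective") for text in rng.sample(reviews, k)]

    return objective + subjective


# Made-up example texts, for illustration only.
descriptions = [
    "10.1 megapixel sensor with 3x optical zoom.",
    "Includes a rechargeable lithium-ion battery.",
]
reviews = [
    "I love this camera, the pictures are gorgeous.",
    "Battery life is disappointing.",
    "Best purchase I have made this year.",
]

training_set = build_training_set(descriptions, reviews, n_subjective=2)
```

The resulting labeled pairs could then be passed to any text classifier; the choice of classifier is left open in this sketch.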