Emoticons as Ground Truth
The accuracy of emoticons as ground truth is questionable,
since sarcasm and other expressions of mixed emotion are
expected to introduce noisy datapoints into the system.
Increasing the size of the collected Twitter corpus is
known to help overcome this issue to a limited degree [9, 8].
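
The idea of treating emoticons as noisy labels can be illustrated with a minimal sketch. The emoticon sets and the function name emoticon_label below are hypothetical, chosen for illustration only, and the rule of skipping tweets with no emoticon or with mixed emoticons is an assumption rather than the procedure used for the corpus described here.

```python
# Hypothetical emoticon sets; the lists actually used for the corpus
# are not specified in this section.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def emoticon_label(tweet):
    """Return 'positive', 'negative', or None for a tweet.

    Matching is done on whitespace-separated tokens, so an emoticon
    glued to a word (e.g. "day:)") is not detected in this sketch.
    """
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    # Assumption: tweets with no emoticon, or with both kinds,
    # are left unlabeled rather than forced into a class.
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"
```

A sarcastic tweet such as "stuck in traffic again, wonderful :)" would be labeled positive by this rule, which is the kind of noise discussed above.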
To examine whether the presence of emoticons accurately
reflects the emotional nature of the text they accompany,
a set of human-annotated tweets was compiled. 250 positive
and 250 negative tweets were randomly selected and removed
from the main Twitter database. A group of 3 annotators
assigned the labels 'positive', 'negative', and 'ambiguous/neutral'
to the 500 tweets. All emoticons and hashtags were removed
from the tweets before they were presented to the annotators,
as the intention was to classify based on the language in
the tweet alone. Table 2 shows the resulting confusion matrix.
The large diagonal elements indicate that emoticons
are a good indicator of emotional content. However,
about 30% of the tweets were noted to be ambiguous or