We extract features from the collected dataset to train our sentiment classifier. We used the presence
of an n-gram as a binary feature. For general information retrieval purposes, the frequency of a keyword’s
occurrence is a more suitable feature; for sentiment classification, however, the overall sentiment is not
necessarily indicated through the repeated use of keywords. Pang et al. obtained better results by using
term presence rather than term frequency (Pang et al., 2002).
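
The difference between the two feature types can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (ngrams, presence_features, frequency_features) and the whitespace tokenization are assumptions made only for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def presence_features(tokens, n):
    """Binary features: an n-gram maps to 1 if it occurs at least once."""
    return {gram: 1 for gram in ngrams(tokens, n)}


def frequency_features(tokens, n):
    """Count features: an n-gram maps to its number of occurrences."""
    return dict(Counter(ngrams(tokens, n)))


# Whitespace splitting is a simplification assumed for this example only.
tokens = "good good good movie".split()

presence = presence_features(tokens, 1)
frequency = frequency_features(tokens, 1)
```

With presence features, the repeated unigram "good" contributes the same value as a single occurrence, whereas with frequency features its value grows with each repetition.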