The bag-of-words (BoW) representation
was used: 10,000 keywords and word stems were selected
from the online lexicons LIWC and Harvard General Inquirer
and reduced to a 125-dimensional feature vector
representing different emotional categories. Early fusion was
employed with utterance-level acoustic features, and classification
was performed using an RBF-kernel SVM. The addition of linguistic