Prior Work on IEMOCAP
Rozgic et al. [10] classified the speech found in IEMOCAP
into four discrete emotion categories, using both acoustic and
linguistic features. A bag-of-words (BoW) representation
was used, with 10,000 keywords and word stems selected
from the online lexicons LIWC and Harvard General Inquirer;
these were reduced to a 125-dimensional feature vector
representing different emotional categories. Early fusion was
employed with utterance-level acoustic features, and the fused
features were classified using an RBF-kernel SVM. The addition of linguistic