Classification and Results
The different features are combined and evaluated using
three classifiers. For classification, the machine learning library
Weka (Witten and Frank 2005) is used. We compare a
Naïve Bayes Binary Model (NBB), the Ripper rule learner
(JRip), and a classifier based on a Support Vector Machine
(SVM). Our classification results are calculated using stratified
10-fold cross-validation on the training set. To measure
the performance of the classification approaches, we report
the accuracy (Acc), the averaged precision (Prec), the averaged
recall (Rec), and the F-Measure (F).
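
The evaluation protocol described above can be illustrated with a minimal sketch. It is not the Weka code used in this study. The function names `stratified_folds` and `evaluate` are hypothetical, the fold assignment is a simple deterministic round-robin without shuffling, and the averaging is assumed to be weighted by class frequency (as in Weka's default summary output), which the text does not state explicitly.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=10):
    """Assign instance indices to k folds so class proportions stay similar.

    Hypothetical helper: indices are dealt round-robin, class by class,
    so per-fold class counts differ by at most one.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # keeps dealing where the previous class stopped
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


def evaluate(y_true, y_pred):
    """Return accuracy and class-weighted precision, recall, and F-measure.

    Assumption: "averaged" means weighted by the number of true
    instances per class; a macro average would weight classes equally.
    """
    n = len(y_true)
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    acc = sum(correct.values()) / n
    prec = rec = f_measure = 0.0
    for cls, count in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        prec += weight * p
        rec += weight * r
        f_measure += weight * f
    return {"Acc": acc, "Prec": prec, "Rec": rec, "F": f_measure}
```

In a full cross-validation run, each fold would serve once as the test set while a classifier is trained on the remaining nine, and the predictions from all folds would be pooled before calling `evaluate`. Under class-weighted averaging, recall always equals accuracy, so a macro average may be preferable when classes are strongly imbalanced.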
Table 1 shows the classification results. With 82.2%
precision and 82% recall on a 4-class problem, we outperform
current state-of-the-art approaches for small-scale incident
detection. Thus, our study demonstrates that it is possible
to detect potentially valuable information in the stream
of microblogs with high precision and recall, even though
the absolute amount of information related to incidents is
low.