We did not run the random guessing program here,
because, as shown in Table 2, the example-based measures
resulting from random guessing over the 32 possible label sets are
smaller than 0.05. In addition, it can be easily proved that
for the label-based precision measure (13), as in Fig. 6, the random
guessing precision for any category equals the actual
number of tweets in this category divided by the total number
of tweets in the entire collection. In this case, fewer than
5 percent of the tweets in the Purdue data set fall
into the five engineering student problems. Therefore, the random
guessing precision is smaller than 0.05. As of now, the
performance of the detector is still modest, but it
represents a significant improvement over the random guessing
baseline. As illustrated in the workflow in Fig. 1, the performance
of the algorithm can be gradually improved based
on further human feedback.
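
The claim that random guessing precision for a category equals that category's share of the collection can be checked with a small simulation. The sketch below is illustrative only: the collection size, the 4 percent category share, and the function name `random_guess_precision` are assumptions chosen for the example, not figures or code from this study. The guesser flags each tweet as belonging to the category with a fixed probability, independently of its true label.

```python
import random


def random_guess_precision(labels, guess_rate=0.5, seed=0):
    """Precision of a guesser that flags each item as positive with
    probability `guess_rate`, independently of its true label."""
    rng = random.Random(seed)
    true_pos = 0  # flagged items that really belong to the category
    pred_pos = 0  # all flagged items
    for is_member in labels:
        if rng.random() < guess_rate:
            pred_pos += 1
            true_pos += is_member
    return true_pos / pred_pos if pred_pos else 0.0


# Hypothetical collection: 100,000 tweets, 4 percent in the category.
N = 100_000
labels = [1] * 4_000 + [0] * 96_000

prevalence = sum(labels) / N
precision = random_guess_precision(labels)
print(f"prevalence={prevalence:.4f}  random-guess precision={precision:.4f}")
```

Because the guess is independent of the true label, the fraction of flagged tweets that truly belong to the category matches, in expectation, the fraction in the whole collection, whatever value `guess_rate` takes. A category holding fewer than 5 percent of the tweets therefore yields a random guessing precision below 0.05.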