With five multi-label categories and one “others” category, there are $(2^5 - 1) + 1 = 32$ possible label sets for a tweet: the $2^5 - 1$ non-empty subsets of the five categories, plus “others” on its own. Tables 2 and 3 report all the evaluation measures under random guessing. The random guessing program first guessed whether a tweet belonged to “others”, based on the proportion of this category in the training data set. If the tweet did not belong to “others”, the program then guessed whether it fell into each of the remaining categories, again based on the proportion each category takes among the remaining categories. We ran the random guessing program 100 times and averaged the resulting measures.
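The two-stage guessing procedure can be illustrated with a minimal Python sketch. It is not the program used for Tables 2 and 3, and several details are assumptions: the function names `random_guess` and `average_exact_match`, the placeholder category names and proportions, the use of one independent draw per remaining category, the redraw when no category is selected (so that every guess is one of the 32 valid label sets), and the exact-match ratio as a stand-in for the evaluation measures.

```python
import random


def random_guess(p_others, p_labels, rng):
    """Guess a label set for one tweet (illustrative sketch only).

    p_others: assumed proportion of "others" tweets in the training data.
    p_labels: assumed proportion of each remaining category among
              non-"others" tweets, keyed by category name.
    """
    if not any(p > 0 for p in p_labels.values()):
        raise ValueError("at least one category needs a positive proportion")
    # Stage 1: decide whether the tweet belongs to "others".
    if rng.random() < p_others:
        return {"others"}
    # Stage 2: one independent draw per remaining category.
    # Assumption: an empty draw is repeated, since a tweet outside
    # "others" must carry at least one of the five categories.
    while True:
        labels = {name for name, p in p_labels.items() if rng.random() < p}
        if labels:
            return labels


def average_exact_match(true_sets, p_others, p_labels, runs=100, seed=0):
    """Average the exact-match ratio over repeated random-guessing runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        hits = sum(random_guess(p_others, p_labels, rng) == t for t in true_sets)
        total += hits / len(true_sets)
    return total / runs


if __name__ == "__main__":
    # Hypothetical proportions and gold labels, for illustration only.
    p_labels = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.2, "c5": 0.1}
    gold = [{"others"}, {"c1"}, {"c1", "c2"}, {"c3"}]
    print(average_exact_match(gold, p_others=0.5, p_labels=p_labels))
```

Fixing the seed makes the 100-run average reproducible. Other measures could be averaged in the same loop in place of the exact-match ratio.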