we see that when the probability threshold is 0.4, performance is generally better than at the other threshold values. Table 3 shows the label-based accuracy (12) and F1 measure (15) for each of the six categories when T = 0.4, compared with random guessing.
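As a rough illustration of how such per-category figures can be obtained, the sketch below thresholds predicted probabilities at T and computes accuracy and F1 separately for each label. It assumes the standard per-label definitions, which may differ in detail from Eqs. (12) and (15); the function name `per_label_metrics`, the array layout (rows are samples, columns are categories), and the choice to assign a label when the probability is greater than or equal to T are all assumptions made for this example.

```python
import numpy as np

def per_label_metrics(probs, truth, threshold=0.4):
    """Threshold probabilities, then compute accuracy and F1 per label.

    probs: (n_samples, n_labels) predicted probabilities.
    truth: (n_samples, n_labels) binary ground-truth labels.
    Assumption: a label is assigned when its probability >= threshold.
    """
    probs = np.asarray(probs, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    pred = probs >= threshold

    # Confusion counts for each label (column-wise sums).
    tp = np.sum(pred & truth, axis=0)
    fp = np.sum(pred & ~truth, axis=0)
    fn = np.sum(~pred & truth, axis=0)
    tn = np.sum(~pred & ~truth, axis=0)

    # Per-label accuracy: fraction of samples classified correctly.
    accuracy = (tp + tn) / truth.shape[0]

    # Per-label F1 = 2TP / (2TP + FP + FN); set to 0 when undefined.
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return accuracy, f1
```

Calling the function once per candidate threshold on the same predictions gives one simple way to compare thresholds against each other.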