(especially when we want to use the translation results to analyse the sentiment polarity of a text). Consequently, the performance
of cross-lingual sentiment classification varies from one language to another.
In order to assess whether there are significant differences in accuracy between the proposed model and the baseline
methods, we conducted a statistical test on the accuracy results obtained from 5-fold cross-validation. We used a paired
t-test to evaluate whether the differences between each pair of methods are statistically significant. Table 3 shows the numerical
results of the statistical test. With the exception of the comparison between the DBAST and AST models on the En–Ch dataset and
that between DBAST and AL on the En–Jp dataset, all other comparisons showed statistically significant differences at a significance
level of α = 0.05.
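As an illustration of this test, the following Python sketch applies a paired t-test to per-fold accuracies at α = 0.05; the fold accuracies shown are placeholder values, not the paper's results, and the use of scipy is an assumption for demonstration purposes.

```python
# Minimal sketch of the significance test described above, assuming the
# per-fold accuracies from the 5-fold cross-validation are available as
# plain lists. The numbers are illustrative placeholders only.
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies for DBAST and one baseline method.
dbast_acc = [0.78, 0.80, 0.79, 0.81, 0.77]
baseline_acc = [0.74, 0.76, 0.75, 0.77, 0.73]

# Paired t-test: the same folds are used by both methods, so the samples are paired.
t_stat, p_value = ttest_rel(dbast_acc, baseline_acc)

alpha = 0.05  # significance level used in the comparison
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Difference is statistically significant at alpha = 0.05")
else:
    print("No statistically significant difference at alpha = 0.05")
```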
Fig. 3 shows the classification accuracy of the various active learning based methods on the three evaluation datasets. As
shown in this figure, when the proposed method (DBAST) is compared with the AST model, the classification accuracy of the proposed
model improved very quickly in the first few cycles (especially for French). This was because the examples
selected on the basis of both density and uncertainty were more representative than those selected solely on the basis of uncertainty
in active learning. The results presented in this figure also show that combining active learning with
self-training helped to obtain better performance. This was most likely because the most confidently classified examples,
along with the manually labelled examples, were added to the training data during the learning process.
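The sketch below outlines this selection and augmentation step. It is not the paper's exact procedure: the entropy-based uncertainty measure, the mean-cosine-similarity density estimate, and all function and parameter names are illustrative assumptions.

```python
# A minimal sketch of density-weighted uncertainty selection combined with
# self-training augmentation, under the assumptions stated above.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def select_examples(clf, X_unlab, n_query=5, n_confident=20):
    """Pick examples to label manually and examples to pseudo-label."""
    proba = clf.predict_proba(X_unlab)

    # Uncertainty: entropy of the predicted class distribution.
    uncertainty = -np.sum(proba * np.log(proba + 1e-12), axis=1)

    # Density: mean similarity of each unlabelled example to the pool,
    # so that representative (dense-region) examples are preferred.
    density = cosine_similarity(X_unlab).mean(axis=1)

    # Active learning query: examples that are both uncertain and
    # representative are sent for manual annotation.
    query_idx = np.argsort(uncertainty * density)[-n_query:]

    # Self-training: the most confidently classified examples are added
    # to the training set with their predicted (pseudo) labels.
    confidence = proba.max(axis=1)
    confident_idx = np.argsort(confidence)[-n_confident:]
    pseudo_labels = proba[confident_idx].argmax(axis=1)

    return query_idx, confident_idx, pseudo_labels
```

Weighting uncertainty by density in this way is one common heuristic for avoiding outliers that are maximally uncertain but unrepresentative of the unlabelled pool.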