all the experiments. We also used p = n = 5 for the self-training algorithm and selected one unlabelled example for manual labelling in each cycle of active learning. The total number of iterations was set to 50 for all iterative algorithms. Under this setting, 50 unlabelled examples in total were selected for manual labelling during the learning process, and 500 unlabelled examples (p + n = 10 per cycle) were labelled automatically. After a full learning process, the test data were presented to the learned classifier for evaluation. Table 2 shows the comparison results after the full learning process. As we can see, our proposed method showed better performance on all datasets, especially with regard to accuracy.
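For concreteness, one learning cycle of the combined approach can be sketched as follows. This is a minimal illustration, assuming a scikit-learn-style probabilistic SVM, binary labels in {0, 1}, and an array y_oracle standing in for the human annotator; the function ast_cycle and its helpers are hypothetical and do not reproduce the exact implementation used in the experiments.

```python
import numpy as np
from sklearn.svm import SVC

def ast_cycle(X_lab, y_lab, X_unlab, y_oracle, p=5, n=5):
    """One cycle: query 1 uncertain example for manual labelling, then
    auto-label the p most-confident positive and n most-confident
    negative predictions from the remaining unlabelled pool."""
    clf = SVC(probability=True).fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unlab)[:, 1]  # P(positive), assuming labels {0, 1}

    # Active learning: the example whose posterior is closest to 0.5
    # is the least certain, so it is sent to the human annotator.
    q = int(np.argmin(np.abs(proba - 0.5)))

    # Self-training: rank the remaining examples by P(positive); the tail
    # gives confident positives, the head gives confident negatives.
    rest = np.delete(np.arange(len(X_unlab)), q)
    order = rest[np.argsort(proba[rest])]
    auto_neg, auto_pos = order[:n], order[-p:]

    new_idx = np.concatenate(([q], auto_pos, auto_neg))
    new_y = np.concatenate(([y_oracle[q]],            # manual label
                            np.ones(p, dtype=int),    # assumed positive
                            np.zeros(n, dtype=int)))  # assumed negative

    # Move the selected examples from the unlabelled pool to the training set.
    X_lab = np.vstack([X_lab, X_unlab[new_idx]])
    y_lab = np.concatenate([y_lab, new_y])
    X_unlab = np.delete(X_unlab, new_idx, axis=0)
    y_oracle = np.delete(y_oracle, new_idx)
    return X_lab, y_lab, X_unlab, y_oracle
```

Running this cycle 50 times grows the labelled set by one manual and ten automatic labels per iteration, matching the totals of 50 and 500 reported above.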
Because active-learning-based models benefit from 50 manually labelled examples during their learning process, we added 50 extra labelled samples from the source language to the training sets of the SCL and SVM-MT models in order to create the same conditions for all compared models. By comparing all the semi-supervised and active-learning-based methods with the SVM-MT model in Table 2, we can conclude that incorporating unlabelled data from the target language into the learning process can effectively improve the performance of cross-lingual sentiment classification.
Also, as this table shows, the DBAST and AST models perform better than AL and ST after the full learning process. This supports the idea that combining active learning with self-training can yield better classification than either approach alone. Moreover, the DBAST model outperforms the AST model on all datasets, which shows that using the density measure of unlabelled examples helps to select the most representative examples for manual labelling, as sketched below.
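One common realisation of such a density measure scores each unlabelled example by its mean cosine similarity to the rest of the pool and combines that score with the uncertainty criterion when choosing the query. The sketch below is an illustration under these assumptions (cosine similarity over feature vectors, multiplicative combination), not necessarily the exact formulation used in DBAST.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def density_weighted_query(proba, X_unlab):
    """Select the unlabelled example that is both uncertain (posterior
    near 0.5) and representative (lies in a dense region of the pool)."""
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5)           # in [0, 1], peaks at 0.5
    sim = cosine_similarity(X_unlab)                        # pairwise similarities
    density = (sim.sum(axis=1) - 1.0) / (sim.shape[0] - 1)  # mean similarity, self excluded
    return int(np.argmax(uncertainty * density))            # combined query criterion
```

Weighting uncertainty by density discourages querying outliers: an example near the decision boundary but far from every other example is informative about itself only, whereas a dense-region query transfers its label to many similar neighbours.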
As we can see, cross-lingual sentiment classification accuracy differs across languages. This difference can be interpreted from two points of view. First, sentiment classification performance varies between languages because of disparities in how languages structure the expression of sentiment, even within the same domain (e.g. the book review domain). Second, automatic machine translation systems produce translations of varying quality for different languages