language. The framework of the proposed model is illustrated in Fig. 1. In the learning phase, a base classifier is trained based
on the initial training data and then applied to the translated unlabelled data. From the newly classified unlabelled data,
active learning selects the most informative examples (those most useful for improving the classifier) for human labelling.
In the human-labelling process, a native speaker of the target language reads each review and evaluates its overall
sentiment polarity (e.g. positive or negative). Simultaneously, self-training selects some of the most confidently
classified examples together with their predicted labels. These selected examples
are added to the training set for the next learning cycle. In the next cycle, the model is retrained based on the augmented
training data, and this process is repeated until a termination condition is satisfied. Further, the model also considers the
density of the unlabelled data, so that active learning selects instances that are not only informative but also
representative, thereby avoiding the selection of outliers. In the test phase, the final trained classifier is applied to the
translated test data for the classification task. We call this model ‘‘density-based active self-training’’ (DBAST). Active
learning and self-training are described in detail in the following sections, respectively.
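The learning cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the base learner here is a hypothetical word-polarity scorer standing in for a real classifier, the `oracle` function simulates the human annotator, and the density-based weighting of the selection step is omitted (it is described in the following sections).

```python
def train(labeled):
    # Toy base learner (an assumption, not the paper's classifier):
    # accumulate a polarity score per word from the labelled reviews.
    scores = {}
    for text, label in labeled:
        for w in text.split():
            scores[w] = scores.get(w, 0) + (1 if label == "pos" else -1)

    def predict(text):
        s = sum(scores.get(w, 0) for w in text.split())
        # |s| serves as a crude confidence value.
        return ("pos" if s >= 0 else "neg"), abs(s)

    return predict

def dbast_cycle(labeled, unlabeled, oracle, rounds=2, k_active=1, k_self=1):
    """Sketch of one DBAST run: alternate active learning and self-training."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break  # termination condition: no unlabelled data left
        predict = train(labeled)
        # Rank unlabelled examples by the classifier's confidence.
        ranked = sorted(unlabeled, key=lambda x: predict(x)[1])
        # Active learning: least-confident (most informative) examples
        # are sent to the human annotator for true labels.
        for x in ranked[:k_active]:
            labeled.append((x, oracle(x)))
            unlabeled.remove(x)
        # Self-training: most-confident remaining examples keep their
        # predicted labels and join the training set.
        rest = [x for x in ranked[k_active:] if x in unlabeled]
        for x in rest[-k_self:]:
            yhat, _ = predict(x)
            labeled.append((x, yhat))
            unlabeled.remove(x)
    # Retrain on the augmented training data and return the final model.
    return train(labeled)
```

A usage example: starting from two labelled reviews and three unlabelled ones, each cycle queries the least-confident review and self-labels the most confident one, then the final classifier is retrained on the augmented set.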