In recent years, research in sentiment classification has received considerable attention from
natural language processing researchers. Annotated sentiment corpora are the most important
resources used in sentiment classification. However, since most recent research
in this field has focused on the English language, annotated sentiment resources remain
scarce in other languages. Manual construction of reliable annotated
sentiment corpora for a new language is a labour-intensive and time-consuming task.
Projecting a sentiment corpus from one language into another is a natural solution
used in cross-lingual sentiment classification. Automatic machine translation services
are the most commonly used tools for directly projecting information from one language into
another. However, since term distributions may differ across languages due to variations
in linguistic terms and writing styles, cross-lingual methods cannot reach the performance
of monolingual methods. In this paper, a novel learning model is proposed that combines
uncertainty-based active learning and semi-supervised self-training
to incorporate unlabelled sentiment documents from the target language in
order to improve the performance of cross-lingual methods. Further, in this model, the
density measures of unlabelled examples are considered in the active learning part in order
to avoid outlier selection. The empirical evaluation on book review datasets in three different
languages shows that the proposed model can significantly improve the performance of
cross-lingual sentiment classification in comparison with other existing and baseline
methods.
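
As an illustration only, one iteration of the selection step described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the model evaluated in the paper: the function names (entropy, cosine, density, select_batches), the use of binary entropy as the uncertainty measure, mean cosine similarity as the density measure, and their product as the query score are all hypothetical choices made for the example. The classifier itself is left out; the sketch assumes it has already produced a positive-class probability for each unlabelled target-language document.

```python
import math


def entropy(p_pos):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p_pos <= 0.0 or p_pos >= 1.0:
        return 0.0
    return -(p_pos * math.log2(p_pos) + (1 - p_pos) * math.log2(1 - p_pos))


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(i, vectors):
    """Mean cosine similarity of example i to every other unlabelled example
    (an assumed density measure; low values indicate a likely outlier)."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batches(probs, vectors, n_query, n_self):
    """Pick examples for one iteration over the unlabelled pool.

    probs   -- predicted positive-class probability per unlabelled example
    vectors -- feature vector per unlabelled example
    Returns (query_idx, self_labelled):
      query_idx     -- indices to send to a human annotator (active learning)
      self_labelled -- (index, predicted_label) pairs to add automatically
                       (self-training)
    """
    # Active learning: weight uncertainty by density so that uncertain
    # but isolated examples are not preferred.
    scores = {i: entropy(p) * density(i, vectors) for i, p in enumerate(probs)}
    query_idx = sorted(scores, key=scores.get, reverse=True)[:n_query]

    # Self-training: among the rest, keep the most confident predictions
    # and label them with the classifier's own output.
    remaining = [i for i in range(len(probs)) if i not in query_idx]
    remaining.sort(key=lambda i: abs(probs[i] - 0.5), reverse=True)
    self_labelled = [(i, int(probs[i] >= 0.5)) for i in remaining[:n_self]]
    return query_idx, self_labelled


# Toy pool: examples 0-3 lie close together; example 4 is an isolated outlier
# whose prediction is maximally uncertain.
probs = [0.5, 0.55, 0.95, 0.05, 0.5]
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, 0.1], [0.0, 1.0]]
query_idx, self_labelled = select_batches(probs, vectors, n_query=2, n_self=2)
```

In this toy pool, the density weighting keeps the isolated example out of the query set even though its prediction is as uncertain as any other, which is the behaviour the abstract attributes to the density measure. In a full loop, the queried examples would be labelled manually, both batches would be added to the training set, and the classifier would be retrained before the next iteration.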