Semi-supervised learning [24] is a well-known technique that makes use of unlabelled data to improve classification performance.
One of the most commonly used semi-supervised learning algorithms is self-training, an iterative process that automatically labels examples from the unlabelled data and adds them to the initial training set in each learning cycle. The self-training process usually selects high-confidence examples to add to the training data. However, if the initial classifier is not good enough, the probability of adding incorrectly labelled examples to the training set increases. The addition of such ‘‘noisy’’ examples not only fails to improve the accuracy of the learning model but also gradually degrades the performance of the classifier. On the other hand, the most confident examples selected by self-training are not necessarily the most informative instances for classifier improvement (especially for discriminative classifiers, such as SVM) [16]. To address these problems, we combine self-training with active learning in order to enrich the initial training set with selected examples from the unlabelled pool during the learning process. Active learning iteratively selects as few of the most informative examples as possible from the unlabelled pool and has them labelled by a human expert before adding them to the training set. These two techniques (self-training and active learning) complement each other, increasing the performance of CLSC while reducing the human labelling effort.
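The interplay of the two selection criteria can be summarised in a short sketch. The following Python fragment is only an illustrative assumption of such a combined loop, not the exact procedure of this paper: the absolute SVM decision value serves as a confidence proxy, high-confidence examples are pseudo-labelled automatically (self-training), and the least confident ones are sent to a human annotator, represented here by a hypothetical oracle function (active learning).

```python
# Illustrative sketch (assumed names and thresholds): combining self-training
# with uncertainty-based active learning for a binary classification task.
import numpy as np
from sklearn.svm import LinearSVC

def self_training_with_active_learning(X_lab, y_lab, X_pool, oracle,
                                        n_iters=5, conf_threshold=1.5,
                                        query_batch=10):
    """Grow the labelled set (X_lab, y_lab) from the unlabelled pool X_pool."""
    for _ in range(n_iters):
        if len(X_pool) == 0:
            break
        clf = LinearSVC().fit(X_lab, y_lab)
        # |decision value| = distance from the separating hyperplane,
        # used here as a confidence proxy (binary case).
        conf = np.abs(clf.decision_function(X_pool))

        # Active learning: query a human expert (the oracle) for the
        # least confident, i.e. most informative, examples.
        query_idx = np.argsort(conf)[:query_batch]
        human_labels = oracle(X_pool[query_idx])

        # Self-training: pseudo-label the remaining high-confidence examples.
        auto_idx = np.setdiff1d(np.where(conf >= conf_threshold)[0], query_idx)
        if len(auto_idx) > 0:
            auto_labels = clf.predict(X_pool[auto_idx])
        else:
            auto_labels = np.array([], dtype=y_lab.dtype)

        # Enrich the training set and remove the chosen examples from the pool.
        chosen = np.concatenate([query_idx, auto_idx])
        X_lab = np.vstack([X_lab, X_pool[chosen]])
        y_lab = np.concatenate([y_lab, human_labels, auto_labels])
        X_pool = np.delete(X_pool, chosen, axis=0)

    return LinearSVC().fit(X_lab, y_lab)
```

In this sketch the margin threshold and query batch size control the trade-off between automatic pseudo-labelling and human effort; any confidence or informativeness measure consistent with the underlying classifier could be substituted.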