Cancer classification from microarray gene expression
data is a challenging task in computational biology
and bioinformatics because collecting a sufficient number of labeled
samples (required to train traditional classifiers) is very expensive
and difficult. Therefore, the prediction accuracies of
classifiers trained with limited training samples are often very low.
Although unlabeled samples are relatively inexpensive and
readily available, traditional classifiers do not generally utilize the
distribution of those unlabeled samples. In this context, this article
presents a novel ‘self-training’ based semi-supervised classification
method using the fuzzy K-Nearest Neighbour algorithm, which
utilizes the unlabeled samples along with the labeled samples
to improve the prediction accuracy of cancer classification.
The proposed method is evaluated on a number of microarray
gene expression cancer data sets. Experimental results demonstrate
the potential of the proposed semi-supervised method for
cancer classification using microarray gene expression data in
comparison with its supervised counterparts.
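
The general idea of self-training with a fuzzy K-Nearest Neighbour classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the membership formula is a common inverse-distance-weighted formulation of fuzzy k-NN, and the function names, the confidence threshold, the fuzzifier m, the value of k, and the iteration limit are all assumptions chosen for illustration. The synthetic data merely stands in for gene expression profiles.

```python
import numpy as np


def fuzzy_knn_memberships(X_train, y_train, X_query, n_classes, k=3, m=2.0):
    """Return an (n_query, n_classes) array of fuzzy class memberships.

    Assumed formulation: each of the k nearest labeled neighbours votes for
    its own class with weight 1 / d^(2 / (m - 1)); votes are then normalized.
    """
    # Pairwise Euclidean distances between query and training samples.
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    k = min(k, X_train.shape[0])
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = 1.0 / (nn_dist ** (2.0 / (m - 1.0)) + 1e-12)
    onehot = np.eye(n_classes)[y_train[nn_idx]]  # shape (n_query, k, n_classes)
    memberships = (weights[:, :, None] * onehot).sum(axis=1)
    return memberships / weights.sum(axis=1, keepdims=True)


def self_train_fuzzy_knn(X_lab, y_lab, X_unlab, n_classes,
                         k=3, threshold=0.9, max_iter=10):
    """Grow the labeled set with confidently pseudo-labeled samples.

    The threshold and max_iter values are illustrative assumptions.
    """
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_iter):
        if X_unlab.shape[0] == 0:
            break
        u = fuzzy_knn_memberships(X_lab, y_lab, X_unlab, n_classes, k=k)
        confident = u.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is confident enough; stop self-training
        # Move confident samples, with their predicted labels, to the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, u[confident].argmax(axis=1)])
        X_unlab = X_unlab[~confident]
    return X_lab, y_lab


if __name__ == "__main__":
    # Synthetic two-class data: a few labeled and many unlabeled samples.
    rng = np.random.default_rng(0)
    class0 = rng.normal(0.0, 0.5, size=(20, 5))
    class1 = rng.normal(5.0, 0.5, size=(20, 5))
    X_lab = np.vstack([class0[:3], class1[:3]])
    y_lab = np.array([0, 0, 0, 1, 1, 1])
    X_unlab = np.vstack([class0[3:], class1[3:]])
    X_aug, y_aug = self_train_fuzzy_knn(X_lab, y_lab, X_unlab, n_classes=2)
    print("labeled samples after self-training:", X_aug.shape[0])
```

In this sketch the final classifier would simply be fuzzy k-NN applied with the augmented labeled set; a stricter threshold admits fewer, more reliable pseudo-labels at the cost of using less of the unlabeled data.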