One of the main applications of PCA is preprocessing the data before
performing classification. We will now study the classification accuracy of
a k-NN classifier on a biomedical dataset (colon-cancer, see the website).
The data consists of two variables, x and y, where x is an n × m data matrix
and y is a column vector containing the labels of the data.
Question 2.1
Use the given k-NN code to classify the given high-dimensional dataset.
First, perform the classification task without dimensionality reduction
and record the classification error.
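The provided myknn code is not reproduced here. As a point of reference, the following is a minimal NumPy sketch of how a k-NN classification error might be estimated. The function name knn_error, the leave-one-out protocol, Euclidean distance, majority voting and the default n_neighbors=1 are all assumptions for illustration; the course's myknn may split the data, measure distance or break ties differently.

```python
import numpy as np


def knn_error(x, y, n_neighbors=1):
    """Leave-one-out error of a k-NN classifier.

    Hypothetical stand-in for the course's myknn(x, y), not its actual code.
    x: (n, m) data matrix, one sample per row.
    y: (n,) or (n, 1) label vector.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).ravel()
    n = x.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    # A sample may not vote for itself (leave-one-out).
    np.fill_diagonal(d2, np.inf)
    errors = 0
    for i in range(n):
        # Indices of the n_neighbors closest remaining samples.
        nn = np.argsort(d2[i])[:n_neighbors]
        labels, counts = np.unique(y[nn], return_counts=True)
        # Majority vote; ties go to the smallest label (np.unique sorts).
        pred = labels[np.argmax(counts)]
        errors += int(pred != y[i])
    return errors / n
```

Calling knn_error(x, y) on the raw n × m matrix gives a baseline error against which the reduced-dimension result can be compared.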
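Second, use the rule

$$\frac{\sum_{i=k+1}^{m} \lambda_i}{\sum_{i=1}^{m} \lambda_i},$$

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ are the eigenvalues of
the covariance matrix of x and k is the number of principal components kept
(i.e., the fraction of the total variance discarded by the projection), to
find a good subspace into which you will project the data. Report the
dimensionality of the selected subspace and the classification error of
k-NN in that space.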
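A minimal NumPy sketch of the rule, assuming the $\lambda_i$ are the eigenvalues of the sample covariance of x in decreasing order and that k is chosen as the smallest dimension whose discarded-variance ratio is at most a threshold. The helper names and the default threshold of 0.05 are hypothetical choices, not values given in the assignment. This k (retained components) is unrelated to the number of neighbours used by k-NN. The eigenvalues come from a thin SVD of the centred data, which avoids forming the m × m covariance matrix when m is much larger than n; the eigenvalues beyond the rank of the data are zero, so the ratio is unchanged.

```python
import numpy as np


def pca_spectrum(x):
    """Column means, covariance eigenvalues (descending) and principal
    directions (rows of vt) of the n x m data matrix x, via a thin SVD."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    xc = x - mean                                   # centre each variable
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    eigvals = s ** 2 / (x.shape[0] - 1)             # sample-covariance eigenvalues
    return mean, eigvals, vt


def discarded_ratio(eigvals, k):
    """sum_{i=k+1}^m lambda_i / sum_{i=1}^m lambda_i for descending eigvals."""
    return eigvals[k:].sum() / eigvals.sum()


def choose_k(eigvals, threshold=0.05):
    """Smallest k whose discarded-variance ratio is <= threshold.
    The default threshold is a hypothetical choice, not from the assignment."""
    for k in range(1, len(eigvals) + 1):
        if discarded_ratio(eigvals, k) <= threshold:
            return k
    return len(eigvals)


def pca_reduce(x, threshold=0.05):
    """Project x onto its first k principal components; returns (z, k)."""
    mean, eigvals, vt = pca_spectrum(x)
    k = choose_k(eigvals, threshold)
    z = (np.asarray(x, dtype=float) - mean) @ vt[:k].T   # n x k reduced matrix
    return z, k
```

For example, with eigenvalues 4, 2, 1, 0.5, 0.5 (total 8), keeping k = 2 components discards (1 + 0.5 + 0.5)/8 = 0.25 of the variance and k = 3 discards 1/8 = 0.125. The reduced n × k matrix z is what would be passed to myknn in place of x.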
Discuss your results.
Tip: you only need to call the function myknn(x,y), where the first
parameter is the data matrix (or the reduced data matrix) and y is the
vector of labels.
3 Submission
Send the report and the code to jakramate.b@cmu.ac.th.