kNN [5] is one of the most widely used lazy learning approaches [1]. Given a set of
n training examples, upon receiving a new instance to predict, the kNN classifier
identifies the k training examples nearest to the new instance and then assigns
the new instance the class label held by the majority of those neighbors.
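The prediction rule described above can be sketched in a few lines; this is a minimal illustration using plain Euclidean distance and simple majority voting, not any particular published implementation:

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training examples by Euclidean distance to x and keep the k nearest
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    # Majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with two well-separated clusters of training points, a query near one cluster is assigned that cluster's label.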
The asymptotic classification error of kNN tends to the optimal Bayes error
rate as k→∞ and k/n → 0 when n grows to infinity, and the error is bounded
by approximately twice the Bayes error when k = 1 [6]. This asymptotic behavior,
combined with simplicity in concept and implementation, makes kNN a powerful
classification approach capable of dealing with arbitrarily complex problems,
provided a large training data set is available. In practice, however, this
theoretical behavior is rarely achieved, because kNN is sensitive to outliers
and noise in the training data, which are common in real-world applications.
It is therefore important to eliminate outliers from the training data and to
perform other necessary cleaning. Approaches devoted to this purpose are
referred to as editing approaches [6].
