classified as illegal will be analysed in a laboratory and then found to be legal. In contrast, the percentage of illegal samples classified as legal should be as low as possible, in order to limit the number of illegal samples reaching the market.

4.2.1. k-NN

For the data set in which the samples are divided into five classes, the best k-NN model was obtained using three nearest neighbours, with the Euclidean distance as the similarity measure. For the training set, a 10-fold cross-validation error of 0.4177 was obtained. This corresponds to a correct classification rate (ccr) of 70 of the 120 samples. The ccr is low, although only 16 illegal samples were classified as legal, which corresponds to 24% of all illegal samples in the training set. Evaluation with the external test set showed that 20 of the 30 samples were classified correctly. Of the 10 remaining samples, 4 illegal samples (one each of classes 1 and 3 and two of class 2) were left unclassified and 6 were misclassified: 1 sample of class 2 was misclassified as class 5, 1 sample of class 4 as class 3, and 4 samples of class 5 as class 2. These results are acceptable, since only one illegal sample was classified as legal.

When the modelling is repeated for the binary legal/illegal classification only, a cross-validation error of 0.3167 is obtained, corresponding to 82 of the 120 samples in the training set classified correctly. As with the previous model, 16 illegal samples were classified as legal. Evaluation with the external test set showed that all illegal samples were classified as illegal and that 4 legal samples were classified as illegal. This means that the binary k-NN model performs well in discriminating illegal samples.

4.2.2. PLS-DA

With PLS-DA, the best-performing model for the five-class classification problem was obtained using nine PLS factors. The optimal number of factors was selected using LOOCV, which gave a cross-validation error of 0.3667, i.e. 76 of the 120 samples classified correctly. Of the 53 legal samples in the training set, 16 were classified as belonging to one of the illegal classes, while 22 illegal samples were classified as legal. In the evaluation with the external test set, 22 of the 30 samples were classified correctly, which can be considered a good ccr. Examining the misclassifications showed that one sample of class 2 and one of class 3 were classified as legal (class 5), one sample of class 4 was classified as class 2, and five samples of the legal class were assigned to one of the illegal classes. Thus only two illegal samples were considered legal.

Applying PLS-DA to the two-class problem resulted in an optimal model with seven PLS factors. The model showed a cross-validation error of 0.2833, i.e. 86 of the 120 samples classified correctly. Of the 34 misclassified samples, 13 were illegal samples classified as legal. Evaluation with the external test set showed two illegal samples classified as legal and seven legal samples classified as illegal.

4.2.3. SIMCA

The optimal SIMCA model for the five-class classification problem was obtained using two principal components to model classes 1, 3 and 4, and three principal components for classes 2 and 5. Cross-validation of this model gave an error of 0.4083, i.e. 71 of the 120 samples in the training set classified correctly. Among the misclassified samples, six of the 53 legal samples were assigned to one of the illegal classes and 38 illegal samples were classified as legal. These results already showed that the model is not fit for purpose. Evaluation with the external test set showed that all legal samples were classified correctly, but eight illegal samples (1 of class 1, 4 of class 2, 1 of class 3 and 2 of class 4) were classified as legal. A total ccr of 0.7333, i.e. 22 of the 30 samples, was obtained for the external test set.

Repeating the analysis for the two-class classification problem resulted in an optimal model using four principal components for class 1 and three for class 2. The model showed a cross-validation error of 0.3000, i.e. 84 samples classified correctly. Of the misclassified samples, 15 illegal samples (class 1) were classified as legal, while 21 legal samples were classified as illegal. The evaluation with the external test set showed a ccr of 0.7667. This