5. Discussion and conclusions

Table 1 gives an overview of the performance of the different models in terms of misclassification of test set samples and for the cross validation procedures (if available). When comparing the different models, the primary criterion is the number of illegal samples classified as legal. This number has to be as low as possible, since illegal samples classified as legal will, when no other indications or suspicions are available, pass customs and come onto the market. The classification of legal samples as illegal is less critical, since samples found illegal will be blocked at customs and sampled for laboratory analysis. Legal samples found illegal by the model will be recognised as legal after analysis and released afterwards. Of course it is preferable to reduce the number of legal samples classified as illegal, in order to limit the number of legal products blocked at customs.

When comparing the results in Table 1 for the test set samples, it can be concluded that the k-NN models perform best. For the 2-class classification into legal/illegal, the proposed model classified none of the illegal samples in the test set (18 samples) as legal, although four legal samples were classified as illegal. None of the other models performs better. The RF model classified only three legal samples as illegal, but also two illegal samples as legal. As discussed before, it is preferred to limit the number of misclassified illegals as much as possible, despite a higher number of misclassified legals.

For the five-class classification two models gave suitable results: the k-NN and the PLS-DA model. In the k-NN model only one illegal sample was classified as legal, while four legals were classified as illegal. It has to be mentioned, though, that four illegal samples were unclassified. This does not have to be a problem, since it can be decided that an unclassified product is seized and sent to a laboratory for analysis. The PLS-DA model performs slightly worse than the k-NN model, but no unclassified samples occur. In total two illegal samples were classified as legal and five legal samples were classified as illegal.
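The selection logic used in this comparison can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure used to produce Table 1: the names `ModelResult`, `rank_models` and `customs_action` are hypothetical, and the only numbers used are the test set counts quoted in the text. Models are ranked first on the number of illegal samples classified as legal and only then on the number of legal samples classified as illegal; an unclassified sample is treated like an illegal one and seized.

```python
from typing import NamedTuple, Optional


class ModelResult(NamedTuple):
    """Test set misclassification counts for one model (hypothetical container)."""
    model: str
    illegal_as_legal: int   # primary criterion: must be as low as possible
    legal_as_illegal: int   # secondary criterion: less critical


# Counts quoted in the text for the test set.
TWO_CLASS = [
    ModelResult("RF", illegal_as_legal=2, legal_as_illegal=3),
    ModelResult("k-NN", illegal_as_legal=0, legal_as_illegal=4),
]
FIVE_CLASS = [
    ModelResult("PLS-DA", illegal_as_legal=2, legal_as_illegal=5),
    ModelResult("k-NN", illegal_as_legal=1, legal_as_illegal=4),
]


def rank_models(results):
    """Sort models by the primary criterion, using the secondary one as tie-breaker."""
    return sorted(results, key=lambda r: (r.illegal_as_legal, r.legal_as_illegal))


def customs_action(predicted_class: Optional[str]) -> str:
    """Map a model output to a screening decision.

    Only a sample predicted as 'legal' is released. Any illegal class, and an
    unclassified sample (represented here as None), is seized for analysis.
    """
    return "release" if predicted_class == "legal" else "seize"


if __name__ == "__main__":
    for name, results in (("2-class", TWO_CLASS), ("5-class", FIVE_CLASS)):
        print(name, [r.model for r in rank_models(results)])
```

With this ordering the RF model ranks below k-NN in the 2-class case even though it blocks fewer legal samples, which mirrors the reasoning above.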
For both models it was also shown that, for the illegals, the samples were generally attributed to the right illegal component present, since in each model only one illegal sample of the test set was attributed to the wrong active substance. The latter was generally the case for all models evaluated. The biggest problem is the differentiation between legal and illegal samples and not the attribution of the samples classified as illegal to the right component. When examining the misclassifications in the different models it can even be said that the biggest problem is the differentiation between class 2 (corticosteroids) and class 5 (legal samples). This is probably due to the concentration of the corticosteroids used in these samples. Corticosteroids are present in dosages ranging from 0.0007% to 0.065%, while tretinoin and hydroquinone are often present in much higher dosages [7]. Evaluation of the misclassifications of the different models, both in cross validation and in external validation, indeed revealed that the low-dosed samples containing corticosteroids are confused with the legal samples and vice versa.

Although the models perform quite well for the prediction of the test set, it should be mentioned that the performance is generally lower in cross validation. This is not surprising, since during cross validation a part of the training set is left out in order to evaluate the performance of a model built with the remaining part. This procedure is repeated until all samples have been predicted once as a member of a test set. This means that, for example, in 10-fold cross validation the predicted results are based on 10 different models. The fact that the predictive performance of a model is lower in cross validation than with an external test set shows that the model is not robust, which can be explained by the high variability in matrices present in the sample set.
In other words, the models perform well but have to be made more robust before being used in routine screening. The only way to do so is to add more samples to the data set.

In general it can be concluded that ATR-IR can be a valuable tool for customs to make a first evaluation of samples suspected of containing illegal whitening agents. The analysis needs no sample preparation and, if a data set of enough samples can be established, the interpretation can be fully automated using basic chemometric models. The technique can be used to differentiate samples containing illegal whitening agents from those that do not and to get a first idea of the illegal component present. Of course, the evaluation of the suspicious character of packaging and documents should always be taken into account, and the results for samples classified as illegal should be confirmed by laboratory analysis. Using ATR-IR technology for a first screening of samples could limit the number of samples blocked unnecessarily at customs.

As a final remark it should be mentioned that the models presented here can only be used to detect illegal whitening agents. When one wants to use ATR-IR for the detection of other components, different chemometric tools should be evaluated to model the ATR-IR spectra in order to select the most appropriate strategy. The combination of ATR-IR and the optimal chemometric modelling technique depends on the problem to be solved and can change depending on the components to be detected.
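The k-fold cross validation procedure referred to in this section can be sketched as follows. This is a minimal illustration in plain Python, not the software or settings used in this work: the function names are hypothetical, a 1-nearest-neighbour rule stands in for the actual classifiers, and the two-feature toy data are made up purely so that the example runs. The point it shows is that every sample is predicted exactly once, each time by a model built without that sample, so that 10-fold cross validation combines the predictions of 10 different models.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x (1-NN, squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def cross_validate(X, y, k=10, seed=0):
    """Predict every sample once, using a model built without its own fold."""
    predictions = [None] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        # The 'model' for this fold is simply the remaining training samples.
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            predictions[i] = nearest_neighbour_predict(train_X, train_y, X[i])
    return predictions


# Made-up toy data: two well separated clusters, NOT real ATR-IR spectra.
TOY_X = [[0.01 * i, 0.0] for i in range(10)] + [[5.0 + 0.01 * i, 5.0] for i in range(10)]
TOY_Y = ["legal"] * 10 + ["illegal"] * 10

if __name__ == "__main__":
    predicted = cross_validate(TOY_X, TOY_Y, k=10)
    errors = sum(p != t for p, t in zip(predicted, TOY_Y))
    print(f"misclassified in cross validation: {errors} of {len(TOY_Y)}")
```

On real data with highly variable matrices, the models built on the reduced training sets can differ noticeably from one another, which is one way the lower cross validation performance discussed above can arise.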