3.3. Differentiation of olive varieties based on their
CGE protein profiles
Clear differences among the protein profiles were observed
when comparing the electropherograms obtained for the analyzed olive samples. For a better understanding of these differences, multivariate classification methodologies were applied to
the area percentages obtained from protein profiles. Cluster
analysis did not give a suitable classification of the samples according to
geographical origin. Therefore, a supervised multivariate
method such as discriminant analysis was chosen to construct
linear discriminant functions to classify olives according to their
geographical origin. For that purpose, the area percentages of the seven selected peaks in the twenty olive varieties studied were
used. The classification factor used was the geographical origin of
every olive sample, using the following four denominations:
“North east”, “South east”, “South west”, and “Other countries”
for those olive varieties with a geographical origin different from
Spain (see Table 1). It should be noted that
olive cultivars from different geographic origins often show
significant variability in their genetic and phenotypic traits. In
this work, the olive varieties were grown under the same
pedoclimatic conditions, avoiding the possible influence of these
conditions on their classification. Fig. 3 shows the distribution of
olive varieties in the plane defined by the first two discriminating
functions of the mathematical model. A clear classification according to the geographical origin was achieved, demonstrating
that the variety origin could be a suitable classification factor [21].
Indeed, four different groups were observed, and two discriminating
functions were statistically significant at the 95% confidence level
(P-values lower than 0.05). The model correctly classified 16 of the 20 olive samples (a prediction capability of 80%).
To evaluate this model, a leave-one-out cross-validation procedure was
performed: n-1 of the n observations were used as the training
dataset to determine the discrimination rule, which was then used to classify the
observation left out. This procedure gave 78.5% correct classification.
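
As an illustration of the cross-validation scheme described above, the following minimal Python sketch holds out one observation at a time, trains on the remaining n-1, and counts how often the held-out observation is assigned to its true class. It is not the authors' implementation. A nearest-centroid rule is used here as a simple stand-in for the linear discriminant functions, the normalisation in `area_percentages` (each peak area divided by the summed areas of the selected peaks) is an assumption, and all function names and numerical values are hypothetical and are not data from this study.

```python
from collections import defaultdict
from math import dist


def area_percentages(peak_areas):
    """Express each peak area as a percentage of the summed areas (assumed normalisation)."""
    total = sum(peak_areas)
    return [100.0 * area / total for area in peak_areas]


def fit_centroids(profiles, origins):
    """Return the mean profile (centroid) of each origin class."""
    grouped = defaultdict(list)
    for profile, origin in zip(profiles, origins):
        grouped[origin].append(profile)
    return {
        origin: [sum(column) / len(column) for column in zip(*members)]
        for origin, members in grouped.items()
    }


def predict(centroids, profile):
    """Assign a profile to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda origin: dist(centroids[origin], profile))


def leave_one_out_accuracy(profiles, origins):
    """Hold out each sample in turn, train on the other n-1, and score the held-out one."""
    correct = 0
    for i in range(len(profiles)):
        train_x = profiles[:i] + profiles[i + 1:]
        train_y = origins[:i] + origins[i + 1:]
        centroids = fit_centroids(train_x, train_y)
        if predict(centroids, profiles[i]) == origins[i]:
            correct += 1
    return correct / len(profiles)


# Made-up peak areas for three peaks (illustrative only, not data from the study).
raw_areas = [
    [50, 30, 20], [52, 28, 20], [48, 31, 21],  # hypothetical class "A"
    [20, 30, 50], [22, 29, 49], [19, 32, 49],  # hypothetical class "B"
]
origins = ["A", "A", "A", "B", "B", "B"]

profiles = [area_percentages(row) for row in raw_areas]
accuracy = leave_one_out_accuracy(profiles, origins)
```

In this sketch the classifier is refitted inside the loop, so the held-out observation never contributes to the rule used to classify it. The same loop would apply unchanged if the nearest-centroid rule were replaced by a linear discriminant model.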