6) K-fold cross-validation with stratified sampling:
Finally, K-fold cross-validation with stratified sampling is
used to test the SVM classification model and estimate its
predictive performance. K=10 was chosen, as 10-fold
cross-validation is reported to give good model estimates
on real-world data [8]. The data set is
partitioned into 10 subsets using stratified sampling, so that
the class distribution in each subset, i.e., the proportion of
PD to control data points, is approximately the same as in the
entire data set. In each of the 10 iterations, one subset is retained
as testing data for the model, and the remaining nine subsets
are used to train the model, so that each subset serves as
testing data exactly once. Repeating the train-and-test process
over all 10 subsets helps to detect overfitting, where the
model fits the training data very well but performs poorly in
predicting class labels for new data. The overall performance
of the model is obtained by averaging the performance over
the 10 iterations.
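
The procedure described above can be sketched as follows. This is a minimal illustration only, not the implementation used in this work: it assumes Python with scikit-learn, a synthetic data set standing in for the PD and control data points, an SVM with the library's default RBF kernel, and accuracy as the performance measure. The variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 200 points,
# roughly 30% labelled 1 ("PD") and 70% labelled 0 ("control").
X, y = make_classification(n_samples=200, n_features=10,
                           weights=[0.7, 0.3], random_state=0)

# K=10 folds; stratification keeps the class proportion of each
# fold close to that of the entire data set.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_accuracy = []     # test accuracy for each of the 10 iterations
fold_pd_fraction = []  # fraction of class-1 points in each test fold
tested_indices = []    # used to confirm each point is tested once

for train_idx, test_idx in skf.split(X, y):
    # Train on the nine remaining subsets, test on the held-out one.
    model = SVC()  # default RBF kernel (assumption)
    model.fit(X[train_idx], y[train_idx])
    fold_accuracy.append(model.score(X[test_idx], y[test_idx]))
    fold_pd_fraction.append(y[test_idx].mean())
    tested_indices.extend(test_idx.tolist())

# Overall performance: average over the 10 iterations.
mean_accuracy = float(np.mean(fold_accuracy))
print("per-fold accuracy:", np.round(fold_accuracy, 3))
print("mean accuracy:", round(mean_accuracy, 3))
print("class-1 fraction per fold:", np.round(fold_pd_fraction, 3))
```

A large gap between the accuracy on the training subsets and the averaged test accuracy would indicate the overfitting discussed above.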