Suppose that you were given a linearly separable training set, such as the one in
Figure 9.3, and were asked to find the optimal hyperplane that separates the data
points. How would you proceed? You would very likely first ask what exactly is
meant by optimal. One might postulate that optimal means any hyperplane
that separates the positive training data from the negative training data. However,
we must also consider the ultimate goal of any classification algorithm, which is
to generalize well to unseen data. If a classifier can perfectly classify the training
data but completely fails at classifying the test set data, then it is of little value.
This scenario is known as overfitting.
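As a concrete illustration, the following minimal sketch (assuming scikit-learn and NumPy are available, and using a hypothetical two-cluster toy dataset rather than the data in Figure 9.3) fits a maximum-margin linear classifier to linearly separable points and checks its accuracy on held-out data, which is the generalization question raised above.

```python
# Minimal sketch: fit a (nearly) hard-margin linear classifier to a
# linearly separable toy dataset and check generalization on held-out points.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated 2-D clusters (hypothetical toy data, not Figure 9.3).
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.4, size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.4, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(50), -np.ones(50)])

# Hold out a test set to estimate how well the classifier generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A linear SVM with a very large C approximates the hard-margin
# separating hyperplane on linearly separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_train, y_train)

print("training accuracy:", clf.score(X_train, y_train))
print("test accuracy:    ", clf.score(X_test, y_test))
print("hyperplane normal w:", clf.coef_[0], "bias b:", clf.intercept_[0])
```

On a separable toy set like this, many hyperplanes achieve perfect training accuracy; the test-set score is what distinguishes a classifier that generalizes from one that merely memorizes the training data.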