Figure 2.1: Illustration of the trade-off between bias and variance as a function of training
time.
To understand these concepts further, consider our unknown function φ, a sample training
set t, and two estimators, C and S. The estimator C is so named because it is complicated:
it has a large number of configurable parameters w. The estimator S is so named because
it is simple: it has relatively few parameters w. Assume now that t is outside the representable
function space of S, but within the corresponding space of C. We know that C will
be able to fit the function, but also that it will probably overfit to the training dataset t,
and so not be a good estimate of the true function φ. For a different random sample of training
data, the complex estimator may again overfit, producing a different hypothesis for each
different set of training data presented. This is high variance with respect to the training
sets, but because it fits each one so well, the estimator is said to have low bias. The simple
estimator S will not be able to fit the data t, so it will have high bias. However, it will respond
in almost the same way (low variance) to different instances of t, since its representable function
space is too limited to follow the peculiarities of any particular sample.
