Why do Ensembles perform better than Single Networks?
3.1.1 In a Regression Context
First, as an illustrative scenario, consider a single neural network approximating a sine
wave; our network has been supplied with a limited set of datapoints to train on, the inputs
chosen uniformly at random from [−π, π], and a small amount of Gaussian noise added to
the outputs. Now consider a single test datapoint: finding the value of sin(2). The true
answer is approximately 0.909, yet our network may overpredict or underpredict that
value. Its errors will follow a distribution that depends on the random
training data sample it received and on the random initialisation of the weights. The
mean of this distribution is the expectation value E{f}, where f is a network trained with
a particular dataset and a particular weight initialisation. Figure 3.1 illustrates a typical