This is the weighted average output of the predictor f on the input x, where each output
is determined after training f with a particular training set t and parameter initialization
w, and weighted according to the probability of that training set and that parameter initialization. The two independent random variables T and W form a joint random variable
TW. Wherever the expectation operator E {·} is used in this thesis, unless explicitly stated otherwise,
it can be assumed that the expectation is calculated with respect to the distribution of TW.
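
As an illustration only, the expectation over TW can be approximated by Monte Carlo sampling: draw a training set t and an initialization w independently, train the predictor, record its output on x, and average over many draws. The sketch below is not the method used in this thesis. The one-parameter model f(x) = w * x, the target function d = 2x plus Gaussian noise, the uniform input distribution, and the names train_predictor and expected_output are all assumptions chosen to keep the example small.

```python
import numpy as np


def train_predictor(t_x, t_d, w0, steps=20, lr=0.5):
    """Fit the toy model f(x) = w * x by gradient descent on mean squared error.

    t_x, t_d : inputs and targets of one training set t (hypothetical toy data)
    w0       : the parameter initialization w
    """
    w = w0
    for _ in range(steps):
        # Gradient of mean((w * x - d)^2) with respect to w
        grad = 2.0 * np.mean(t_x * (w * t_x - t_d))
        w -= lr * grad
    return w


def expected_output(x, n_trials=500, n_train=20, seed=0):
    """Monte Carlo estimate of E{f(x)} with respect to the distribution of TW."""
    rng = np.random.default_rng(seed)
    outputs = np.empty(n_trials)
    for k in range(n_trials):
        # Sample a training set t (assumed: uniform inputs, noisy linear targets)
        t_x = rng.uniform(-1.0, 1.0, size=n_train)
        t_d = 2.0 * t_x + rng.normal(0.0, 0.1, size=n_train)
        # Sample an initialization w independently of t (assumed: standard normal)
        w0 = rng.normal(0.0, 1.0)
        # Output of the predictor on x after training with this (t, w) pair
        outputs[k] = train_predictor(t_x, t_d, w0) * x
    # Each (t, w) pair is drawn from its distribution, so the plain sample
    # mean plays the role of the probability-weighted average.
    return outputs.mean(), outputs


mean_output, outputs = expected_output(x=1.5)
print(mean_output)
```

Because the pairs (t, w) are sampled rather than enumerated, the probability weighting is implicit in how often each pair appears. The spread of outputs around mean_output reflects how much the trained predictor varies with the training set and the initialization.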
Similarly to (2.8), we define the expected error (or generalization error) of the predictor
on a single input pair (x, d) as: