Due to the time dependence between observations, the evaluation procedures for
time series prediction models are different from standard methods. The latter
are usually based on resampling strategies (for instance, bootstrap or cross-validation),
which work by obtaining random samples from the original unordered
data. The use of these methodologies with time series could lead to undesirable
situations, like using future observations of the variable for training purposes
and evaluating models with past data. To avoid these problems, we
usually split the available time series data into time windows, obtaining the
models with past data and testing them on subsequent time slices.
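The time-window idea can be illustrated with a minimal sketch in Python. The function name `time_window_splits` and its parameters (`n_obs`, `initial_train`, `test_size`, `sliding`) are hypothetical names chosen for this illustration and do not come from any particular library. The sketch only generates index sets and assumes the observations are already sorted by time.

```python
from typing import Iterator, List, Tuple


def time_window_splits(
    n_obs: int,
    initial_train: int,
    test_size: int,
    sliding: bool = False,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that respect time order.

    Hypothetical helper for illustration: indices are positions in a
    series already sorted by time. Every test index is strictly later
    than every training index of the same split.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")

    start = initial_train  # first position of the current test slice
    while start + test_size <= n_obs:
        # Growing window: train on everything before the test slice.
        # Sliding window: train only on the last `initial_train` points.
        train_begin = start - initial_train if sliding else 0
        train_idx = list(range(train_begin, start))
        test_idx = list(range(start, start + test_size))
        yield train_idx, test_idx
        start += test_size  # move on to the next time slice


growing = list(time_window_splits(n_obs=10, initial_train=4, test_size=2))
sliding = list(
    time_window_splits(n_obs=10, initial_train=4, test_size=2, sliding=True)
)

for train_idx, test_idx in growing:
    print("train:", train_idx, "test:", test_idx)
```

In every split the model would be obtained with the training indices and evaluated on the test indices that follow them, so no future observation is used for training. Whether a growing or a sliding training window is preferable depends on the application, for instance on how quickly the dynamics of the series change.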
The main purpose of any evaluation strategy is to obtain a reliable value of
the expected predictive accuracy of a model. If our estimate is reliable, we can
be reasonably confident that the predictive performance of our model will not
deviate much from our estimate when we apply the model to new data from the
same domain. In Section 2.7 (page 64) we saw that the key to obtaining
reliable estimates is to evaluate the models on a sample independent from the
data used to obtain them.