The ideal approach to building and evaluating tree-ring models
would be a form of best-subsets regression, where every permutation
of ‘n’ predictors from the pool of ‘N’ tree-ring chronologies
is evaluated for its explanatory power. However, the number of
potential models, N!/(N − n)!, can be extremely large. We developed
an alternative, computationally less demanding approach by repeating
the forward selection process, recursively removing predictors
from the available pool of size ‘N’, while simultaneously retaining
all models that exceed a threshold R²adj.
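To illustrate the scale of the search space, a quick calculation (with a hypothetical pool size of N = 40 and model size n = 5, chosen only for illustration) shows how quickly the number of candidate models grows:

```python
from math import comb, perm

# Hypothetical pool of N = 40 chronologies, models of n = 5 predictors.
N, n = 40, 5

# Ordered selections (permutations), N!/(N - n)!:
print(perm(N, n))  # 78,960,960

# Distinct predictor subsets (combinations), N!/(n!(N - n)!):
print(comb(N, n))  # 658,008
```

Even the smaller combination count makes exhaustive evaluation of every subset costly once N grows much beyond this.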
The input data structure can be viewed as a tree (Fig. B.1). At the
root is the single model (with five predictors) found by conventional
forward selection. Traveling down the branches, each predictor
is added to a model and then removed from the pool, and
in turn becomes input (by forward selection) for the next generation
of models – its ‘offspring’. Each step produces a single new
model of ‘n’ predictors (in this case 5) from a pool that excludes
the current predictor and all of its ‘parents’.
Note that a standard recursive approach would progress down
the branches until some arbitrary ‘base case’ (N = n, for example)
before returning up the tree and completing the next branch. However,
this would require that 1/n of the entire process be completed
before models originating from the second predictor of the initial
(forward selection) model are even considered. We instead wanted
to diverge uniformly from the initial model over time. This way,
under the assumption that the initial model is at least fairly good
(since forward selection evaluates goodness of fit), model quality
should decrease with time. We therefore complete one
entire ‘level’ (row of branches) at a time, rather than one branch
at a time. The script can be stopped by the user at an arbitrary
point, while also serving as a best-subsets analysis if run to completion.
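A minimal sketch of this level-order search, assuming a `forward_select(pool, n)` helper that returns an n-predictor forward-selection model and its R²adj (the helper, the threshold value, and the data structures are illustrative placeholders, not the authors' actual script):

```python
from collections import deque

def level_order_search(pool, forward_select, n=5, r2_threshold=0.35):
    """Breadth-first variant of the recursive forward-selection search.

    `pool` holds all candidate predictors; `forward_select(pool, n)` is
    assumed to return (model, r2_adj), where `model` is the list of n
    predictors chosen by forward selection from `pool`.
    """
    retained = []
    root_model, root_r2 = forward_select(pool, n)
    if root_r2 >= r2_threshold:
        retained.append((root_model, root_r2))
    # Each queue entry pairs a branch's available pool with its model;
    # a FIFO queue completes one entire level before starting the next.
    queue = deque([(set(pool), root_model)])
    while queue:
        avail, model = queue.popleft()
        for predictor in model:
            # Offspring branch: exclude this predictor (its 'parents'
            # were already removed from `avail` in earlier levels).
            child_pool = avail - {predictor}
            if len(child_pool) < n:
                continue  # pool exhausted; branch terminates
            child_model, r2 = forward_select(child_pool, n)
            if r2 >= r2_threshold:
                retained.append((child_model, r2))
            queue.append((child_pool, child_model))
    return retained
```

Because the queue is first-in, first-out, models close to the initial forward-selection model are evaluated before the search drifts far from it, matching the intended uniform divergence over time.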
An anticipated general (although not strict) downward
trend in R²adj over time (i.e., as the process diverges from
the original forward selection model) is confirmed in Fig. B.2. This
plot, corresponding to the ensemble of streamflow reconstructions
in Fig. 4, gives the relative power of the models and the order in
which they were recursively selected by the script. The individual
models are plotted as open circles. The horizontal axis gives the
rank of the models from most to least powerful, left to
right. The units on the left and right axes, respectively, are R²adj
and the order of the models in the selection process. The green
curve shows the rate of decline in model power; 200 models have
an R²adj above 0.35. The first model entered, at the top of the plot, is
the third most powerful model. This is the single model that would
be constructed by standard forward stepwise regression. The most
powerful model, plotted at the left end of the horizontal axis, was
the 7th model found by the recursive process. Another high-ranking
model, with an R²adj of 0.47, was the 58th model found. Given
enough time, the algorithm would evaluate every subset of ‘n’ predictors,
but it is more likely to find better models earlier in the process.
Fig. B.2 shows that many models, differing by at least one
predictor, have similar strength in terms of R²adj.