Following Frean and Robins [1997], in this paper we will
initially present the catastrophic forgetting effect in a
different way, focusing directly on the function learned
by the network. To do this we use a simple architecture
consisting of a single input unit, 20 hidden units, and a single
output unit (with a learning constant of 0.05, a momentum of
0.9, and an error criterion of 0.001). The initial population
consists of 6 items (input value / output value pairs), which
can be plotted as data points in two dimensions. After
training we can plot the function learned by the network by
systematically sampling the space of possible input values.
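To make the procedure concrete, the following sketch (in Python, using NumPy) reproduces this setup under stated assumptions: the architecture and training parameters are those given above, while the sigmoid activations, the weight initialisation, and the six population items themselves are illustrative choices, not values from the original study.

```python
# A minimal sketch of the experiment described above. The network
# (1 input unit, 20 hidden units, 1 output unit), learning constant
# 0.05, momentum 0.9, and error criterion 0.001 follow the text;
# everything else is an assumption made for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 1 -> 20 -> 1 network with small random initial weights (assumed).
W1, b1 = rng.uniform(-0.5, 0.5, (20, 1)), np.zeros(20)
W2, b2 = rng.uniform(-0.5, 0.5, (1, 20)), np.zeros(1)

def forward(x):
    h = sigmoid(W1 @ x + b1)          # hidden activations
    return sigmoid(W2 @ h + b2), h    # output, hidden

def train(items, lr=0.05, momentum=0.9, criterion=0.001):
    """Online backpropagation with momentum, run until the summed
    squared error over the training items falls below the criterion
    (the text does not specify batch vs. online updates)."""
    global W1, b1, W2, b2
    vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
    vW2, vb2 = np.zeros_like(W2), np.zeros_like(b2)
    for _ in range(200_000):          # safety cap on epochs
        total_error = 0.0
        for x, t in items:
            x = np.array([x])
            y, h = forward(x)
            e = y - t
            total_error += 0.5 * float(e @ e)
            # Sigmoid derivatives for output and hidden layers.
            d2 = e * y * (1.0 - y)
            d1 = (W2.T @ d2) * h * (1.0 - h)
            vW2 = momentum * vW2 - lr * np.outer(d2, h)
            vb2 = momentum * vb2 - lr * d2
            vW1 = momentum * vW1 - lr * np.outer(d1, x)
            vb1 = momentum * vb1 - lr * d1
            W2 += vW2
            b2 += vb2
            W1 += vW1
            b1 += vb1
        if total_error < criterion:
            break

# Six (input, output) pairs standing in for the initial population;
# the exact values are hypothetical.
population = [(0.10, 0.20), (0.25, 0.70), (0.40, 0.40),
              (0.55, 0.80), (0.70, 0.30), (0.90, 0.60)]
train(population)

# Sample the learned function across the input space (cf. Fig. 1(a)).
xs = np.linspace(0.0, 1.0, 101)
learned = [float(forward(np.array([x]))[0][0]) for x in xs]
```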
As expected, this function passes through the population data
points; see Fig. 1(a). To illustrate catastrophic forgetting
we now train the network on a single new item. Fig. 1(b)
shows the resulting function, which correctly fits the new
item data point. Notice, however, that this function no longer
fits the initial data points at all (the original population inputs
no longer generate the correct outputs). After learning just a
single new item, catastrophic forgetting has occurred.
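Continuing the same sketch, the forgetting stage amounts to training on one new pair by itself, with no further presentation of the original population; the new item's values are again hypothetical.

```python
# Train on a single new (input, output) pair alone, as in the text:
# the original population is not presented again.
train([(0.50, 0.90)])

# Re-sample the function (cf. Fig. 1(b)) and check the old items:
# the new point is now fit, but the original inputs no longer
# produce their correct outputs.
forgotten = [float(forward(np.array([x]))[0][0]) for x in xs]
for x, t in population:
    y = float(forward(np.array([x]))[0][0])
    print(f"input {x:.2f}: target {t:.2f}, output now {y:.2f}")
```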