The moral of the story would seem to be that you can’t have it both ways. A
system that develops highly distributed representations will be able to generalize but
will suffer from catastrophic forgetting; conversely, a system that uses completely
local representations may not suffer from catastrophic forgetting, but it will have
little or no ability to generalize. The challenge is to develop systems capable of
producing [semi-distributed] representations that are local enough to reduce, if not
entirely overcome, catastrophic forgetting yet that remain sufficiently distributed to
nonetheless allow generalization