A specific example may help to clarify the learning mechanism. Suppose that there are 20
patterns, P1, P2, ..., P20. Each pattern, Pi, consists of an input–output (“teacher”) association (Ii, Ti). The system is required to learn all 20 patterns sequentially: each individual pattern, along with all of the pseudopatterns generated by the final-storage memory, must be learned to criterion before the system can
begin to learn the subsequent pattern. To learn pattern P1, its input I1 is presented to the
network. Activation flows through both parts of the network, but the output from the final-storage part is prevented from reaching the teacher nodes by the “real” teacher T1. The
early-processing network then adjusts its weights with the standard backpropagation
algorithm, using as the error signal the difference between T1 and the output, O1, of the early-processing network. Internally created pseudopatterns from the final-storage memory are
now generated and will be learned by the early-processing memory. This is done by
presenting random input to the network, which causes activation to spread through both the
early-processing and the final-storage memories. For “pseudo”-input, unlike for “real” input,
there is no “real” teacher to inhibit the arrival of final-storage activation at the teacher nodes
(i.e., the third layer of the final-storage network). The activation on the teacher nodes is thus produced by the spread of activation from the pseudo-input through the final-storage memory. These teacher nodes then serve to train the early-processing memory.