this, referred to as chaining, is found on the Cray supercomputers. The basic rule for
chaining is this: A vector operation may start as soon as the first element of the
operand vector(s) is available and the functional unit (e.g., add, subtract, multiply,
divide) is free. Essentially, chaining causes results issuing from one functional unit to
be fed immediately into another functional unit and so on. If vector registers are
used, intermediate results do not have to be stored into memory and can be used
even before the vector operation that created them runs to completion.
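
As a rough illustration of why the chaining rule matters, the following sketch counts cycles for a sequence of dependent vector operations with and without chaining. It is a simplified model, not a description of actual Cray timing: the function name `finish_cycle`, the latencies, and the assumptions that each pipelined unit accepts one element per cycle and that a result can enter the next unit in the cycle it becomes available are all hypothetical.

```python
def finish_cycle(latencies, n, chained):
    """Return the cycle at which the last result of the final operation is ready.

    latencies -- pipeline depth (in cycles) of each dependent operation, in order
                 (hypothetical values, not real Cray figures)
    n         -- vector length (number of elements)
    chained   -- True: an operation starts when its predecessor's FIRST result
                 is ready; False: it waits for the predecessor's LAST result
    """
    start = 0        # cycle at which the current operation accepts element 0
    last_ready = 0   # cycle at which the current operation's last result is ready
    for depth in latencies:
        first_ready = start + depth       # element 0 leaves the pipeline
        last_ready = first_ready + n - 1  # one further result per cycle
        # Chaining rule: start as soon as the first operand element exists.
        start = first_ready if chained else last_ready
    return last_ready


if __name__ == "__main__":
    # Assumed depths for a load, a multiply, and an add on 64-element vectors.
    depths = [4, 7, 6]
    print("chained:  ", finish_cycle(depths, 64, chained=True))
    print("unchained:", finish_cycle(depths, 64, chained=False))
```

Under these assumed latencies the chained sequence finishes at cycle 80 and the unchained one at cycle 206. In this model the chained time is the sum of the pipeline depths plus n - 1, because the vector length is paid only once instead of once per operation.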
For example, when computing C = (s × A) + B, where A, B, and C are vectors
and s is a scalar, the Cray may execute three instructions at once. Elements
fetched for a load immediately enter a pipelined multiplier, the products are sent to
a pipelined adder, and the sums are placed in a vector register as soon as the adder
completes them:
1. Vector load A → Vector Register (VR1)
2. Vector load B → VR2
Figure 17.17 Pipelined Processing of Floating-Point Operations