The speedup that can be achieved using registers is demonstrated in Figure
17.20. The FORTRAN routine multiplies vector A by vector B to produce vector
C, where each vector has a real part (AR, BR, CR) and an imaginary part (AI, BI,
CI). The 3090 can perform one main-storage access per processor, or clock, cycle
(either read or write); has registers that can sustain two accesses for reading and
one for writing per cycle; and produces one result per cycle in its arithmetic unit.
Let us assume the use of instructions that can specify two source operands and a
result.
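The computation described above can be sketched as follows. This is a minimal illustration, assuming the routine forms the element-wise complex product of A and B; Figure 17.20 is not reproduced here, and the function name `complex_vector_multiply` is hypothetical. Python is used only for illustration in place of the FORTRAN of the figure.

```python
def complex_vector_multiply(ar, ai, br, bi):
    """Element-wise complex product C = A * B on split real/imaginary arrays.

    Assumption: each element is computed as
        CR = AR*BR - AI*BI
        CI = AR*BI + AI*BR
    which is the standard definition of complex multiplication.
    """
    n = len(ar)
    cr = [0.0] * n
    ci = [0.0] * n
    for j in range(n):
        # Real part: two multiplies and one subtract.
        cr[j] = ar[j] * br[j] - ai[j] * bi[j]
        # Imaginary part: two multiplies and one add.
        ci[j] = ar[j] * bi[j] + ai[j] * br[j]
    return cr, ci


# Usage: (1 + 2i) * (3 + 4i) = -5 + 10i
cr, ci = complex_vector_multiply([1.0], [2.0], [3.0], [4.0])
```

Under this assumption, each iteration performs six arithmetic operations (four multiplies, one subtract, one add), and each takes two source operands and produces one result.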
Part a of the figure shows that, with memory-to-memory instructions, each
iteration of the computation requires a total of 18 cycles. With a pure register-to