This example shows that the low latency of the DCM and the concurrency of message paths can produce improvements
that exceed the ten fold improvement in bandwidth alone. The fast Fourier transform (FFT) of a two dimensional array A can be computed as a sequence of 1D FFT’s on the rows of the matrix followed by 1D FFT’s on the columns. The iPSC vector processor can compute 1024 point single precision complex FFT’s at more than 10 megaflops. So a sixteen node iPSC-VX could be expected to compute a 1024x1024 2D FFT at 160 megaflops or in about .7 seconds. Unfortunate1 d1 if the matrix is partitioned to the processors by rows, then it must be transpose before the second set of FFT’s is
computed. This transpose requires that each pair of nodes exchange a 64x64 square.This is a very complicated communication pattern and it took very special coding on the old system to avoid congestion problems. The time to perform the transpose is 5.5 seconds on the iPSC/l yielding a total computation time of 6.2 seconds. This translates
into a performance of approximately 17 Mflops.