A hypothetical interconnection topology in which each PE has a dedicated broadcast bus would require just a single clock cycle for all the PEs to broadcast their costs. Although such a topology is expensive and infeasible, especially for large designs, it is worth exploring feasible alternatives to the single-bus topology that allow multiple PEs to transmit data simultaneously without bus contention. We believe that the performance of the proposed architecture can be further improved by adopting an interconnection topology that permits parallel data transfer among the PEs.
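
To make the trade-off concrete, the following is a minimal back-of-the-envelope sketch, not a model of the proposed architecture. It assumes that each bus carries exactly one broadcast per clock cycle, that every PE broadcasts once, and that arbitration is free; the function name `broadcast_cycles` and the parameter `num_buses` are hypothetical and introduced only for illustration.

```python
import math


def broadcast_cycles(num_pes: int, num_buses: int) -> int:
    """Estimate the clock cycles needed for every PE to broadcast once.

    Assumptions (illustrative only): one broadcast per bus per cycle,
    no arbitration overhead, and buses beyond the number of PEs stay idle.
    """
    if num_pes < 0 or num_buses < 1:
        raise ValueError("num_pes must be >= 0 and num_buses must be >= 1")
    if num_pes == 0:
        return 0
    usable_buses = min(num_buses, num_pes)
    return math.ceil(num_pes / usable_buses)


# Single shared bus: the PEs take turns, one per cycle.
print(broadcast_cycles(64, 1))   # 64
# One dedicated bus per PE: all broadcasts complete in one cycle.
print(broadcast_cycles(64, 64))  # 1
# An intermediate topology with 4 buses.
print(broadcast_cycles(64, 4))   # 16
```

Under these assumptions the cycle count falls roughly in inverse proportion to the number of buses, which suggests why an intermediate topology may recover much of the benefit at a fraction of the wiring cost of dedicated buses.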