A closer look at the Bellman–Ford algorithm and our architectural scheme reveals that the theoretical maximum parallelism of the algorithm has not been fully exploited. Because all the PEs share a single bus, it takes n clock cycles for the n PEs to broadcast their costs during each iteration. A hypothetical interconnection topology in which each PE has a dedicated broadcast bus would require only a single clock cycle for all the PEs to broadcast their costs. Although such a topology is expensive and infeasible, especially for large designs, it is worth exploring feasible alternatives to the single-bus topology that allow multiple PEs to transmit data simultaneously without causing bus contention. We believe that the performance of the proposed architecture can be further improved by adopting an interconnection topology that permits parallel data transfer among the PEs.
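
To make the trade-off concrete, the per-iteration broadcast cost can be estimated with a simple model. The sketch below is illustrative only and is not part of the proposed architecture: the function name `broadcast_cycles`, the parameter `num_buses`, and the assumption that PEs are statically divided among contention-free buses, each carrying one broadcast per clock cycle, are all hypothetical. Under these assumptions, one bus reproduces the n-cycle cost of the shared-bus scheme and n buses reproduce the single-cycle cost of the dedicated-bus topology.

```python
def broadcast_cycles(num_pes: int, num_buses: int) -> int:
    """Estimate clock cycles for every PE to broadcast its cost once.

    Hypothetical model: each bus carries one broadcast per clock cycle,
    and the PEs are split statically among the buses so that no two PEs
    drive the same bus in the same cycle (no contention).
    """
    if num_pes < 1 or num_buses < 1:
        raise ValueError("num_pes and num_buses must be positive")
    # Buses beyond one per PE cannot reduce the cost any further.
    usable_buses = min(num_buses, num_pes)
    # Integer ceiling of num_pes / usable_buses.
    return (num_pes + usable_buses - 1) // usable_buses


if __name__ == "__main__":
    n = 16  # example number of PEs, chosen arbitrarily
    for k in (1, 2, 4, n):
        print(f"{k:2d} bus(es): {broadcast_cycles(n, k):2d} cycle(s) per iteration")
```

The model ignores arbitration, wiring, and fan-out costs, which would have to be weighed against the cycle savings when choosing an intermediate number of buses.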