In this paper, we proposed a three-stage buffered Clos-network
switch (TSBCS) architecture with a novel batch scheduling (BS)
discipline, which can emulate any CICQ switch at a higher
level. Compared with traditional CICQ switches of similar
size, TSBCS/BS greatly reduces both the scheduling complexity
and the implementation complexity. Its distributed shared-memory
design makes the architecture scalable.
We further proposed direct cell-forwarding
schemes to remedy the inefficiency of BS under light loads,
dramatically improving the delay performance. We
proved that TSBCS/BS with the SQUISH algorithm can
achieve 100 percent throughput under any admissible traffic.
Through simulations, we showed that the delay performance
of TSBCS/BS with SQUISH is comparable to that of
ideal OQ switches. We also considered implementation
issues, such as communication overhead and potential starvation,
and provided practical solutions to them, which make
the architecture more attractive in practice.