One of the most commonly used on-chip communication
architectures is the shared-bus architecture. The main advantages of shared-bus architectures include simple topology,
low cost, and extensibility. Several companies have developed
their own on-chip bus architectures, such as CoreConnect [1],
AMBA [2], and OpenCore [3]. Modules connected to a bus
are divided into two categories: masters and slaves. Masters
can initiate bus transactions, while slaves merely respond to transactions initiated by masters. Since a bus is typically shared by multiple masters, arbitration is required, and
a master can access the bus only after it receives an access
grant from the arbiter. Commonly used arbitration methods
include priority-based arbitration [2] and time-division multiple access (TDMA) [3]. LOTTERYBUS [4] additionally introduces randomized arbitration. As only one module can access the bus at any time, the bandwidth available to each module is limited when the
number of modules attached to a bus is large. The bandwidth
can be improved by hierarchical bus architectures [1], in which
multiple buses are connected with each other through bridges.
Studies in [5, 6] propose algorithms to perform bus hierarchy
optimization based on communication profiles. However, hierarchical bus architectures may suffer from long latency in inter-bus
communication, and bridges usually have a high area cost due
to their large number of buffers.
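
The three arbitration styles mentioned above can be contrasted with a minimal behavioral sketch. It is an illustration only, not the arbiter of any of the cited architectures: the function names, the static slot table, and the per-master ticket counts are assumptions introduced here. In particular, the source only states that LOTTERYBUS uses randomized arbitration, so the ticket-weighted draw is an assumed form of it, and the TDMA model simply leaves a slot idle when its owner is not requesting.

```python
import random


def priority_arbiter(requests, priorities):
    """Grant the requesting master with the highest static priority.

    requests[m] is True when master m wants the bus; a larger value in
    priorities[m] means a higher priority (an assumed convention).
    """
    pending = [m for m, wants_bus in enumerate(requests) if wants_bus]
    if not pending:
        return None
    return max(pending, key=lambda m: priorities[m])


def tdma_arbiter(requests, slot_table, cycle):
    """Grant the owner of the current time slot if it is requesting.

    slot_table is a hypothetical static schedule that repeats forever.
    The slot stays idle (None) when its owner has nothing to send.
    """
    owner = slot_table[cycle % len(slot_table)]
    return owner if requests[owner] else None


def lottery_arbiter(requests, tickets, rng):
    """Grant one requesting master at random, weighted by ticket count.

    The ticket weighting is an assumption made for this sketch.
    """
    pending = [m for m, wants_bus in enumerate(requests) if wants_bus]
    if not pending:
        return None
    weights = [tickets[m] for m in pending]
    return rng.choices(pending, weights=weights, k=1)[0]


if __name__ == "__main__":
    requests = [True, False, True]  # masters 0 and 2 request the bus
    rng = random.Random(0)          # seeded for repeatable draws
    print("priority:", priority_arbiter(requests, [1, 5, 3]))
    print("tdma:", [tdma_arbiter(requests, [0, 1, 2], c) for c in range(3)])
    print("lottery:", [lottery_arbiter(requests, [1, 1, 4], rng) for _ in range(5)])
```

In this model, the priority arbiter always favors the same master under contention, the TDMA arbiter guarantees each master its slot but wastes slots whose owners are idle, and the randomized arbiter shares the bus among requesting masters in proportion to their assumed ticket counts.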