One of the most commonly used on-chip communication
architectures is the shared-bus architecture. The main advantages of shared-bus architectures include a simple topology,
low cost, and extensibility. Several companies have developed
their own on-chip bus architectures, such as CoreConnect [1],
AMBA [2], and OpenCore [3]. Modules connected to a bus
are divided into two categories: masters and slaves. Masters
can initiate a bus transaction, while slaves only respond to transactions initiated by masters. Since a bus is typically shared by multiple masters, arbitration is required, and
a master can access the bus only after it receives an access
grant from the arbiter. Commonly used arbitration methods
include priority-based arbitration [2] and time division multiple access (TDMA) [3]. LOTTERYBUS [4] introduces randomized, lottery-based arbitration. Because only one module can access the bus at a time, the bus becomes a bandwidth bottleneck when the
number of modules attached to it is large. The bandwidth
can be improved by hierarchical bus architectures [1], in which
multiple buses are connected with each other through bridges.
Studies in [5, 6] propose algorithms to perform bus hierarchy
optimization based on communication profiles. However, hierarchical bus architectures may suffer from long latency for inter-bus
communication, and bridges usually incur a high area cost due
to their large number of buffers.
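To make the three arbitration policies mentioned above concrete, the following is a minimal per-cycle sketch in Python. It is an illustration under simplifying assumptions, not the arbiter of AMBA, OpenCore, or LOTTERYBUS: the function names, the convention that a lower master index means higher priority, the static TDMA slot table (an idle slot is simply wasted here), and the ticket counts are all hypothetical. Each function grants the bus to at most one master per cycle, reflecting the fact that only one module can use a shared bus at a time.

```python
import random
from typing import Optional, Sequence


def priority_grant(requests: Sequence[bool]) -> Optional[int]:
    """Fixed-priority arbitration (assumption: lower index = higher priority)."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None  # no master is requesting; the bus stays idle


def tdma_grant(requests: Sequence[bool], slot_table: Sequence[int],
               cycle: int) -> Optional[int]:
    """TDMA arbitration: the owner of the current time slot may use the bus.

    Simplification: if the slot owner is not requesting, the slot is wasted
    rather than being reassigned to another master.
    """
    owner = slot_table[cycle % len(slot_table)]
    return owner if requests[owner] else None


def lottery_grant(requests: Sequence[bool], tickets: Sequence[int],
                  rng: random.Random) -> Optional[int]:
    """Lottery arbitration: a requesting master wins with probability
    proportional to its ticket count (tickets assumed to be positive)."""
    contenders = [m for m, requesting in enumerate(requests) if requesting]
    if not contenders:
        return None
    total = sum(tickets[m] for m in contenders)
    draw = rng.randrange(total)  # uniform integer in [0, total)
    for m in contenders:
        draw -= tickets[m]
        if draw < 0:
            return m
    return None  # unreachable when all tickets are positive


if __name__ == "__main__":
    requests = [True, True, False, True]  # masters 0, 1, and 3 want the bus
    print(priority_grant(requests))                      # 0
    print(tdma_grant(requests, [0, 1, 2, 3], cycle=2))   # None (owner 2 idle)
    print(tdma_grant(requests, [0, 1, 2, 3], cycle=5))   # 1

    # Estimate lottery bandwidth shares over many arbitration rounds.
    rng = random.Random(0)
    tickets = [1, 2, 5, 1]  # hypothetical ticket assignment
    wins = [0, 0, 0, 0]
    for _ in range(10_000):
        wins[lottery_grant(requests, tickets, rng)] += 1
    print([w / 10_000 for w in wins])  # roughly 0.25, 0.5, 0.0, 0.25
```

In this toy setting, fixed priority always favors master 0, which can starve lower-priority masters under heavy load; TDMA guarantees each master a fixed share of slots but wastes slots its owner does not use; and the lottery arbiter divides bandwidth among the requesting masters roughly in proportion to their tickets (here about 25%, 50%, and 25% for masters 0, 1, and 3).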