As shown in Figure 1, a single node of the hybrid HPC system
will consist of several complex out-of-order issue RISC/CISC
multicore processors and Reconfigurable Logic (RL) coprocessors.
These coprocessors will be socket-compatible with the processors and
hence can be integrated on existing motherboards without any
glue logic. The processors and coprocessors will be interconnected
through uniform chip-to-chip and board-to-board interconnects such as
HyperTransport [8]. To ensure scale-up as well as speed-up, it is
quite likely that the most prevalent memory architecture of a single
node in these hybrid computing machines will be cache-coherent
Non-Uniform Memory Access (ccNUMA) [9]. The machines will
have multiple levels of caches and main memory sizes of several
gigabytes, if not terabytes.