GPUs have evolved into programmable, energy-efficient compute
accelerators for massively parallel applications. Still, in many
applications compute power is lost to cycles spent on data
movement and control instead of computation on actual data.
Further cycles are lost to pipeline stalls caused by
long-latency operations.
To improve performance and energy efficiency, we introduce
GPU-CC: a re-configurable GPU architecture with communicating
cores. It is based on a contemporary GPU, which
can still be used as such, but also has the ability to reorganize
its cores into a re-configurable network.
In GPU-CC, data movement and control are implicit in the
configuration of the communication network. Additionally, each
core executes a fixed instruction, which reduces the instruction
decode count and increases energy efficiency. We show a large
performance potential for GPU-CC, e.g. speedups of 1.9× and 2.4×
for a 3×3 and a 5×5 convolution application. The hardware cost
of GPU-CC is mainly determined by the buffers in the added
network, which amount to 12.4% extra memory space.