To better utilize the available cores in the GPU, we propose
the GPU-CC architecture, which allows the cores in an
SM to be configured in a network with direct communication,
creating a spatial computing architecture. By moving
data directly from one core to the next, data movement and
control are made implicit in the network, and the instruction
count can be reduced. Furthermore, each core is assigned one fixed
instruction, which it executes for the entire duration of the
kernel. This instruction is stored in a local configuration
register and has to be loaded only once.
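
The execution model described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the GPU-CC hardware or its programming interface: the names `Core` and `run_pipeline`, the linear chain topology, and the one-value-per-cycle timing are all hypothetical choices made for illustration. Each simulated core holds one fixed operation (standing in for the configuration register) and passes its result directly to the next core, so no per-element load, store, or branch instructions appear in the loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Core:
    """Hypothetical model of one core in the configured network."""
    op: Callable[[int], int]      # fixed instruction, set once at configuration time
    out: Optional[int] = None     # output register read directly by the next core


def run_pipeline(cores: List[Core], inputs: List[int]) -> List[int]:
    """Stream inputs through a linear chain of cores, one value per cycle."""
    results = []
    # Append empty slots (bubbles) so the last inputs drain out of the chain.
    stream: List[Optional[int]] = list(inputs) + [None] * len(cores)
    for value in stream:
        # Snapshot what each core sees this cycle: the first core reads the
        # input stream, every other core reads its predecessor's old output.
        operands = [value] + [core.out for core in cores[:-1]]
        for core, x in zip(cores, operands):
            core.out = None if x is None else core.op(x)
        if cores[-1].out is not None:
            results.append(cores[-1].out)
    return results


# Configure three cores once; together they compute (2 * x + 1) ** 2.
cores = [
    Core(op=lambda x: 2 * x),
    Core(op=lambda x: x + 1),
    Core(op=lambda x: x * x),
]
print(run_pipeline(cores, [1, 2, 3]))  # prints [9, 25, 49]
```

In this sketch the operations are assigned when the `Core` objects are created and never change while data streams through, mirroring the single load of the configuration register; once the chain is full, every core performs useful work in each cycle.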