The GPU-CC architecture, as introduced in Section 3, is
implemented in GPGPU-Sim [1] version 3.1.2. Fig. 4a shows
the function executed by each core; the 3×3 structure of
the implemented convolution is visible in this figure.
The FMUL, FMAD and FADD cores perform the multiply and
add operations of the convolution, while two IADD cores
calculate the input and output addresses.
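The exact per-core dataflow of Fig. 4a is not spelled out in the text. The following is therefore only a hypothetical Python sketch of one way a 3×3 convolution can be split across the four operation types named above. The helper names `fmul`, `fmad`, `fadd`, `iadd` and `conv3x3` are illustrative. Two choices are assumptions and not necessarily the mapping used in Fig. 4a: each kernel row is computed as an FMUL followed by two FMADs, and addresses are advanced purely by addition so that only IADD operations are needed. The kernel is applied without flipping (correlation form), which is identical to convolution for symmetric kernels.

```python
# Hypothetical sketch: mapping a 3x3 convolution onto FMUL, FMAD, FADD
# and IADD operations. The row-wise FMUL -> FMAD -> FMAD chain and the
# additive address stepping are assumptions, not the exact mapping of Fig. 4a.

def fmul(a, b):  # FMUL core: a * b
    return a * b

def fmad(a, b, c):  # FMAD core: a * b + c
    return a * b + c

def fadd(a, b):  # FADD core: a + b
    return a + b

def iadd(a, b):  # IADD core: integer addition used for address arithmetic
    return a + b

def conv3x3(img, width, height, kernel):
    """'Valid' 3x3 convolution (correlation form, no kernel flip) over a
    flat row-major image; returns (width-2)*(height-2) outputs, row-major."""
    # Tap offsets relative to the window's top-left pixel (precomputed constants).
    offsets = [ky * width + kx for ky in range(3) for kx in range(3)]
    out = [0.0] * ((width - 2) * (height - 2))
    in_addr, out_addr = 0, 0  # the two addresses maintained by IADD
    for _ in range(height - 2):
        for _ in range(width - 2):
            row_sums = []
            for r in range(3):
                # FMUL starts each kernel row; two FMADs accumulate the other taps.
                acc = fmul(img[iadd(in_addr, offsets[3 * r])], kernel[3 * r])
                acc = fmad(img[iadd(in_addr, offsets[3 * r + 1])], kernel[3 * r + 1], acc)
                acc = fmad(img[iadd(in_addr, offsets[3 * r + 2])], kernel[3 * r + 2], acc)
                row_sums.append(acc)
            # Two FADDs combine the three row partial sums.
            out[out_addr] = fadd(fadd(row_sums[0], row_sums[1]), row_sums[2])
            # IADD: advance input and output addresses by one pixel.
            in_addr = iadd(in_addr, 1)
            out_addr = iadd(out_addr, 1)
        # IADD: skip the two border pixels at the end of each input row.
        in_addr = iadd(in_addr, 2)
    return out
```

For example, `conv3x3(list(range(16)), 4, 4, [1.0] * 9)` sums each 3×3 window of a 4×4 image and returns `[45.0, 54.0, 81.0, 90.0]`. In this sketch each output pixel costs 3 FMUL, 6 FMAD and 2 FADD operations, with address generation handled entirely by additions.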