ime breakdown.
For SGEMM, the GPU has higher throughput than
the CPU (given 100 percent of the work, the GPU completes the
kernel computation 5× faster than the CPU), and
the data transfer accounts for only a small proportion (around
20 percent) of the whole GPU execution time. As a result, the best
performance (the minimum Tmax) is achieved with the
configuration of 80 percent of the work on the GPU and 20 percent
on the CPU. For SConv, the GPU also has
around 5× higher throughput, but its overall performance