GPU memory referencing behavior differs from that of CPUs. For various benchmarks, Figure 4 presents opera-tions per thousand cycles for scratchpad memory lane instructions (left bar, top, blue), pre-coalescer global memory lane instructions (left bar, bottom, green), and post-coalescer global memory accesses (right bar, brown). The “average” bars represent the statistic if each workload was run sequentially, one after the other.