TLP as a Runtime Metric: The general cache insensitivity of GPU applications stems from two main reasons: i) streaming memory access behavior; and ii) high levels of available TLP. Even when the memory access behavior is not streaming, GPU applications are able to tolerate higher memory access latency using the available TLP. Figure 3 shows the impact of bypassing 1 the shared LLC for 100%, 75%, 50%, and 25% of GPU memory access requests. We observe that, on
an average, GPU applications can sustain up to 75% of LLC access bypassing without significant performance degradation. The GPU is able to do so by utilizing its high degree of TLP.