Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are inefficient for general-purpose GPU (GPGPU) computing. GPGPU workloads tend to include data structures that would not fit in any reasonably sized cache, leading to very low cache hit rates. This problem is exacerbated by the design of current GPUs, which share small caches between many threads. Caching these streaming data structures needlessly burns power while evicting data that might otherwise fit in the cache.
We propose a GPU cache management technique that improves the efficiency of small GPU caches while further reducing their power consumption. It adaptively bypasses the GPU cache for blocks that are unlikely to be referenced again before being evicted. This technique saves energy by eliminating needless insertions and evictions, and it prevents cache pollution, resulting in better performance. We show that, with a 16KB L1 data cache, dynamic bypassing achieves performance similar to that of a double-sized L1 cache while reducing energy consumption by 25% and power by 18%.
The technique is especially attractive for programs that do not use programmer-managed scratchpad memories. We present a case study that demonstrates the inefficiency of current GPU caches compared to programmer-managed scratchpad memories, and we show the extent to which cache bypassing can make up for the potential performance loss when the effort to program scratchpad memories is impractical.
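To illustrate the kind of mechanism the abstract describes, the following is a minimal, hypothetical sketch (not the paper's actual hardware design) of adaptive bypassing: a per-instruction (per-PC) saturating counter learns whether blocks inserted by that instruction are reused before eviction, and misses from instructions with a history of dead insertions bypass the cache instead of polluting it. The class name, counter width, and cache geometry here are illustrative assumptions.

```python
class BypassingCache:
    """Sketch: tiny direct-mapped cache with a per-PC 2-bit reuse predictor.

    Illustrative only; real GPU proposals track reuse per warp, per load
    instruction, or per cache set with dedicated hardware counters.
    """

    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.sets = [None] * num_sets   # each entry: (tag, inserting_pc, reused)
        self.reuse_counter = {}         # pc -> 0..3 saturating counter (default 2)

    def access(self, pc, addr):
        idx, tag = addr % self.num_sets, addr // self.num_sets
        line = self.sets[idx]
        if line is not None and line[0] == tag:
            # Hit: the block was reused; reward the PC that inserted it.
            self.sets[idx] = (tag, line[1], True)
            c = self.reuse_counter.get(line[1], 2)
            self.reuse_counter[line[1]] = min(3, c + 1)
            return "hit"
        # Miss: consult the predictor before inserting.
        if self.reuse_counter.get(pc, 2) < 2:
            return "bypass"             # predicted dead-on-arrival: skip the cache
        if line is not None and not line[2]:
            # Evicting a block that was never reused: penalize its inserter.
            c = self.reuse_counter.get(line[1], 2)
            self.reuse_counter[line[1]] = max(0, c - 1)
        self.sets[idx] = (tag, pc, False)
        return "insert"
```

A streaming load (ever-new addresses, no reuse) accumulates dead evictions, its counter drops, and its subsequent misses bypass the cache, while loads whose blocks are re-referenced keep inserting normally. This captures the abstract's point: bypassing saves the energy of useless insertions and evictions and leaves the small cache to data that can actually benefit from it.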