The lock-step execution model of GPUs requires a warp to have the
data blocks for all of its threads before execution. However, existing
cache mechanisms do not recognize the need to manage GPU cache
blocks at the warp level in order to increase the number of warps
ready for execution. In addition, warp scheduling is critical for
GPU-specific cache management to reduce both intra- and inter-warp
conflicts and to maximize data locality.
In this paper, we propose Divergence-Aware Cache (DaCache)
management, which orchestrates L1D cache management and warp
scheduling together for GPGPUs. In DaCache, the insertion
position of an incoming data block depends on the fetching
warp’s scheduling priority. Blocks of warps with lower priorities
are inserted closer to the LRU position of the LRU chain so
that they have shorter lifetimes in the cache. This fine-grained insertion
policy is extended to prioritize coherent loads over divergent
loads so that coherent loads are less vulnerable to both inter- and
intra-warp thrashing. DaCache also adopts a constrained replacement
policy with L1D bypassing to sustain a good supply of Fully
Cached Warps (FCW), along with a dynamic mechanism to adjust
FCW at runtime. Our experiments demonstrate that DaCache
achieves a 40.4% performance improvement over the baseline GPU
and outperforms two state-of-the-art thrashing-resistant techniques,
RRIP and DIP, by 40% and 24.9%, respectively.
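The priority-based insertion policy described above can be illustrated with a small conceptual sketch. This is not the authors' implementation: the class and method names (`CacheSet`, `insert`) and the linear priority-to-depth mapping are illustrative assumptions, showing only the core idea that a block fetched by a lower-priority warp is inserted deeper in the LRU chain and therefore survives less thrashing.

```python
# Conceptual sketch (illustrative, not DaCache's actual hardware policy):
# a cache set is an ordered list from MRU (front) to LRU (back), and the
# fetching warp's scheduling priority chooses the insertion depth.

class CacheSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = []  # index 0 = MRU end, last index = LRU end

    def insert(self, block, warp_priority, num_priorities):
        """Insert `block` at a depth proportional to warp_priority.

        warp_priority = 0 denotes the highest-priority warp (inserted at
        the MRU position); the lowest-priority warp's block lands at the
        LRU end, so it is evicted sooner under cache thrashing.
        """
        # Map priority in [0, num_priorities-1] to a chain position.
        depth = round(warp_priority * (self.ways - 1)
                      / max(1, num_priorities - 1))
        pos = min(depth, len(self.blocks))
        self.blocks.insert(pos, block)
        if len(self.blocks) > self.ways:
            self.blocks.pop()  # evict from the LRU end
```

Under this sketch, blocks of high-priority warps stay near the MRU end while a stream of low-priority insertions churns only the LRU end of the chain, which mirrors the thrashing resistance the abstract describes.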