A lockup-free cache is a common requirement for most latency-hiding techniques, including prefetching, relaxed consistency models, non-blocking loads, and multithreading. The complexity of implementing a lockup-free cache depends on which of these techniques it is intended to support (as described in detail by Laudon [52]). For example, if the goal is simply to support multiple outstanding prefetches, the processor need not maintain state on outstanding transactions, as long as the cache is prepared to receive prefetch responses from outside the processor while the processor continues to issue new requests. In contrast, supporting multiple outstanding stores (as with relaxed consistency models) or loads (if they are non-blocking) does require that special state be maintained for each outstanding access. For stores, the stored data must be merged into the cache line when it returns. For loads, the requested data must be forwarded directly to a register, which requires state associating each outstanding access with the register(s) awaiting the value, and any subsequent use of that register must interlock until the value has returned.
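The load interlock described above can be sketched as a per-register "pending" bit that is set when a non-blocking load misses and cleared when the data returns; any instruction reading a pending register must stall. This is a minimal illustration, not any particular processor's design, and all names in it are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32

/* One pending bit per architectural register (hypothetical sketch):
 * set when a non-blocking load to that register misses in the cache,
 * cleared when the requested data is forwarded to the register. */
static uint32_t pending_mask = 0;

/* A non-blocking load misses: mark its destination register pending. */
void issue_nonblocking_load(int dest_reg) {
    pending_mask |= (1u << dest_reg);
}

/* The miss data returns and is forwarded to the register. */
void load_data_returned(int dest_reg) {
    pending_mask &= ~(1u << dest_reg);
}

/* An instruction that reads src_reg must interlock (stall) while the
 * register's value is still outstanding. */
bool must_interlock(int src_reg) {
    return (pending_mask & (1u << src_reg)) != 0;
}
```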
Kroft [45] presented the original lockup-free cache design, which adds structures called ``miss information/status holding registers'' (MSHRs) to keep track of outstanding misses. Each MSHR contains enough state to handle one or more accesses of any type to a single memory line. Because of the generality of the MSHR mechanism, the amount of state involved is non-trivial: it includes the line address, pointers to the cache entry and destination register, written data, and various other fields. Most subsequent lockup-free cache proposals have been variations on this original MSHR scheme [46][79][70][63]. An alternative approach is to maintain the state of outstanding misses in the cache tag array itself [52][17], thereby permitting a larger number of outstanding misses.
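To make the MSHR state concrete, the following sketch models one entry and the lookup-or-allocate step performed on a miss. The field layout and sizes are illustrative assumptions, not Kroft's exact design: a secondary miss to a line with an entry already outstanding merges into that entry, a primary miss allocates a fresh entry, and if no entry is free the cache must stall (lock up) on this miss:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES  32   /* assumed line size */
#define MAX_TARGETS 4    /* accesses that can merge into one entry */

/* Hypothetical sketch of an MSHR: one entry tracks all outstanding
 * accesses to a single memory line. */
typedef struct {
    bool     valid;                  /* entry in use */
    uint64_t line_addr;              /* address of the missing line */
    int      cache_index;            /* cache entry the refill targets */
    int      n_targets;              /* merged accesses so far */
    struct {
        bool    is_store;
        int     dest_reg;            /* for loads: register to forward to */
        uint8_t offset;              /* byte offset within the line */
    } target[MAX_TARGETS];
    uint8_t  write_data[LINE_BYTES]; /* store data to merge on refill */
    uint32_t write_mask;             /* which bytes write_data covers */
} mshr_t;

/* On a miss to line_addr: return the matching entry (secondary miss),
 * allocate a free one (primary miss), or return NULL if the MSHR file
 * is exhausted and the cache must lock up. */
mshr_t *mshr_lookup_or_alloc(mshr_t *file, int n, uint64_t line_addr) {
    mshr_t *free_entry = NULL;
    for (int i = 0; i < n; i++) {
        if (file[i].valid && file[i].line_addr == line_addr)
            return &file[i];                 /* merge into existing entry */
        if (!file[i].valid && !free_entry)
            free_entry = &file[i];
    }
    if (free_entry) {                        /* allocate for primary miss */
        memset(free_entry, 0, sizeof *free_entry);
        free_entry->valid = true;
        free_entry->line_addr = line_addr;
    }
    return free_entry;                       /* NULL: no free MSHR */
}
```

The NULL return is exactly the point where even a lockup-free cache blocks: the number of MSHR entries bounds the number of outstanding misses, which is why keeping miss state in the tag array itself can permit more of them.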