For most of our results in Section 6, we use a cache-sensitive subset of the SPEC CPU2006 benchmarks. We selected eighteen benchmarks that slow down by more than 3% when either the L1 or L3 cache is reduced to a quarter of its original size. The 4-core workload mixes are randomly chosen from these eighteen benchmarks. We have ten homogeneous workloads (each running four copies of the same application) and ten heterogeneous workloads (each running a mix of four different benchmarks). Table 1 lists the benchmarks included in each of the 4-core workloads.
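The selection procedure described above can be sketched as follows. This is a minimal illustration, not the scripts used for the paper: the function names, the input format (a mapping from benchmark name to its measured slowdowns with the quarter-sized L1 and L3 caches), and the fixed random seed are all hypothetical. We also assume here that the ten homogeneous workloads use ten distinct benchmarks, which the text does not state.

```python
import random

# Threshold from the text: "more than a 3% slowdown".
SLOWDOWN_THRESHOLD = 0.03


def is_cache_sensitive(l1_slowdown, l3_slowdown, threshold=SLOWDOWN_THRESHOLD):
    """True if shrinking either cache to a quarter costs more than the threshold."""
    return l1_slowdown > threshold or l3_slowdown > threshold


def select_benchmarks(slowdowns):
    """Return the cache-sensitive benchmark names, sorted for reproducibility.

    `slowdowns` maps a benchmark name to (l1_slowdown, l3_slowdown), each a
    fraction such as 0.05 for 5%. This input format is an assumption.
    """
    return sorted(
        name
        for name, (l1, l3) in slowdowns.items()
        if is_cache_sensitive(l1, l3)
    )


def build_workloads(benchmarks, n_homogeneous=10, n_heterogeneous=10,
                    cores=4, seed=0):
    """Randomly build 4-core workload mixes from the selected benchmarks."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatable mixes

    # Homogeneous: four copies of one application per workload.
    # Assumption: the benchmarks are drawn without repetition across workloads.
    homogeneous = [
        [name] * cores for name in rng.sample(benchmarks, n_homogeneous)
    ]

    # Heterogeneous: four different benchmarks per workload.
    heterogeneous = [
        rng.sample(benchmarks, cores) for _ in range(n_heterogeneous)
    ]
    return homogeneous, heterogeneous
```

Given measured slowdowns for the full suite, `select_benchmarks` would yield the cache-sensitive subset, and `build_workloads` would produce the two groups of ten mixes. The actual mixes used in our evaluation are those listed in Table 1.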