Around the same time, the community was devoting considerable effort to reducing the latency of on-chip caches through novel last-level cache organizations. Although the last-level caches of that era were L2 caches, the proposed techniques generally apply to any large on-chip cache (e.g., the L3 cache in today’s server processors). The NUCA cache work was an early proposal in this direction [2]. Another line of work observed that running parallel programs, or multiple programs, on a CMP introduced new problems caused by interference in its shared resources. A significant body of work was already underway to alleviate the negative impact of such interference