imbalance, or inflate per-core working set sizes, which (as threads share TLB and cache capacity, among other resources) can transform well-behaved, cache-fitting applications into cache-thrashing ones whose performance is limited by off-chip bandwidth [4].