Dynamic Cache Contention Detection in Multi-threaded Applications
In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5× slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
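To make the key insight concrete, the following is a minimal sketch of how sharing could be classified from a memory access trace using only the cache line size, with no model of the memory hierarchy. It is an illustration under stated assumptions, not the tool described in this paper: the function name `classify_sharing`, the trace format of `(thread_id, address, is_write)` tuples, the 64-byte line size, and the 8-byte tracking granularity are all hypothetical choices, and a dictionary stands in for real shadow memory. The sketch also aggregates over the whole trace rather than classifying individual cache misses.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # bytes; assumed value, not taken from the paper
WORD_SIZE = 8         # bytes; assumed tracking granularity within a line


def classify_sharing(trace, line_size=CACHE_LINE_SIZE, word_size=WORD_SIZE):
    """Label each contended cache line as 'true' or 'false' sharing.

    trace: iterable of (thread_id, address, is_write) tuples (hypothetical format).
    Returns a dict mapping cache line index -> 'true' or 'false'.
    """
    # Stand-in for shadow memory: per line, per word, which threads read or wrote it.
    shadow = defaultdict(
        lambda: defaultdict(lambda: {"readers": set(), "writers": set()})
    )
    for thread_id, address, is_write in trace:
        line = address // line_size
        word = (address % line_size) // word_size
        shadow[line][word]["writers" if is_write else "readers"].add(thread_id)

    result = {}
    for line, words in shadow.items():
        writers = set().union(*(w["writers"] for w in words.values()))
        threads = writers.union(*(w["readers"] for w in words.values()))
        # A line is contended only if two or more threads touch it
        # and at least one of them writes to it.
        if len(threads) < 2 or not writers:
            continue
        # True sharing: some word is written by one thread and also
        # accessed by a different thread. Otherwise the threads touch
        # disjoint words that merely share a line: false sharing.
        true_sharing = any(
            w["writers"] and len(w["writers"] | w["readers"]) > 1
            for w in words.values()
        )
        result[line] = "true" if true_sharing else "false"
    return result
```

For example, two threads writing to addresses `0x1000` and `0x1008` touch different words of the same 64-byte line, so that line would be labeled as false sharing, while a write and a read of `0x2000` by different threads would be labeled as true sharing. Lines that are only read, or only touched by one thread, are not reported.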