The growth in using GPUs for nongraphics computation illustrates that programmers are willing to redesign algorithms to exploit GPUs’ high computational efficiency. Enhancing GPU architectures to support robust synchronization mechanisms such as transactional memory will lower the risk of employing GPUs for more complex applications. Some of the insights and innovations gained from this work will likely also apply to multicore CPUs as they scale to support more concurrent threads. Our evaluation with an ideal TM system motivates us to make further improvements to Kilo TM. Future work is also needed to explore performance-tuning techniques for GPU TM applications. Insights gained from these explorations could contribute to a GPU-optimized, intuitive TM interface that doesn’t undermine programmers’ ability to enhance application performance.