Detecting conflicts among thousands of concurrent transactions efficiently is challenging, and naive broadcast-based detection scales poorly. Many proposed TMs use global metadata, such as a cache coherence directory, to eliminate unnecessary traffic, but GPUs such as Fermi do not have a coherent, private cache for each thread [5]. Signature-based HTMs can operate independently of caches [9]. We experimented with an ideal version of a signature-based HTM and found that storing a signature for each thread
requires 3.8 Mbytes of total storage to achieve a reasonably low false conflict rate.

Typical conflict detection in HTMs both checks for the existence of conflicts and identifies the specific conflicting transactions. Many software TMs, such as RingSTM [11], instead detect only whether a committing transaction conflicts with transactions that have already committed, without identifying which ones. Kilo TM exploits this insight with value-based conflict detection [12]. It detects conflicts without any global metadata or cache coherence protocol, using only values from global memory. During execution, each transaction records the value of each global memory read in its read log as address-value pairs. When the transaction completes, it validates by comparing the saved values of its read set against the latest values in memory. A changed value indicates a conflict with one or more committed transactions. A transaction that detects a conflict can self-abort without interfering with other running transactions (Figure 5a). Unlike the atomic compare-and-swap (CAS) operations used in nonblocking algorithms, value-based conflict detection can tolerate the ABA problem (see the "Correctness Discussion" sidebar) [13].

Each transaction normally validates only once, just before it commits. However, a transaction is doomed if it has observed an inconsistent view of memory (for example, if another transaction committed and updated the accessed locations between two of its memory reads). Such doomed transactions could enter an infinite loop. To ensure that they are eventually aborted, we use a watchdog timer to trigger a validation. This satisfies opacity with minimal overhead for GPUs.
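The read-log and validation steps described above can be illustrated with a minimal Python sketch. It assumes global memory is modeled as a dict from address to value, and that one transaction's validation and commit happen atomically with respect to other commits (trivially true here, since the sketch runs sequentially). The `Transaction` class, its method names, and the choice to serve reads of the transaction's own writes from a write log are illustrative assumptions, not Kilo TM's hardware design.

```python
from typing import Dict, List, Tuple

# Global memory is modeled as a dict from address to value (an assumption
# for illustration; GPU global memory is not modeled in any detail).
Memory = Dict[int, int]


class Transaction:
    """Hypothetical sketch of value-based conflict detection (not Kilo TM's hardware)."""

    def __init__(self) -> None:
        # Read log: one (address, value) pair per global memory read.
        self.read_log: List[Tuple[int, int]] = []
        # Write log: buffered writes, made visible only by a successful commit.
        self.write_log: Dict[int, int] = {}

    def read(self, mem: Memory, addr: int) -> int:
        # Assumption: reads of the transaction's own buffered writes are served
        # from the write log and are not recorded in the read log.
        if addr in self.write_log:
            return self.write_log[addr]
        value = mem[addr]
        self.read_log.append((addr, value))
        return value

    def write(self, addr: int, value: int) -> None:
        self.write_log[addr] = value

    def validate(self, mem: Memory) -> bool:
        # No global metadata: compare each logged value with the current one.
        # Any mismatch means a committed transaction changed data we read.
        return all(mem[addr] == value for addr, value in self.read_log)

    def commit(self, mem: Memory) -> bool:
        # Assumes validation and commit are atomic with respect to other
        # commits; this holds trivially here because the sketch is sequential.
        if not self.validate(mem):
            return False  # self-abort: the caller discards logs and retries
        mem.update(self.write_log)
        return True


if __name__ == "__main__":
    mem: Memory = {0: 10, 1: 20}

    # Conflict: t1 and t2 both read address 0; t2 commits first and changes it.
    t1, t2 = Transaction(), Transaction()
    t1.write(1, t1.read(mem, 0) + 1)  # t1 logs (0, 10)
    t2.write(0, t2.read(mem, 0) + 5)  # t2 logs (0, 10)
    print(t2.commit(mem))  # True: memory still matches t2's read log
    print(t1.commit(mem))  # False: mem[0] is now 15, not 10

    # ABA: address 0 goes 15 -> 99 -> 15 while t3 is running.
    t3 = Transaction()
    t3.write(1, t3.read(mem, 0) * 2)  # t3 logs (0, 15)
    for v in (99, 15):
        writer = Transaction()
        writer.write(0, v)
        writer.commit(mem)
    print(t3.commit(mem))  # True: the logged value matches memory again
    print(mem)  # {0: 15, 1: 30}
```

The demo shows both behaviors from the text: t1 self-aborts because a value it read has changed, while t3 commits despite the A-B-A sequence because every value it read still matches current memory. A watchdog-triggered validation, as described above, would amount to calling `validate()` before the transaction finishes and aborting if it returns False.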