There are two ways to implement CWV. One is to add a separate cache-like structure for CWV. The other is to embed CWV in the caches as extra bits, as shown in Figure 5. A separate structure incurs a large tag overhead relative to its payload: each entry would need its own address tag, yet the actual modification information for a 64B cache line can be represented with only 8 or 16 bits. Therefore, we choose the second approach to minimize the energy and area overheads incurred by CWV. Note that CWV does not introduce data coherence problems in a multi-core environment, because all data that is actually modified is still written to DRAM.
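To make the embedded option concrete, the following is a minimal software sketch of a 64B cache line that carries its modification vector as extra bits. It is an illustration under stated assumptions, not the hardware design: the class `CacheLine`, the field `cwv`, and the rule that a bit is set only when a write changes a word's contents are all hypothetical choices made here. The 8-bit and 16-bit vector sizes are assumed to correspond to 8B and 4B tracking granularity, respectively.

```python
LINE_BYTES = 64  # cache line size stated in the text


class CacheLine:
    """Hypothetical model of one cache line with an embedded modification vector."""

    def __init__(self, word_bytes=8):
        # Assumption: word_bytes=8 yields an 8-bit vector, word_bytes=4 a 16-bit one.
        assert LINE_BYTES % word_bytes == 0
        self.word_bytes = word_bytes
        self.num_words = LINE_BYTES // word_bytes
        self.data = bytearray(LINE_BYTES)
        self.cwv = 0  # one bit per word, kept alongside the line (no separate tag)

    def write(self, offset, payload):
        """Store payload at a byte offset and flag each word whose contents changed."""
        end = offset + len(payload)
        assert 0 <= offset and end <= LINE_BYTES
        first = offset // self.word_bytes
        last = (end - 1) // self.word_bytes
        for w in range(first, last + 1):
            lo = w * self.word_bytes
            hi = lo + self.word_bytes
            before = bytes(self.data[lo:hi])
            # Copy only the part of the payload that overlaps this word.
            s = max(lo, offset)
            e = min(hi, end)
            self.data[s:e] = payload[s - offset:e - offset]
            # Assumption: a silent write (same value) does not set the bit.
            if bytes(self.data[lo:hi]) != before:
                self.cwv |= 1 << w

    def modified_words(self):
        """Return the indices of words whose bits are set in the vector."""
        return [w for w in range(self.num_words) if (self.cwv >> w) & 1]


line = CacheLine(word_bytes=8)
line.write(8, b"\x01")         # changes word 1
line.write(0, b"\x00" * 8)     # rewrites word 0 with identical contents
line.write(60, b"\xff" * 4)    # changes word 7
print(line.modified_words())   # [1, 7]
```

Because the vector lives in the line itself, it reuses the line's existing tag, which is the point of choosing the embedded design over a separate structure.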