writes. As discussed in Section II, for the sequential workload,
the amount of written data is inevitably increased due to
the large chunk size for differentiating the aging rates of
SSDs in Diff-RAID. Instead of using a large chunk, we
can efficiently convert full stripe writes into partial stripe
writes by removing duplicated data from the full stripe writes.
Figure 5 shows how the deduplication is combined with RAID
to increase the ratio of partial stripe writes. In Figure 5,
the deduplication stage is added to the RAID controller so
that we can find duplicated data across SSDs in the RAID-5
array. When the RAID controller receives a write request, it
computes the fingerprint of each page using a collision-resistant
hash function. The fingerprint computation can be accelerated
by hardware hash instructions (e.g., Intel SHA Extensions) to reduce
the overhead of the hash function. After fingerprinting, each fingerprint
is looked up in the dedup table, which maintains the
fingerprints of data already written to the SSDs. Each entry of the dedup
table is a key-value pair, {fingerprint, location},
where the location indicates the SSD number and address of the
written data. If the same fingerprint is found, the data does not need
to be written again. Instead, the mapping table is updated so that
the corresponding mapping entry points to the location of the
previously written data. If there is no matching fingerprint in
the dedup table, the data is written and the new fingerprint is
inserted into the dedup table with its location.
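
The write path described above can be illustrated with a minimal sketch. It is not the controller implementation from this work: the class name DedupController, the round-robin page placement, the 4 KB page size, and the choice of SHA-256 as the collision-resistant hash are all assumptions made for illustration, and parity updates, overwrites, and reference counting are omitted.

```python
import hashlib
from typing import Dict, List, Tuple

# (SSD number, page address); this tuple layout is an assumption.
Location = Tuple[int, int]


class DedupController:
    """Toy model of a deduplication stage in front of a RAID-5 array."""

    def __init__(self, num_ssds: int) -> None:
        self.num_ssds = num_ssds
        self.dedup_table: Dict[bytes, Location] = {}  # fingerprint -> location
        self.mapping_table: Dict[int, Location] = {}  # logical page -> location
        self.next_addr: List[int] = [0] * num_ssds    # next free page per SSD
        self.pages_written = 0                        # physical data page writes

    def _fingerprint(self, page: bytes) -> bytes:
        # SHA-256 stands in for "a collision-resistant hash function".
        return hashlib.sha256(page).digest()

    def _allocate(self, lpn: int) -> Location:
        # Hypothetical placement policy: round-robin by logical page number.
        ssd = lpn % self.num_ssds
        addr = self.next_addr[ssd]
        self.next_addr[ssd] += 1
        return (ssd, addr)

    def write(self, lpn: int, page: bytes) -> bool:
        """Return True if the page was physically written, False if deduplicated."""
        fp = self._fingerprint(page)
        loc = self.dedup_table.get(fp)
        if loc is not None:
            # Duplicate: skip the write and point the mapping entry
            # at the previously written data.
            self.mapping_table[lpn] = loc
            return False
        # New data: write it, then record {fingerprint, location}.
        loc = self._allocate(lpn)
        self.dedup_table[fp] = loc
        self.mapping_table[lpn] = loc
        self.pages_written += 1
        return True


# A four-page stripe in which the third page duplicates the first.
ctrl = DedupController(num_ssds=4)
stripe = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096]
results = [ctrl.write(lpn, page) for lpn, page in enumerate(stripe)]
print(results, ctrl.pages_written)  # [True, True, False, True] 3
```

In this example only three of the four data pages reach the SSDs, so a request that would have been a full stripe write becomes a partial stripe write.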