In this paper, we present the details of our solution and provide
extensive experimental evidence and theoretical analysis to
demonstrate the feasibility and effectiveness of our approach.
Our contributions are summarized as follows.
1) We describe a privacy-preserving data-leak detection
model for preventing inadvertent data leaks in network
traffic. Our model supports delegation of the detection
operation, so ISPs can offer data-leak detection as an
add-on service to their customers.
2) We design, implement, and evaluate an efficient
technique, the fuzzy fingerprint, for privacy-preserving
data-leak detection. Fuzzy fingerprints are special
digests of the sensitive data that the data owner prepares
for release to the DLD provider.
3) We implement our detection system and perform extensive
experimental evaluation on the 2.6 GB Enron dataset,
the Internet surfing traffic of 20 users, and five simulated
real-world data-leak scenarios to measure its privacy
guarantee, detection rate, and efficiency. Our results indicate
that the underlying scheme achieves high accuracy
with a very low false-positive rate. Our results also show
that detection accuracy does not degrade much
when only partial (sampled) sensitive-data digests are
used. In addition, we give an empirical analysis of our
fuzzification as well as of the fairness of partial
fingerprint disclosure.
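
To make the idea of a fuzzy fingerprint concrete, the following is a minimal sketch and not the construction evaluated in this paper. It assumes that sensitive data is split into fixed-length byte shingles, that each shingle is hashed to a 32-bit fingerprint (a truncated SHA-256 is used here purely as a stand-in), and that fuzzification randomizes the low-order bits of each fingerprint before release. All function names, the shingle length of 8, and the parameter `fuzzy_bits` are hypothetical and chosen for illustration only.

```python
import hashlib
import random


def shingles(data: bytes, q: int = 8):
    """Return all length-q sliding windows of data (illustrative choice of q)."""
    return [data[i:i + q] for i in range(len(data) - q + 1)]


def fingerprint(shingle: bytes) -> int:
    """32-bit digest of a shingle; truncated SHA-256 is an assumed stand-in."""
    return int.from_bytes(hashlib.sha256(shingle).digest()[:4], "big")


def fuzzify(fp: int, fuzzy_bits: int, rng: random.Random) -> int:
    """Keep the high-order bits of fp and randomize the low fuzzy_bits bits."""
    mask = (1 << fuzzy_bits) - 1
    return (fp & ~mask) | rng.randrange(1 << fuzzy_bits)


def owner_prepare(sensitive: bytes, fuzzy_bits: int, seed=None):
    """Data owner: keep exact fingerprints private, release only fuzzy ones."""
    rng = random.Random(seed)
    exact = {fingerprint(s) for s in shingles(sensitive)}
    fuzzy = {fuzzify(fp, fuzzy_bits, rng) for fp in exact}
    return exact, fuzzy


def provider_scan(traffic: bytes, fuzzy: set, fuzzy_bits: int) -> set:
    """DLD provider: report traffic fingerprints whose high bits match."""
    prefixes = {fp >> fuzzy_bits for fp in fuzzy}
    return {fp for fp in map(fingerprint, shingles(traffic))
            if fp >> fuzzy_bits in prefixes}


def owner_confirm(candidates: set, exact: set) -> set:
    """Data owner: filter the provider's candidates against the exact set."""
    return candidates & exact


if __name__ == "__main__":
    secret = b"account number 4929-1234-5678-9012"
    exact, fuzzy = owner_prepare(secret, fuzzy_bits=8, seed=1)
    traffic = b"POST /upload note=" + secret + b" HTTP/1.1"
    candidates = provider_scan(traffic, fuzzy, fuzzy_bits=8)
    print(len(owner_confirm(candidates, exact)), "of", len(exact),
          "sensitive shingles found in traffic")
```

In this sketch the provider sees only the fuzzy set, so each released value is consistent with many possible exact fingerprints, and the data owner makes the final decision on the reported candidates. A larger `fuzzy_bits` hides more from the provider at the cost of more candidates to post-process.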