We implement our fuzzy fingerprint framework in Python,
including packet collection, shingling, Rabin fingerprinting,
as well as partial disclosure and fingerprint filter extensions.
Our implementation of Rabin fingerprint is based on cyclic
redundancy code (CRC). We use the padding scheme mentioned
in [22] to handle small inputs. In all experiments,
the shingles are in 8-byte, and the fingerprints are in 32-bit
(33-bit irreducible polynomials in Rabin fingerprint). We set
up a networking environment in VirtualBox, and make a
scenario where the sensitive data is leaked from a local
network to the Internet. Multiple users’ hosts (Windows 7)
are put in the local network, which connect to the Internet
via a gateway (Fedora). Multiple servers (HTTP, FTP, etc.)
and an attacker-controlled host are put on the Internet side.
The gateway dumps the network traffic and sends it to a
DLD server/provider (Linux). Using the sensitive-data fingerprints
defined by the users in the local network, the DLD server
performs off-line data-leak detection. The speed aspect of
privacy-preserving data-leak detection is another topic and we
study it in [23].