At first glance, sharing safe microdata
seems a straightforward task:
Simply strip unique identifiers like
names, addresses, and tax identification
numbers before releasing the
data. However, removing these direct identifiers
alone may not suffice when other
readily available variables, such as aggregated
geographic or demographic
data, remain on the file. These quasi-identifiers
can be used to match units
in the released data to other databases.
For example, computer scientist Latanya
Sweeney showed as part of her
Ph.D. thesis that 97 percent of the records
in publicly available voter registration
lists for Cambridge, MA, could
be uniquely identified using birth date
and a nine-digit ZIP code.
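
The matching risk described above can be made concrete with a small sketch. The Python below is illustrative only: the function names (`fraction_unique`, `link`), the field names, and every record are hypothetical and made up for this example, not drawn from Sweeney's data. It measures what share of records are unique on a chosen set of quasi-identifiers, then links those unique records to a named external list.

```python
from collections import Counter


def key_of(record, quasi_identifiers):
    """Build the combination of quasi-identifier values for one record."""
    return tuple(record[q] for q in quasi_identifiers)


def fraction_unique(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(key_of(r, quasi_identifiers) for r in records)
    unique = sum(1 for r in records if counts[key_of(r, quasi_identifiers)] == 1)
    return unique / len(records)


def link(released, external, quasi_identifiers):
    """Pair released records with external names when the match is one-to-one."""
    released_counts = Counter(key_of(r, quasi_identifiers) for r in released)
    external_index = {}
    for person in external:
        external_index.setdefault(key_of(person, quasi_identifiers), []).append(person)
    matches = []
    for r in released:
        k = key_of(r, quasi_identifiers)
        candidates = external_index.get(k, [])
        # Only claim a re-identification when both sides are unique on the key.
        if released_counts[k] == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], r))
    return matches


# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"birth_date": "1961-07-04", "zip9": "02139-0001", "diagnosis": "A"},
    {"birth_date": "1961-07-04", "zip9": "02139-0001", "diagnosis": "B"},
    {"birth_date": "1975-02-11", "zip9": "02138-1234", "diagnosis": "C"},
    {"birth_date": "1980-12-30", "zip9": "02140-4321", "diagnosis": "D"},
]

# Hypothetical public list that carries names alongside the same variables.
external = [
    {"name": "Pat Example", "birth_date": "1975-02-11", "zip9": "02138-1234"},
    {"name": "Lee Sample", "birth_date": "1990-05-05", "zip9": "02141-0000"},
]

quasi = ["birth_date", "zip9"]
share = fraction_unique(released, quasi)
matches = link(released, external, quasi)
print(f"{share:.0%} of released records are unique on {quasi}")
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy release, half of the records are unique on birth date and nine-digit ZIP code, and one of them links to a named person in the external list even though the release itself contains no names.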