There was a discussion about de-identification.
I mentioned a threshold of 20,000 for ZIP codes as
“some convention for the value of k in k-anonymity”.
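One way to read the 20,000 figure is as a population-based k: a 3-digit ZIP prefix may be released only if more than 20,000 people live in the area it covers. The Python sketch below assumes a hypothetical lookup table of 3-digit ZIP populations (in practice this would come from current Census Bureau data). It keeps the prefix when that condition holds and otherwise replaces it with 000, which is what the HIPAA Safe Harbor guidance prescribes for smaller areas. The prefixes, population figures and the function name generalize_zip are made up for illustration.

```python
from typing import Mapping

# Made-up 3-digit ZIP prefixes and populations, for illustration only.
# Real figures would come from current U.S. Census Bureau data.
ZIP3_POPULATION = {
    "111": 1_250_000,
    "222": 18_400,
    "333": 20_000,
}

THRESHOLD = 20_000  # the prefix's area must contain MORE than this many people


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Return the first three digits of zip_code if that 3-digit area holds
    more than THRESHOLD people; otherwise return "000". Prefixes missing
    from the table are treated as too small to release."""
    zip3 = zip_code[:3]
    if zip3_population.get(zip3, 0) > THRESHOLD:
        return zip3
    return "000"


for z in ["11122", "22201", "33310", "44400"]:
    print(z, "->", generalize_zip(z, ZIP3_POPULATION))
```

Note that an area with exactly 20,000 people ("333" here) does not qualify, because the rule requires more than 20,000.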
Here is an extract from an AMIA conference paper:
This masking phenomenon is well described by the concept of k-anonymity
(each dataset record is indistinguishable from at least k-1 other
records given a group of identifying attributes) and the concept of
l-diversity (each group of identifying attributes is immune to
probabilistic inference attacks and has at least l well-represented
values). There is no clearly established boundary; however, the HIPAA ZIP
code rules offer one potential precedent: the HIPAA ZIP code rule (45 CFR
164.514) permits revealing 3-digit ZIP codes as long as the 3-digit ZIP
code covers an area populated by more than 20,000 people, as this is
considered sufficient “masking” of the individual. The masking
principle is important in redacting or preserving a sentence such as
“Patient has a 9-year-old daughter” in a document that otherwise
contains unmodified dates and locations but does not contain the
primary patient's name.
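To make the two definitions in the extract concrete, here is a minimal Python sketch that measures k (the size of the smallest group of records sharing the same identifying attributes) and l for a small made-up table. It assumes records are dicts and that the caller chooses the quasi-identifier columns. It implements only distinct l-diversity, the simplest reading of "well-represented"; entropy and recursive l-diversity are stricter variants. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values on the quasi-identifiers."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        groups[key].append(r)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous: the smallest group size."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(g) for g in groups.values())


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in g}) for g in groups.values())


# Made-up records: ZIP prefix and age band are the identifying attributes.
records = [
    {"zip3": "111", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "111", "age": "30-39", "diagnosis": "asthma"},
    {"zip3": "111", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "222", "age": "40-49", "diagnosis": "diabetes"},
    {"zip3": "222", "age": "40-49", "diagnosis": "diabetes"},
]
qi = ["zip3", "age"]

print("k =", k_anonymity(records, qi))
print("l =", distinct_l_diversity(records, qi, "diagnosis"))
```

Here the table is 2-anonymous but only 1-diverse. Everyone in the ("222", "40-49") group has the same diagnosis, so knowing someone is in that group reveals it. This is the probabilistic inference that l-diversity is meant to rule out.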
The HHS de-identification guidance at
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#zip
describes the ZIP code rule in detail:
Covered entities may include the first three digits of the ZIP code if,
according to the current publicly available data from the Bureau of the
Census: (1) The geographic unit formed by combining all ZIP codes with
the same three initial digits contains more than 20,000 people;