
Masking of Data about a Person

One of the items that came out of the THEMIS F2F was the question of when data should be masked. Are there conditions/procedures/drugs or other items that should be masked/hidden in the CDM? For example, patients over a certain age might be assigned a capped age to protect patient privacy. Here is the recommendation that THEMIS came up with:

RECOMMENDATION
The masking of information related to a person is dependent on the organization's privacy policies and may vary by data asset.

ACTION
Work with CDM WG to add this statement to FAQs.

QUESTION: Are there conditions/procedures/drugs or other domains that should be masked/hidden in the CDM?

ANSWER: The masking of information related to a person is dependent on the organization's privacy policies and may vary by data asset.


@mvanzandt - I think it is better to handle this with one ticket rather than one for each item. I don’t think the recommendation differs by the domain or item we are talking about.

Looks like this one is almost settled. Anyone else have anything to add?

@ericaVoss,

We should add a recommendation to include this information on the new Metadata/Annotation table.

FYI - @Ajit_Londhe I’ll add this to the spreadsheet of examples


@mvanzandt has to think about this in her role, so she might be able to help with examples. We receive our data already cleansed, so we don’t have to worry about it. If that is helpful . . .

@ericaVoss Since many of our data sources are de-identified, we cannot include any information that carries a high risk of re-identifying the patient. Here are a few examples.

  1. Person’s age. Any patient over the age of 85 is masked in our data. We set anyone in the data source who is over 85 to 85. Some other organizations have the same rule but set the cap at 90. The PHI rules, I believe, have it set to 90.

  2. Diagnosis codes - on the same principle of not being able to re-identify the patients, we do not bring in source records that are flagged as a privacy diagnosis.

  3. Zip codes - in many of our US data sources, we cannot display all 5 or 9 digits of the zip code. We can only go down to the first 3 digits. In some of our non-US data sources, we mask all the way up to regions.
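
To make the three examples above concrete, here is a rough Python sketch of how such rules might look in an ETL step. This is only an illustration, not how any particular organization implements it: the function names, the `privacy_flag` key, and the default cap of 85 are assumptions for the example, and the right thresholds depend on each organization's privacy policy.

```python
# Hypothetical masking helpers; names and defaults are illustrative only.

def mask_age(age, cap=85):
    """Cap the age: anyone older than `cap` is reported as `cap`.
    Assumption: 85 follows the post; some organizations use 90."""
    return min(age, cap)


def mask_zip(zip_code, keep_digits=3):
    """Keep only the leading digits of a US zip code (5- or 9-digit)."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    return digits[:keep_digits]


def drop_privacy_flagged(records, flag_key="privacy_flag"):
    """Skip source records flagged as a privacy diagnosis.
    Assumption: each record is a dict and the flag lives under `flag_key`."""
    return [r for r in records if not r.get(flag_key)]


if __name__ == "__main__":
    print(mask_age(93))            # capped to 85
    print(mask_zip("08540-1234"))  # first three digits only
    rows = [{"id": 1, "privacy_flag": True}, {"id": 2}]
    print(drop_privacy_flagged(rows))
```

Keeping the cap and the number of zip digits as parameters reflects the recommendation that masking rules may vary by organization and by data asset.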


I added it to our new concept list proposals.
