Redacted Data

aeisman · July 6, 2020, 9:49pm

Any suggestions on how people have chosen to handle redacted data? We have some datasets in which, to meet the criteria for “limited data sets”, certain demographic variables are redacted when the combination of age, sex, race, and ethnicity is rare. Ages get converted to ranges (for which we plan to simply take the middle number as an approximation). But for the other variables, do you recommend leaving the field blank, setting it to unknown or null, using the SNOMED code for “redacted”, etc.? Looking for suggestions.
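
For illustration, the midpoint approximation for age ranges might look like the sketch below. It assumes the redacted ages arrive as strings of the form "low-high" (for example "40-49"); the function name `age_range_midpoint` is hypothetical, and open-ended ranges such as "90+" are deliberately left unresolved rather than guessed.

```python
import re
from typing import Optional


def age_range_midpoint(age_range: str) -> Optional[float]:
    """Return the midpoint of a closed range like '40-49'.

    Returns None when the value is not a 'low-high' range, so that
    open-ended or malformed values are not silently approximated.
    """
    # Assumed input format: two non-negative integers separated by a hyphen.
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", age_range)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


print(age_range_midpoint("40-49"))  # 44.5
print(age_range_midpoint("90+"))    # None
```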

jposada · July 6, 2020, 9:57pm

Hi @aeisman,

To preserve the CDM schema, we assign NULL to columns that may contain values that violate the privacy requirements. This is particularly necessary for many of the _source columns. For standardized analysis, the _concept_id columns are the ones that matter: they are what ATLAS and most studies use.
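
A minimal sketch of that idea, assuming rows are handled as Python dictionaries during ETL. The helper `null_redacted_columns` and the choice of which columns to blank are hypothetical and would depend on your own privacy requirements; the column names follow the CDM person table.

```python
from typing import Any, Dict, Iterable

# Assumed list of columns to blank out; adjust to your own redaction rules.
REDACTED_COLUMNS = (
    "gender_source_value",
    "race_source_value",
    "ethnicity_source_value",
)


def null_redacted_columns(
    row: Dict[str, Any],
    columns: Iterable[str] = REDACTED_COLUMNS,
) -> Dict[str, Any]:
    """Return a copy of the row with the given columns set to None (NULL).

    The columns stay present, so the table still matches the CDM schema;
    only their values are removed. Other columns are left untouched.
    """
    cleaned = dict(row)
    for column in columns:
        if column in cleaned:
            cleaned[column] = None
    return cleaned


person = {
    "person_id": 1,
    "gender_concept_id": 8532,
    "gender_source_value": "F",
    "race_source_value": "example race text",
}
print(null_redacted_columns(person))
```

The input row is copied rather than modified in place, so the unredacted record is not changed by accident.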

aeisman · July 6, 2020, 9:58pm

thank you!