
Redacted Data

Does anyone have suggestions for how to handle redacted data? We have some datasets in which, to meet the criteria for “limited data sets”, certain demographic variables are redacted when the combination of age, sex, race, and ethnicity is rare. Ages get converted to ranges (for which we plan to simply take the middle number as an approximation). For the other variables, do you recommend leaving the field blank, setting it to unknown or null, using the SNOMED code for “redacted”, etc.? Looking for suggestions.
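For the age ranges, the midpoint idea could look roughly like the sketch below. The function name `age_range_midpoint` and the "40-44" string format are assumptions for illustration only; the real redacted files may encode ranges differently, and open-ended ranges such as "90+" are left unresolved here.

```python
import re
from typing import Optional


def age_range_midpoint(age_range: str) -> Optional[int]:
    """Return the integer midpoint of a range like '40-44', or None if unparseable.

    Hypothetical helper: the 'low-high' format is an assumption, not something
    defined by the CDM or by the redaction process.
    """
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", age_range)
    if match is None:
        # Open-ended or malformed ranges (e.g. '90+') have no midpoint.
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Integer division rounds down when the range has an even number of years.
    return (low + high) // 2
```

Rounding down is an arbitrary choice; either neighbor of the true midpoint is an equally rough approximation.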

Hi @aeisman,

To preserve the CDM schema, we assign NULL to columns that may contain values that violate the privacy requirements. This is particularly necessary for many of the _source columns. For standardized analysis, the _concept_id columns are the necessary ones; they are what ATLAS and most studies use.
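As a rough illustration of the above, here is a minimal sketch that blanks the source columns of a single row while leaving the _concept_id columns alone. The helper name `redact_source_columns` and the sample row are hypothetical, and restricting the rule to columns ending in `_source_value` is an assumption; which columns actually need redacting depends on your data and privacy requirements.

```python
from typing import Any, Dict


def redact_source_columns(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a CDM row with every *_source_value column set to None.

    Hypothetical helper: None stands in for SQL NULL, and the suffix match is
    an assumed rule, not an OHDSI-provided function.
    """
    return {
        column: (None if column.endswith("_source_value") else value)
        for column, value in row.items()
    }


# Made-up PERSON row for illustration only.
person_row = {
    "person_id": 1,
    "gender_concept_id": 8532,
    "gender_source_value": "F",
    "race_concept_id": 0,
    "race_source_value": "rare category",
}

redacted_row = redact_source_columns(person_row)
```

The columns stay in place, so the table still matches the CDM schema, and the original row is not modified.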


thank you!
