Thank you for sharing the IRB protocol for the other sites!
For those who want a quick review of the most important part, I've pasted the main points below (e.g., the re-shifting of dates during each refresh):
We will utilize the following techniques to minimize risk and protect patient confidentiality:
- Identifier Generation: Anonymized, masked patient identifiers will have already been assigned to each record (i.e., MRNs will not be visible) as part of the initial transfer of data into the CHCO OMOP CDM limited data repository. When data is transferred into the OMOP CDM de-identified data repository, the masked identifiers will be re-randomized and a completely new set of identifiers will be created. Once this transformation and re-randomization is complete, the mapping tables will be destroyed.
- Date Shifting: Obfuscation will be used to de-identify dates each time data is transferred from the CHCO OMOP CDM limited data repository to the OMOP CDM de-identified data repository. This obfuscation process involves adding a random number of days to all dates associated with a patient; each patient’s dates are shifted by that patient’s own randomly drawn number of days. A uniform random distribution will be used to generate the number of days by which dates are shifted, between -60 and +60 days (a 120-day interval). Once the mapping information used to shift the dates is destroyed, the process cannot be reversed. Since date obfuscation may affect aggregate counts (e.g., it can cause patients to shift into or out of time intervals), analyses and interpretation of findings will be understood to be approximate, which adds a further layer of patient privacy protection.
- Mapping Procedures: The mappings used to re-randomize patient identifiers and shift patient dates each time data is refreshed and transferred from the CHCO OMOP CDM limited data repository to the OMOP CDM de-identified data repository will be destroyed after the transformations are complete. Thus, there will be no viable link that could be used to identify patients included in the OMOP CDM de-identified repository. Further, since the mappings used to transform the data will be destroyed each time new data is incorporated into the repository, each transform will result in a new set of shifted dates and re-randomized identifiers unique to that particular data transfer.
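The three steps above (re-randomized identifiers, a per-patient uniform date shift, and discarding the mappings after each transfer) can be illustrated as a single per-refresh transform. This is a minimal sketch under stated assumptions, not CHCO's actual pipeline: the function name `deidentify_refresh`, the record layout (a list of dicts keyed by OMOP-style column names such as `person_id` and `visit_start_date`), and the positive 32-bit ID range are all hypothetical.

```python
import random
import secrets
from datetime import date, timedelta

# Shift window from the protocol text: uniform over -60..+60 days.
MAX_SHIFT_DAYS = 60

# Hypothetical ID space for re-randomized person_ids (positive 32-bit ints).
ID_SPACE = range(1, 2**31)


def deidentify_refresh(records, date_fields, rng=None):
    """Hypothetical per-transfer transform: re-randomize IDs and shift dates.

    `records` is assumed to be a list of dicts with a "person_id" key and
    date/datetime values under the keys in `date_fields`. The input is not
    modified. The ID and date-shift mappings exist only inside this call.
    """
    if rng is None:
        rng = secrets.SystemRandom()  # OS-backed randomness by default

    patients = sorted({r["person_id"] for r in records})

    # Fresh, collision-free identifiers for this transfer only.
    id_map = dict(zip(patients, rng.sample(ID_SPACE, len(patients))))

    # One uniform integer offset per patient; randint includes both ends.
    shift_map = {p: rng.randint(-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS) for p in patients}

    out = []
    for r in records:
        p = r["person_id"]
        new_r = dict(r)  # shallow copy so the source record is untouched
        new_r["person_id"] = id_map[p]
        offset = timedelta(days=shift_map[p])
        for field in date_fields:
            if new_r.get(field) is not None:
                new_r[field] = new_r[field] + offset
        out.append(new_r)

    # Drop the only references to the mappings; nothing is returned or saved.
    # (Destroying any persisted mapping tables is a separate operational step.)
    del id_map, shift_map
    return out


if __name__ == "__main__":
    visits = [
        {"person_id": 101, "visit_start_date": date(2021, 3, 1), "visit_end_date": date(2021, 3, 4)},
        {"person_id": 101, "visit_start_date": date(2021, 6, 10), "visit_end_date": None},
        {"person_id": 202, "visit_start_date": date(2020, 1, 15), "visit_end_date": date(2020, 1, 15)},
    ]
    # Seeded only so the demo is repeatable; real use keeps the default rng.
    for row in deidentify_refresh(visits, ["visit_start_date", "visit_end_date"], rng=random.Random(42)):
        print(row)
```

Because each patient receives a single offset, intervals between that patient's own events are preserved (a 3-day visit stays 3 days long), while calendar placement is blurred. Calling the function again on the same input draws new identifiers and new offsets, so successive transfers cannot be lined up through the mappings. Note that `del` only removes Python references; actually destroying any mapping tables written to storage is outside this sketch.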
In summary, our approach applies multiple layers of protection: a de-identified OMOP CDM data repository is maintained within the CHCO secure computing environment, from which only de-identified portions of the data are exported. Moreover, all identifiers in structured data are masked at all times (i.e., patients are given randomly assigned identifiers, and only time-altered dates are used). Together, these measures provide several layers of protection for patient data.