First of all, I'd like to say that we have not yet implemented the whole CDM. However, to be able to work with real data down the line, I know we will need a detailed evaluation with the data privacy officers at our institution. While the CDM does pretty well at minimizing identifiable information, I'm pretty sure we'll need to further pseudonymize datasets during the ETL process.
One approach we have used in other (smaller) projects is to shift all timestamps of a patient by the same random offset, which preserves the intervals between that patient's events (see pseudocode below).
for each patient:
    factor_days = random(-180, 180)
    for each timestamp of patient:
        timestamp = timestamp + factor_days
My question is this: are there any best practices for doing this kind of transformation within the ETL for the OMOP CDM? Possibly with the addition of a lookup table that stores the randomized offset for each patient ID?
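To make the idea more concrete, here is a minimal Python sketch of the per-patient shift combined with a lookup table. It is only an illustration under my own assumptions: the function names, the in-memory dict used as the lookup table, and the example rows are all made up and are not part of the OMOP CDM or any OHDSI tooling.

```python
import random
from datetime import date, timedelta

MAX_SHIFT_DAYS = 180  # same +/-180 day window as in the pseudocode


def get_offset(person_id, lookup, rng):
    """Return the day offset for a patient, drawing it once and caching it."""
    if person_id not in lookup:
        # randint is inclusive on both ends
        lookup[person_id] = rng.randint(-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS)
    return lookup[person_id]


def shift_records(records, lookup, rng):
    """Shift every (person_id, date) pair by that patient's stored offset."""
    return [
        (person_id, d + timedelta(days=get_offset(person_id, lookup, rng)))
        for person_id, d in records
    ]


# Made-up example rows: (person_id, event date)
records = [
    (1, date(2020, 1, 10)),
    (1, date(2020, 3, 15)),
    (2, date(2019, 6, 1)),
]

offset_lookup = {}  # hypothetical stand-in for a secured lookup table
shifted = shift_records(records, offset_lookup, random.Random())
```

Because each patient's offset is drawn once and then read from the lookup, the interval between two events of the same patient stays the same, and later ETL runs could reuse the stored offsets instead of drawing new ones. In a real setup I would expect the lookup to live in a separate, access-restricted table rather than in memory.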
Thanks for your time!