What is the convention for generating person_ids between CDM refreshes?

mccullen_j · July 31, 2023, 1:59pm

When you refresh your CDM, do you need to ensure that each person_id remains the same? For example, if a person has person_id 100 before the refresh, does that same person need to have person_id 100 after the refresh? I’m asking because the person could end up with something other than 100 if you simply assigned row_number() as the person_id. I’m not sure whether my refresh logic needs to ensure that existing persons retain their person_ids. I think this question may apply to other identifiers too.
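To make the concern concrete, here is a minimal Python sketch of how a row_number()-style assignment can shift between refreshes. The source identifiers (`MRN-A` and so on) and the assumption that rows are numbered in sorted order of the source identifier are hypothetical, chosen only for illustration.

```python
def row_number_ids(source_ids):
    """Mimic ROW_NUMBER() OVER (ORDER BY source_id): number rows 1..n in sorted order."""
    return {src: n for n, src in enumerate(sorted(source_ids), start=1)}

# First build: two patients in the source (hypothetical identifiers).
before = row_number_ids(["MRN-A", "MRN-C"])

# Refresh: a new patient sorts between the two existing ones.
after = row_number_ids(["MRN-A", "MRN-B", "MRN-C"])

# The same source patient now has a different person_id.
print(before["MRN-C"], after["MRN-C"])  # prints: 2 3
```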

Mark · July 31, 2023, 2:39pm

Unless you are doing incremental updates of your instance, no. Most of us do kill and fill, that is, we rebuild the whole CDM from the source on every refresh, so the data will always be linked to the correct person.

DTorok · July 31, 2023, 2:52pm

We often see cases where the person’s medical identifier is considered PHI and is not included in the OMOP tables, but the customer wants to be able to link the OMOP person_id back to the medical identifier. In these cases we maintain a table that holds both the EHR medical identifier and the OMOP person_id. This table persists between updates. During a refresh, the ETL checks whether the EHR medical identifier has been seen before. If so, it reuses the OMOP person_id from the table; if the identifier is new, it generates a new OMOP person_id. With this scheme, the OMOP person_id stays the same from one ETL run to the next.
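The lookup described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the actual ETL: the function name `assign_person_ids`, the use of an in-memory dict in place of the persistent table, and the "max existing id plus one" rule for new ids are all hypothetical choices.

```python
def assign_person_ids(source_ids, crosswalk):
    """Map EHR identifiers to OMOP person_ids, reusing ids already in the crosswalk.

    `crosswalk` stands in for the persistent table (EHR identifier -> person_id)
    and is updated in place when a new identifier is encountered.
    """
    # Assumption: new ids continue after the highest id issued so far.
    next_id = max(crosswalk.values(), default=0) + 1
    assigned = {}
    for src in source_ids:
        if src not in crosswalk:
            # First time this EHR identifier is seen: issue a new person_id.
            crosswalk[src] = next_id
            next_id += 1
        assigned[src] = crosswalk[src]
    return assigned

crosswalk = {}  # would be loaded from, and saved back to, the persistent table
first_run = assign_person_ids(["MRN-A", "MRN-C"], crosswalk)
second_run = assign_person_ids(["MRN-A", "MRN-B", "MRN-C"], crosswalk)
print(first_run["MRN-C"], second_run["MRN-C"])  # prints: 2 2
```

Because the crosswalk holds the PHI identifier, in practice it would presumably be stored outside the OMOP tables, with access restricted accordingly.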

MPhilofsky · July 31, 2023, 3:51pm

OHDSI does not have a convention for maintaining the same person_id for a person between data refreshes. However, if your institution is part of a project that delivers data more than once to the same person or group, the project might require that the person_id remain the same.

Mark’s and Don’s answers are both valid. What you do with your person_id between data refreshes depends on your use case.