Why is the person table the only table with row level provenance tracking?

The person.person_source_value column makes the person table the only table that supports a kind of row level provenance tracking. From the documentation:

"An (encrypted) key derived from the person identifier in the source data. This is necessary when a use case requires a link back to the person data at the source dataset."

I would like to be able to have row level provenance tracking available on all the standardized clinical tables. My idea is the following:

  • Add cdm_source_id to cdm_source
  • Add cdm_source_id to every standardized clinical table.
  • Add X_row_source_value on every standardized clinical table.

This would let the CDM support row level provenance tracking across a CDM spanning multiple source systems. It would also help anybody trying to do incremental loads of a CDM. The combination of X_row_source_value and cdm_source_id could serve as a stable identifier, so that one would not have to resort to truncate and load.
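To make the incremental load idea concrete, here is a rough sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the measurement table is cut down to a few columns, measurement_row_source_value stands in for the proposed X_row_source_value, cdm_source_id is the proposed new column, and load_measurement plus the "lab-0001" style keys are hypothetical names made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the CDM tables with the proposed columns.
CREATE TABLE cdm_source (
    cdm_source_id   INTEGER PRIMARY KEY,   -- proposed new column
    cdm_source_name TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id               INTEGER PRIMARY KEY,
    person_id                    INTEGER NOT NULL,
    value_as_number              REAL,
    cdm_source_id                INTEGER NOT NULL REFERENCES cdm_source (cdm_source_id),
    measurement_row_source_value TEXT NOT NULL,   -- proposed X_row_source_value
    UNIQUE (cdm_source_id, measurement_row_source_value)
);
""")
conn.execute("INSERT INTO cdm_source VALUES (1, 'hypothetical EHR')")


def load_measurement(conn, cdm_source_id, row_source_value, person_id, value):
    """Update the row if this source row was loaded before, otherwise insert it."""
    cur = conn.execute(
        "UPDATE measurement SET person_id = ?, value_as_number = ? "
        "WHERE cdm_source_id = ? AND measurement_row_source_value = ?",
        (person_id, value, cdm_source_id, row_source_value),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO measurement "
            "(person_id, value_as_number, cdm_source_id, measurement_row_source_value) "
            "VALUES (?, ?, ?, ?)",
            (person_id, value, cdm_source_id, row_source_value),
        )


# First load: two rows from the source system.
load_measurement(conn, 1, "lab-0001", 42, 5.1)
load_measurement(conn, 1, "lab-0002", 42, 7.3)
# Later incremental load: lab-0001 was corrected at the source.
load_measurement(conn, 1, "lab-0001", 42, 5.4)

rows = conn.execute(
    "SELECT measurement_id, measurement_row_source_value, value_as_number "
    "FROM measurement ORDER BY measurement_id"
).fetchall()
print(rows)
```

The point is that the corrected row keeps its measurement_id instead of being dropped and recreated, and each row can still be traced back to its source system through cdm_source_id.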

@mgurley:

Want to take on this proposal and run with it?

What’s that for?
