Hi,
OMOP newbie here. I have looked for answers to my questions in various OHDSI sources and on this forum, but have not yet found quite what I was looking for, hence the new topic.
I am trying to understand how to capture high-level information about which source systems contributed data to a given OMOP CDM instance (in a context where multiple source systems are involved). Specifically, I am wondering whether it is possible to capture, inside the OMOP CDM itself, answers to the following questions that CDM users might ask:
- Is data from source system X (typically not OMOP itself) included in the OMOP CDM data?
- For which (clinical) time range has data from source system X been included in the OMOP CDM?
- Have all data elements from source system X been mapped to OMOP, or are some missing/incomplete? Are some elements only available from a certain point in (clinical) time onwards?
- Are there other facts about the data coming from source system X (possibly only for a specific time interval) that I should be aware of as an OMOP CDM user? These could include, e.g., potentially confusing changes at a specific point in time due to changes in the code systems used in the source system, or data elements that have been imputed or set to dummy values in certain cases.
As a first step, it would be enough to capture this information as a single text blob for each source system, without linking it to specific CDM tables or columns. If I understand correctly, the METADATA table would be the right place for such information. I see some discussion of its usage in the GitHub issue proposing the METADATA table, including topics pertinent to my question (3) (link to specific comment). However, it is not quite clear to me how to capture high-level information like the above correctly (e.g., in the CDM v5.3 documentation of METADATA, the “User Guide” column is empty). Are there guidelines or examples available on this?