Apologies for starting this conversation and going silent shortly after. I was put out of commission with pneumonia and have been playing catch-up since. The source of the infection is still unknown, but the leading theory is too much time on the OHDSI forums.
I’ve recently been able to devote some time to this and, as a result, believe it is more relevant than I had previously thought, especially with the revival of THEMIS.
This isn’t exactly what I sat down to write, but it’s what came out, and at worst I hope it makes the relevance and potential opportunity more vivid. It is worth noting that I am likely biased, given that a solution to this problem lets the oncology data jigsaw puzzle more or less fall into place, but even so I believe there is substance here.
I’d like to emphasize that the following is specifically about relationships between pieces of evidence that are explicitly persisted in the source data, not subjective associations.
I’ve provided a link to a slide deck (can’t attach ppt) that is more of a thought experiment on how this could work than a proposal for how it should work. It is probably best viewed last (if you make it that far), and there is a duplicate link towards the end.
Rabbit hole preamble (bear with me)
- New data is coming into OMOP at a seemingly accelerating rate: more sites, more sources, more types of sources, and more variability in the detail and structure of those sources. The growth is in both breadth and depth, but I am focusing on the latter
- As the depth of data increases, we are outgrowing the scope of adequate OMOP conventions. By adequate I mean that, for a given piece of evidence in the source data, OMOP defines a single standard target representation. Inadequate could then be defined as either a) no convention exists for that evidence or b) more than one standard convention, or representation in OMOP, is possible for that evidence
- When a site faces a gap in conventions, the outcome is likely one of three: 1) they give up, 2) they implement an ad hoc solution outside of established conventions, or 3) they work with the community to create a new convention
- Interoperability, and specifically the feasibility of network research, depends on adequate conventions
- There are “general conventions” in OMOP that are foundational (_TYPE_CONCEPT_ID, _CONCEPT_ID, _SOURCE_CONCEPT_ID, etc.). They define underlying patterns for extending conventions in a standardized, systematic way. For example, any new table will have its provenance defined by a _TYPE_CONCEPT_ID field and its standard concept by a _CONCEPT_ID field, etc.
- The FACT_RELATIONSHIP table in its current form is insufficient to support a “foundational” convention, but what if it were? What if there were a mechanism for defining relationships between tables with the same extensibility as the other “general conventions”? Hypothetically, the most general implication would be that any use case requiring novel relations between tables would either a) fall within the scope of this foundational convention, or b) be handled by a new convention created according to the pattern (the standardized extension) that the foundational convention defines
Why isn’t the FACT_RELATIONSHIP mechanism currently sufficient? What is it missing?
- Provenance - Where did the evidence of this relationship come from?
- Content - What type of relationship is it?
    - The field exists (RELATIONSHIP_CONCEPT_ID) but a sufficient vocabulary does not
- Temporality - Is the relationship limited to a period of time? If so, what is it?
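To make the three gaps concrete, here is a minimal Python sketch. The first dataclass mirrors the five columns FACT_RELATIONSHIP has today (CDM v5.4). The second is purely hypothetical and not part of any CDM version: it adds a relationship_type_concept_id for provenance (borrowing the _TYPE_CONCEPT_ID pattern) and optional start/end dates for temporality. The extra field names, the is_active_on helper, and every concept ID in the usage lines are placeholders invented for illustration, not real vocabulary values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FactRelationship:
    """The columns of FACT_RELATIONSHIP as it exists in CDM v5.4."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int  # content: the field exists, the vocabulary is thin


@dataclass(frozen=True)
class ExtendedFactRelationship(FactRelationship):
    """HYPOTHETICAL extension; none of these extra fields exist in the CDM."""
    relationship_type_concept_id: int = 0           # provenance, _TYPE_CONCEPT_ID style
    relationship_start_date: Optional[date] = None  # temporality; None = unbounded
    relationship_end_date: Optional[date] = None

    def is_active_on(self, day: date) -> bool:
        """True if the relationship holds on `day` (bounds inclusive, missing bounds open)."""
        if self.relationship_start_date is not None and day < self.relationship_start_date:
            return False
        if self.relationship_end_date is not None and day > self.relationship_end_date:
            return False
        return True


# Placeholder concept IDs for illustration only; real values would come from the vocabulary.
CONDITION_DOMAIN = 1001
MEASUREMENT_DOMAIN = 1002
HAS_FINDING = 2001
FROM_PATHOLOGY_REPORT = 3001

rel = ExtendedFactRelationship(
    domain_concept_id_1=CONDITION_DOMAIN, fact_id_1=42,
    domain_concept_id_2=MEASUREMENT_DOMAIN, fact_id_2=7,
    relationship_concept_id=HAS_FINDING,
    relationship_type_concept_id=FROM_PATHOLOGY_REPORT,
    relationship_start_date=date(2023, 1, 10),
    relationship_end_date=date(2023, 6, 30),
)
print(rel.is_active_on(date(2023, 3, 1)))  # True
print(rel.is_active_on(date(2024, 1, 1)))  # False
```

Subclassing keeps the existing five columns untouched, so in this sketch the extension is purely additive: a row without provenance or dates still reads as a plain FactRelationship.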
Why does it matter?
- There is an expanding number of valid, likely impactful, use cases in which the relationships between entities, as they exist in the source data, cannot be adequately represented in OMOP within standardized conventions
- These use cases are being implemented either ad hoc, outside of conventions, or by creating new conventions that each require modifications to the CDM
- If we think of the spectrum of data in OMOP as simply entities and the relationships between them, and if we had a foundational mechanism for handling relationships, only the use cases that create new entities (e.g., specimen, image) or modify existing entities would require changes to the CDM
- Greater stability of the CDM facilitates interoperability and eases the burden on the tooling ecosystem (fewer versions to accommodate)
- Slides link: OMOP table relation mock - Google Slides (can’t attach ppt I guess?)
- As mentioned above, the slides are more of a thought experiment on how this could work than a proposal for how it should
- There are a few open questions in there; most notably, the slides reference field_concept_id, as other conventions have done, but it is unclear whether domain_concept_id is more appropriate
Regarding the inevitable “use cases??”, see the top of this thread for some examples. If helpful, I can try to compile what I’ve come across so far into a list.