
Fact relationships: Searching for an extensible approach

I think the biggest problem is that we are using relational-style tables to represent graph data. Yes, it can be done, but it creates so many levels of complexity. At the risk of being redundant, we need our vocabulary stored in a graph database.
Perhaps for network studies, use a later version of Postgres with the vocabulary stored as a graph structure. I realize this would create issues for SQL Server-only users, since SQL Server has no good way to represent graphs (XML is never a good idea), but the multi-database approach will always give inferior results.

I am attempting to not get on my lack of standards soapbox. (edit: I failed)


@Mark I agree with you that a graph database is a more natural fit, but adopting OMOP already requires so many areas of expertise that adding an entirely new database type that is foreign to 99% of the community is a bridge too far (at this point).

@jmethot Valid points.

I was assuming if this got some form of a green light we could create an initial set of relationships based on the use cases we’re currently aware of and then poll the community to see if there are suggested additions. That would at least be a starting point for the vocabulary.

Perhaps your “generic relationships”, i.e. “this domain to this domain”, can serve as the top levels of hierarchies, where any more specific relationship (your “extension” relations) that falls under the same “X domain to Y domain” relation is a child concept of the more generic one. That way, for tooling, we can use “include descendants” when there are broad use cases that don’t care about how the two entities are related, just that they are.
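If the generic domain-to-domain relationships sat above the specific ones in the hierarchy, the usual CONCEPT_ANCESTOR pattern would cover the “include descendants” case. A minimal sketch using Python’s built-in sqlite3; all concept ids and table contents are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical concepts: a generic 'Condition to Drug' relationship (1)
-- with a more specific child 'Condition caused by Drug' (2).
CREATE TABLE concept_ancestor (
    ancestor_concept_id   INTEGER,
    descendant_concept_id INTEGER
);
CREATE TABLE fact_relationship (
    fact_id_1 INTEGER, fact_id_2 INTEGER,
    relationship_concept_id INTEGER
);
-- Every concept is its own descendant, per vocabulary convention.
INSERT INTO concept_ancestor VALUES (1, 1), (1, 2), (2, 2);
INSERT INTO fact_relationship VALUES (10, 20, 2);  -- uses the specific child
""")

# 'Include descendants' of the generic relationship picks up the specific one:
rows = conn.execute("""
    SELECT fr.fact_id_1, fr.fact_id_2
    FROM fact_relationship fr
    JOIN concept_ancestor ca
      ON ca.descendant_concept_id = fr.relationship_concept_id
    WHERE ca.ancestor_concept_id = 1
""").fetchall()
print(rows)  # [(10, 20)]
```

A broad use case queries the generic ancestor; a narrow one queries the specific concept directly, with no schema change either way.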

If we are going to add this, can we do this with a unique relationship_id instead of the ‘maps to’, please? This is already a problem in ETL world with all the ‘maps to’ over various domains.

Within FACT_RELATIONSHIP the relationship is indeed already represented as a concept ID - an integer field and foreign key to the concept table
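For reference, the sketch below mimics the shape of the CDM’s FACT_RELATIONSHIP columns in an in-memory SQLite database; all row values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified FACT_RELATIONSHIP as in the OMOP CDM: the relationship
-- itself is an integer concept id, a (logical) foreign key to CONCEPT.
CREATE TABLE fact_relationship (
    domain_concept_id_1     INTEGER NOT NULL,  -- domain of fact 1
    fact_id_1               INTEGER NOT NULL,  -- row id in that domain's table
    domain_concept_id_2     INTEGER NOT NULL,
    fact_id_2               INTEGER NOT NULL,
    relationship_concept_id INTEGER NOT NULL   -- FK to CONCEPT
);
""")

# A made-up link between two facts; every id here is illustrative only.
conn.execute("INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
             (19, 101, 13, 202, 2000000001))
row = conn.execute(
    "SELECT relationship_concept_id FROM fact_relationship").fetchone()
print(row[0])  # an integer, resolved against CONCEPT at query time
```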

I took your statement earlier a bit too literally, sorry.

@rtmill Thank you for this topic. I don’t think there is one right approach. An important consideration is to look at how each proposal fares across various contexts: DDL (data definition), ETL (extract-transform-load), DSLs (domain-specific languages, like Circe), and one-off SQL queries.

I am partial to the 56 table solution proposed by @jmethot. Naming conventions let us use meta-data to manage DDL and DSL contexts, while at the same time not burdening one-off SQL queries with indirection (at the cost of UNION ALL for combined cases). Would ETLs be easier or harder with EAV vs 56 tables?
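To make the trade-off concrete: with per-pair tables, a query scoped to one pair of domains needs no relationship filter, while a combined query pays the UNION ALL. A sketch with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical per-pair tables from the '56 tables' idea; names invented.
CREATE TABLE condition_drug_link (condition_id INTEGER, drug_id INTEGER);
CREATE TABLE condition_measurement_link (condition_id INTEGER,
                                         measurement_id INTEGER);
INSERT INTO condition_drug_link VALUES (1, 100);
INSERT INTO condition_measurement_link VALUES (1, 200);
""")

# The combined case pays the UNION ALL mentioned above; the per-pair
# case is a plain SELECT against one table with no indirection.
rows = conn.execute("""
    SELECT condition_id, drug_id AS linked_id FROM condition_drug_link
    UNION ALL
    SELECT condition_id, measurement_id FROM condition_measurement_link
""").fetchall()
print(sorted(rows))  # [(1, 100), (1, 200)]
```

Under an EAV-style FACT_RELATIONSHIP, the same combined query is a single table scan filtered by relationship concept, but every per-pair query then carries that filter and the domain/fact-id indirection.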

@cce Thanks, Clark.

Just to be sure: I was floating either

  1. ~56 standard relationship concepts that represent all “minimally semantic” relationships between domains. (it wouldn’t be exactly 56 because Robert proposes some domain-to-same-domain concepts, and some domain-to-other-domain concepts probably don’t make sense)

  2. Some number of carefully chosen standard relationship concepts with specific semantics that have analytical use cases such as those listed at the beginning of @rtmill’s original post in this thread. The trick is finding the Goldilocks level of specificity that satisfies the analytical needs but is not so narrow that people eventually want thousands of them (that latter situation is what I imagine is making steam escape @Christian_Reich’s ears reading this thread).

Neither of these is EAV but using that metaphor, in these proposals the Es and Vs are fixed (the OMOP CDM domains) and we’re proposing a small initial set of standardized As.

Note that I don’t think @rtmill is proposing that existing (local, non-standard) uses of FACT_RELATIONSHIP would change, but that OHDSI tools would only recognize standard relationship concepts therein. If we wanted to make that explicit we could propose a new DOMAIN_RELATIONSHIP table to house only standard domain relationships.

@jmethot I’m even more supportive of using tables for specific kinds of data driven by concrete use cases. Even 56 generic tables may have multiple ways they are used, hindering their usefulness. Perhaps we should focus our energies on creating a schema/vocabulary module system and the community processes for ensuring that extensions are well designed and integrated. We could have a community owned continuous integration to ensure that we don’t have conflicts, etc. For DDL and DSL contexts, we could drive schema creation and generic querying with meta-data.

Just wanted to connect this to the @Paul_Nagy presentation about medical imaging, where they propose a new image_feature table that allows linking observations about an image, such as the size of a mass. Since imaging has a specific instance/series and a location (on the image), things can be linked together over time. This is not generalizable to all fact relationships, but it was interesting.

Also recent discussions about mapping HPO concepts to OMOP raised the specter of more granular concepts than SNOMED (either due to missing SNOMED primitives or post-coordination). This discussion might also be relevant. @mellybelly

Apologies for starting this conversation and going silent shortly after. I was put out of commission with pneumonia and have been playing catch up since. The source of the infection is still unknown but the leading theory is too much time on the OHDSI forums.

I’ve recently had the opportunity to devote some time to this and as a result believe this is more relevant than I had previously thought, especially with the revival of THEMIS.

This isn’t exactly what I sat down to write, but it’s what came out, and at worst I hope it can make the relevance and potential opportunity more vivid. It is worth noting that I am likely biased, given that a solution to this problem lets the oncology data jigsaw puzzle more or less fall into place, but even so I believe there is substance here.

I’d like to emphasize that the following is specifically regarding relationships between evidence that are persisted explicitly in the source data and not subjective associations.

I’ve provided a link to a slide deck (can’t attach ppt) that is more of a thought experiment of how this could work than a proposal for how it should work. It is likely best viewed last (if you make it that far), and there is a duplicate link towards the end.

Rabbit hole preamble (bear with me)

  • There is an influx of new data coming into OMOP at a seemingly accelerating rate. More sites, more sources, more types of sources, more variability in the detail and structure of sources. Both in breadth and in depth, but focusing on the latter
  • As the depth of data increases we are crossing over the limit of the current scope of adequate OMOP conventions. By adequate I mean to say that for a given piece of evidence in the source data, there is a singular standard target representation defined in OMOP. Inadequate could be defined as either a) no convention for that evidence or b) more than one possible standard convention, or representation in OMOP, for that evidence
  • When a site faces a gap in conventions, it’s likely one of three outcomes: 1) they give up, 2) they implement an ad hoc solution outside of established conventions, or 3) they work with the community to create a new convention
  • Interoperability, and specifically the feasibility of network research, depends on adequate conventions
  • There are “general conventions” in OMOP that are foundational ( _TYPE_CONCEPT, _CONCEPT_ID & _SOURCE_CONCEPT_ID etc.). They define underlying patterns for expanding conventions in a standardized, systematic way. For example, any new table will have provenance defined by _TYPE_CONCEPT and the standard concept by _CONCEPT_ID, etc.
  • The FACT_RELATIONSHIP table in its current form is insufficient to facilitate a “foundational” convention, but what if it were? What if there were a mechanism for defining relationships between tables that had the same extensibility as the other “general conventions”? Hypothetically, the most general implication would be that any use case requiring novel relations between tables would either a) be defined within the scope of this foundational convention, or b) follow the pattern, the standardized extension, that the foundational convention defines for creating new conventions

Rabbit hole

Why isn’t the FACT_RELATIONSHIP mechanism currently sufficient? What is it missing?

  • Provenance - Where did the evidence of this relationship come from?
  • Content - What type of relationship is it?
    • The field exists (RELATIONSHIP_CONCEPT_ID) but a sufficient vocabulary does not
  • Temporality - Is the relationship limited to a period of time? If so, what is it?
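One way to picture closing those three gaps is a widened relationship record; every column name below is illustrative, not a CDM proposal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical extension covering the three gaps above; all column
-- names are invented for illustration.
CREATE TABLE fact_relationship_ext (
    domain_concept_id_1          INTEGER NOT NULL,
    fact_id_1                    INTEGER NOT NULL,
    domain_concept_id_2          INTEGER NOT NULL,
    fact_id_2                    INTEGER NOT NULL,
    relationship_concept_id      INTEGER NOT NULL,  -- Content: type of relationship
    relationship_type_concept_id INTEGER,           -- Provenance: where it came from
    relationship_start_date      TEXT,              -- Temporality: optional validity
    relationship_end_date        TEXT               -- window, NULL if unbounded
);
""")

# All values invented; NULL-able provenance/temporality mirror other
# OMOP tables' _TYPE_CONCEPT and date-pair patterns.
conn.execute("INSERT INTO fact_relationship_ext VALUES (?,?,?,?,?,?,?,?)",
             (19, 1, 13, 2, 0, 0, "2020-01-01", "2020-06-30"))
n = conn.execute("SELECT COUNT(*) FROM fact_relationship_ext").fetchone()[0]
print(n)  # 1
```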

Why does it matter?

  • There are an expanding number of valid, likely impactful, use cases in which the relationships between entities, as they exist in the source data, cannot be sufficiently represented in OMOP within standardized conventions
  • These use cases are either being implemented in an ad hoc approach outside of conventions, or are being implemented by creating new conventions that each require modifications to the CDM
  • If we think of the spectrum of data in OMOP as simply entities and relationships between them, and if we had a foundational mechanism that handled relationships, only the use cases that create new entities (e.g. specimen, image, etc.) or modify existing entities would require changes to the CDM
  • Greater stability of the CDM facilitates interoperability and eases burden of tooling ecosystem (less versioning to accommodate)

Example implementation:

  • Slides link: OMOP table relation mock - Google Slides (can’t attach ppt I guess?)
  • As mentioned above, the slides are more of a thought experiment as to how this could work instead of a proposal for how it should
  • There are a few question marks in there - most notably, the slides reference field_concept_id, as other conventions have leveraged, but it is unclear whether domain_concept_id is more appropriate

Regarding the inevitable “use cases??”, see the top of this thread for some examples. If helpful, I can try to curate what I’ve come across thus far into a list.


Thanks @rtmill for your very thoughtful post and for the concrete proposal for a revision to the FACT_RELATIONSHIP table. I think this proposal has merits, and I’d be eager to see you or others apply it to real data as a pilot to demonstrate its value so that we could consider promoting it more broadly.

Hello @rtmill,

You mention Themis, but this will fall under the CDM WG domain since it would most likely require a change to the model. Themis will definitely be available to help give feedback on proposed conventions and then help solidify any conventions needed for these data.

And I agree with @Patrick_Ryan :

The CDM WG is currently hosting presentations on proposed model expansions. You should work with @clairblacketer to get on the schedule!

Thanks @Patrick_Ryan . I’ll plan to give a draft proposal to the CDM group to get feedback before any sort of development/pilot effort.

@MPhilofsky Apologies on the confusion - I only meant that the THEMIS Revival™ was inspiring towards thinking about the underlying conventions more substantially

To append to the above proposal (per CDM group feedback): the field_concept_id references should instead be table_concept_id