Cycles in the ER (Entity Relationship) model - CONCEPT, VOCABULARY, DOMAIN as a graph

Hi,

A conversation about cycles in the CDM started in the CommonDataModel repos expanded beyond a mere issue report.

In a nutshell, I understand CONCEPT, VOCABULARY, DOMAIN to model data often stored in a graph when it has to be computed on. The current CDM definition implement what I believe to be be optimization/de-normalization approaches, but they create cycles of keys/foreign keys which can cause problems or create additional business logic burden.

Recent SQL additions such as recursive expressions make those optimizations likely no longer beneficial and make the schema more robust when cycles are removed. Would this be something to consider for the next revision of the CDM?

The thread at the origin of this post is here: Cycles in the ER (Entity Relationship) model · Issue #780 · OHDSI/CommonDataModel · GitHub

Best,

Laurent

@lg-ohdsi The vocabulary, domain and concept_class tables indeed refer to the concept table, and the concept table refer to each of the three tables, creating a circular relationship. This can create issues when loading the vocabularies, as the foreign keys cannot all hold at once. This is typically solved by first inserting the records in all vocabulary tables, and applying the foreign key constraints later. Another approach is to use sql transactions.

That said, if for your purpose the circular dependency is still an issue, you could choose to leave the respective fields empty; i.e. vocabulary.vocabulary_concept_id, domain.domain_concept_id and concept_class.concept_class_concept_id. Officially this makes your CDM non-conformant, so try at your own risk, but I would not expect issues with the current OHDSI tools.

Thanks. The initial effort was to generate ORM code from the CDM definitions in the CSV files, assuming that the CSV is the single source of truth – I am trying to confirm whether the case.

I am aware of workaround for handling circularity (toggling constrains off, transactions), including with the ORM (use_alter, post_update), but I am wondering whether that circularity is necessary or optimal. My understanding is that conceptually, ontologies (graphs captured in vocabulary, domain, concept) are present as annotation/metadata for fields in other tables / more classical tabular data.

The presence circularity made SQLAlchemy wince a bit out of the gate (since circularity tweaks must be manually added). This is how I realized it was there. Going through the CSV to SQLAlchemy translation also made me notice few missing foreign key. I think that I created items for all of them in the CommonDataModel repos.

Overall, the quirks create additional burden on a user where some the consistency checks could be ensured by the RDBM (at least in theory – RDBM are not consistent, SQLite will not check foreign keys IIRC for example) or by the ORM.

Leaving fields empty where not in the intent in the CDM definition, making the variant non-conformant, seems somewhat risky.