Process one-to-many mapping of OR concept

ivona · September 16, 2024, 2:03pm

Hello everyone,

after reviewing various posts in this forum on one-to-many mappings, I now understand that it is common practice to map a source concept to multiple standard concepts if needed. This is done by creating multiple rows/records in the CDM table, one for each mapping (post 1, post 2, post 3, post 4, post 5).

In these posts, the question has often come up of how to “flag” that one concept has been deconstructed into several during ETL, i.e. that the multiple concepts in the CDM belong together. Some replies questioned whether this is necessary when all mappings hold true for the patient, and others proposed using the fact_relationship table, but I could not extract any real recommendation.

In my use case, it is particularly important to flag the connectedness of the concepts: my local custom source concept is “coronary heart disease OR myocardial infarction”, so we cannot distinguish whether someone has had a myocardial infarction or coronary heart disease. Creating separate mappings/records for this concept, one to 317576 Coronary arteriosclerosis and one to 4329847 Myocardial infarction, would thus be severely misleading, as that would suggest that everyone with CHD in our dataset had a myocardial infarction.

Several options come to mind:

use the fact_relationship table as proposed in this post. In this case, e.g., use relationship_concept_id: 4223141 OR (SNOMED)
inform data analysts separately that those concepts must always be analyzed together. However, I feel that this should be the last resort, used only if this problem cannot be handled within the OMOP CDM
map the source concept only to 317576 Coronary arteriosclerosis, as a vast portion of myocardial infarctions are caused by CHD. The resulting loss in meaning would have to be assessed and documented as well
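
To make the first option more concrete, here is a minimal Python sketch of the fact_relationship rows that would tie the two condition records together. It is only an illustration under assumptions: the column names follow the FACT_RELATIONSHIP table, the domain concept_id 19 for Condition should be checked against your vocabulary, the two condition_occurrence_id values are hypothetical, and storing the link in both directions is assumed here rather than confirmed as a convention for this case.

```python
from dataclasses import dataclass

# Assumption: 19 is the domain concept_id for Condition; verify in your vocabulary.
CONDITION_DOMAIN_CONCEPT_ID = 19
# relationship_concept_id mentioned in the post: 4223141 OR (SNOMED).
OR_RELATIONSHIP_CONCEPT_ID = 4223141


@dataclass(frozen=True)
class FactRelationship:
    """One row of the FACT_RELATIONSHIP table."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


def link_or_pair(occurrence_id_a: int, occurrence_id_b: int) -> list[FactRelationship]:
    """Link two condition_occurrence rows with an OR relationship.

    Assumption: the link is stored once per direction (a -> b and b -> a).
    """
    pairs = [(occurrence_id_a, occurrence_id_b), (occurrence_id_b, occurrence_id_a)]
    return [
        FactRelationship(
            domain_concept_id_1=CONDITION_DOMAIN_CONCEPT_ID,
            fact_id_1=first,
            domain_concept_id_2=CONDITION_DOMAIN_CONCEPT_ID,
            fact_id_2=second,
            relationship_concept_id=OR_RELATIONSHIP_CONCEPT_ID,
        )
        for first, second in pairs
    ]


# Hypothetical condition_occurrence_id values: 1001 is the row mapped to
# 317576 and 1002 is the row mapped to 4329847.
rows = link_or_pair(1001, 1002)
for row in rows:
    print(row)
```

Analysts would still have to join through fact_relationship to see that the two records are alternatives, so this option does not remove the need for documentation.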

It seems that a guideline on this topic is in the works in Themis, so I am looking forward to that!

Very grateful for any opinions on this topic!
Best
Ivona

MPhilofsky · September 16, 2024, 8:19pm

Hello @ivona!

Actually, this is just a broken link. It should link to this page.

Themis does not have a convention for this particular issue. This is a complex vocabulary issue since your source term could represent two different conditions.

My suggestion would be to map to the less granular concept_id for heart disease, since a myocardial infarction indicates disease of the heart. I would store the exact term in the source_value field and create a custom concept_id > 2 billion to represent the exact source term.
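
As a rough illustration of that suggestion, the Python sketch below builds a single condition_occurrence record. The column names follow the CONDITION_OCCURRENCE table; the function name, the person_id, the custom concept_id 2000000001, and the placeholder for the heart disease concept_id are all hypothetical, and the real standard concept_id has to be looked up in the vocabulary.

```python
# Custom (local) concept_ids are expected to be above 2 billion.
CUSTOM_CONCEPT_ID_MIN = 2_000_000_000

# Placeholder only: replace with the standard concept_id for heart disease
# from your vocabulary. -1 is not a real concept_id.
PLACEHOLDER_HEART_DISEASE_CONCEPT_ID = -1


def build_condition_row(
    person_id: int,
    standard_concept_id: int,
    source_value: str,
    custom_source_concept_id: int,
) -> dict:
    """Build one condition_occurrence record for the combined source term."""
    if custom_source_concept_id <= CUSTOM_CONCEPT_ID_MIN:
        raise ValueError("custom concept_id must be greater than 2 billion")
    return {
        "person_id": person_id,
        # Less granular standard concept that covers both conditions.
        "condition_concept_id": standard_concept_id,
        # Exact source term, kept verbatim.
        "condition_source_value": source_value,
        # Custom concept that represents the exact source term.
        "condition_source_concept_id": custom_source_concept_id,
    }


row = build_condition_row(
    person_id=42,  # hypothetical person
    standard_concept_id=PLACEHOLDER_HEART_DISEASE_CONCEPT_ID,
    source_value="coronary heart disease OR myocardial infarction",
    custom_source_concept_id=2_000_000_001,  # hypothetical custom concept_id
)
print(row)
```

With this layout the patient gets one record instead of two, and the exact meaning of the source term stays recoverable through the source fields.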

Users of your data will need to be informed of this issue because they will need to create a cohort with other attributes (lab test, EKG, etc.) to identify those persons with a myocardial infarction. I am unsure how you would accomplish this for all possible uses of your data. You definitely need to document and highlight this in your business rules.

ivona · September 25, 2024, 7:12am

Thank you, Melanie! This truly helped.