Family history extension model

 Introduction

  1. Previous research
  • Family history (FH) in the All of Us data.
  • The FH disease and the family member (person) are converted to observation_concept_id separately.
    Disadvantage: with multiple FH disease and person records, it is difficult to accurately connect each disease to the right person.
  2. Aim of this study
  • Compare the conventional mapping method with the new mapping method when converting family history survey data from a single-center medical check-up.

New method: FH of disease → observation_concept_id
            ‘in person’ (the family member) → qualifier_concept_id
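
To make the difference between the two methods concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the concept IDs are made-up placeholders (not real OMOP concept_ids), `ObservationRow` is a reduced stand-in for an OBSERVATION record, and the helper names are hypothetical rather than taken from the attached model.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Placeholder IDs for illustration only; these are NOT real OMOP concept_ids.
FH_OF_DIABETES = 9000001
FH_OF_STROKE = 9000002
REL_MOTHER = 9100001
REL_SISTER = 9100002


@dataclass(frozen=True)
class ObservationRow:
    """A reduced OBSERVATION record with only the fields discussed here."""
    person_id: int
    observation_concept_id: int
    qualifier_concept_id: Optional[int] = None


Survey = List[Tuple[int, int]]  # (FH-of-disease concept, relative concept)


def conventional_rows(person_id: int, survey: Survey) -> List[ObservationRow]:
    """Disease and relative each become their own, unlinked row."""
    rows = []
    for disease, relative in survey:
        rows.append(ObservationRow(person_id, disease))
        rows.append(ObservationRow(person_id, relative))
    return rows


def new_method_rows(person_id: int, survey: Survey) -> List[ObservationRow]:
    """Disease goes to observation_concept_id, relative to qualifier_concept_id."""
    return [ObservationRow(person_id, d, r) for d, r in survey]


def linked_pairs(rows: List[ObservationRow]) -> Set[Tuple[int, int]]:
    """Return the (disease, relative) pairs that a single row states explicitly."""
    return {
        (r.observation_concept_id, r.qualifier_concept_id)
        for r in rows
        if r.qualifier_concept_id is not None
    }


survey = [(FH_OF_DIABETES, REL_MOTHER), (FH_OF_STROKE, REL_SISTER)]
print(linked_pairs(new_method_rows(1, survey)))    # both pairs are recoverable
print(linked_pairs(conventional_rows(1, survey)))  # set(): the link is lost
```

With two survey answers for one person, the conventional layout yields four rows and no way to tell which relative belongs to which disease, while the new layout keeps each pair on one row.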

 Conclusion

  1. The new FH mapping method minimizes loss of ‘in person’ information and is therefore more accurate.
  • The new method showed 0% loss of ‘in person’ information.
  • The conventional method resulted in 100% information loss, and uphill (broad) mapping was possible in only 0.4%.
  2. The conventional method is complex and labor-intensive.
  • The conventional method requires multiple searches to find the proper concept_id.

family history extension model_v1.pdf (1.8 MB)
family history reference set_v1.xlsx (142.8 KB)
family_histrory_extension_model_observation sample data_v1.xlsx (13.7 KB)

Hello @Kang_Mira ,

I lead the newly-revived Themis work group, which is a sub-work group of the CDM WG. Themis’s role within the CDM WG is to define conventions on how the data are stored in the CDM. Once defined, we will post the conventions. The Themis WG will have a kick off meeting after the new year, date TBD. At that time, we will review the process for submitting, reviewing and ratifying conventions, along with a meeting schedule, and general logistics. Stay tuned!

Currently, we do not have any conventions on how to store family history data in the CDM. This is something Themis would like to define with conventions. I’ll reach out as we move forward with Themis.


I have been collecting examples of where post-coordination cannot be reasonably maintained in the CDM. This seems to be another one: the ‘in person’ information carried by an entry term gets lost. Looking at IMO’s terms, which are used widely within the English-speaking EHR world, a term like “family history of alcoholism in sister” is mapped to “Family history of alcoholism (situation)” and to “Family history with explicit context pertaining to sister (situation)”. Linking these two together inside the CDM would look different from keeping the person as a person. Have you looked at how this data would come out of the EHRs using IMO (90% of all EHRs in the US)?


What is the source of the family history data? Are they from IMO or ICD codes? Or do they originate from a free text or string text field?

Hi @Kang_Mira!
You’ve done tremendous work, thanks for the proposal.

At the same time, there’s another recent proposal prepared by Marcel de Wilde and @Eduard_Korchmar.

And actually, we had a sort of convention. Not properly ratified, but the vocabularies are built exactly like that at the moment.

The key difference is whether we pre-coordinate, what, and how.

For the personal history, the decision was made and implemented some time ago in the vocabulary releases v20220510 and v20220829_major. But it’s true that we’re still missing the convention article.

For the family history, we can stick with the same model because:

  • It’s the same model and as Eddy/Marcel pointed out “we prefer generic approaches so we can also write generic algorithms on this”.
  • We avoid post-coordination of CDM records => no ugly fact_relationship, external keys or other unclear and slow heuristic.
  • All conditions are allowed to be the values and represent the family history => no need to maintain this part and make arbitrary decisions on what has a genetic component, or not.
  • The concepts that represent a family history in specific relatives would be organized in a hierarchy that supports standardized analytics. If the source data is not specific enough, you’d map uphill to the “family history” top dog, which would also be used if you don’t care about the level of relationship in your studies. To make this hierarchy rich and nice, we’d recreate it using SNOMED’s representation of persons, which supports the degrees of relationship and all the levels of detail needed. I’d actually make it simpler, even though organizing it into a hierarchy would already resolve its massiveness. Time context could also be addressed for some generic concepts if we create concepts like “FH of the first degree relative less than 50 years of age”. Even if we added 5-10 permutations with different life-span periods for each concept in the hierarchy (not sure it’s needed), we’d still be within a reasonable number of concepts.
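
A rough Python sketch of the “map uphill” idea from the last bullet follows. Everything in it is an assumption made for illustration: the concept names and the child-to-parent links are placeholders, not the actual vocabulary. In a real CDM the ancestor lookup would come from the concept_ancestor table rather than a hand-written dictionary.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent links; the names are placeholders, not real concepts.
PARENT: Dict[str, Optional[str]] = {
    "Family history": None,
    "FH in first-degree relative": "Family history",
    "FH in sibling": "FH in first-degree relative",
    "FH in sister": "FH in sibling",
    "FH in mother": "FH in first-degree relative",
}


def ancestors_and_self(concept: str) -> List[str]:
    """Walk from a concept up to the top of the hierarchy."""
    chain = []
    current: Optional[str] = concept
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def map_uphill(source_detail: Optional[str]) -> str:
    """Use the most specific known concept; otherwise fall back to the top one."""
    if source_detail in PARENT:
        return source_detail
    return "Family history"


def matches(record_concept: str, query_concept: str) -> bool:
    """A query on a broad concept also picks up all of its descendants."""
    return query_concept in ancestors_and_self(record_concept)


print(map_uphill(None))                            # Family history
print(matches("FH in sister", "Family history"))   # True
print(matches("FH in mother", "FH in sibling"))    # False
```

The point of the design is that a study that ignores the degree of relationship queries the top concept, while a study that needs it queries a narrower node, and both run against the same records.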

The only concern in this approach is the effort needed to compile a new hierarchy and handle an old one with the respective mappings from the existing source and Standard concepts.

Would be nice if IMO would become a proprietary, OHDSI supported vocabulary considering 90% of those of us with EHR data have these codes :slight_smile:

Currently, we have to use the IMO to SNOMED or the IMO to ICD mappings within our EHR. And these can be one IMO to very many SNOMED/ICD codes.

I haven’t seen any information/research on the topic of granularity loss when going from IMO to SNOMED or IMO to ICD. Do you know of any research?

@MPhilofsky @Andy_Kanter
PEDSnet studied this exact issue
Burrows et al. - Standardizing Clinical Diagnoses Evaluating Alter.pdf (536.4 KB)

Burrows EK, Razzaghi H, Utidjian L, Bailey LC. Standardizing Clinical Diagnoses: Evaluating Alternate Terminology Selection. AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:71-79. PMID: 32477625; PMCID: PMC7233070.

Now we know why IMO keeps it private. :slight_smile: (Sorry, @Andy_Kanter, couldn’t help the snipe. You tried several times to overcome this.)

The SNOMED CT vocabulary is updated monthly, and many SNOMED concepts are inactivated or replaced with other concepts.
Although we suggested “sister of subject(person)” for sister in 2022, this concept has recently been inactivated and replaced with “Sister (person)”. Therefore, our team plans to use “Sister (person)” instead of “sister of subject(person)”.
There is no single correct answer in policy. It doesn’t matter whether you use “Sister (person)” or “Family history with explicit context pertaining to sister (situation)”.
The active status of SNOMED concepts can change at any time, and mapping policies depend on the individual institute, so we always have to keep this in mind and include all possible concepts.

When you make your cohort definition in Atlas and add Qualifier Criteria to an observation, searching for “sister” returns the results that include the “sister” context. You may include as many items with “sister” as you want. Thus you can include all the concepts meaning sister as attributes: “Family history with explicit context pertaining to sister (situation)”, “Sister (person)”, “FH: Sister”, etc.
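
The same idea, accepting any of several “sister” concepts as the qualifier, can be sketched outside Atlas in a few lines of Python. This is not how Atlas is implemented; it is a hedged illustration in which concept names stand in for concept_ids, `Obs` is a hypothetical reduced record, and the sample rows (including the “Brother (placeholder)” qualifier) are invented.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Obs(NamedTuple):
    """Hypothetical reduced observation record; names stand in for concept_ids."""
    person_id: int
    observation_concept: str
    qualifier_concept: Optional[str]


# All qualifier concepts we are willing to treat as meaning "sister".
SISTER_QUALIFIERS: Set[str] = {
    "Sister (person)",
    "Family history with explicit context pertaining to sister (situation)",
    "FH: Sister",
}


def persons_with_fh_in_sister(rows: Iterable[Obs], disease: str) -> List[int]:
    """Return person_ids with the given FH concept qualified by any sister concept."""
    return sorted(
        {
            r.person_id
            for r in rows
            if r.observation_concept == disease
            and r.qualifier_concept in SISTER_QUALIFIERS
        }
    )


FH_ALCOHOLISM = "Family history of alcoholism (situation)"
rows = [
    Obs(1, FH_ALCOHOLISM, "Sister (person)"),
    Obs(2, FH_ALCOHOLISM, "FH: Sister"),
    Obs(3, FH_ALCOHOLISM, "Brother (placeholder)"),  # invented non-matching qualifier
    Obs(4, FH_ALCOHOLISM, None),
]
print(persons_with_fh_in_sister(rows, FH_ALCOHOLISM))  # [1, 2]
```

Because the qualifier is matched against a set, an institute that switches from one sister concept to another only needs the new concept added to the set.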

The IMO vocabulary is private and very popular in the US. In Korea, all medical institutes have to use KCD, a Korean version of ICD, for conditions (diagnoses); however, there is no government guideline for situation or finding concepts.
At Samsung Medical Center, we have our own vocabulary, and we built a mapping for family history between our data and SNOMED CT.

I think IMO has plenty of concepts and is a very nice vocabulary.
Unfortunately, IMO products are not public. You have to pay for them. In contrast, ICD is public, and SNOMED CT can be free under some conditions, such as when your country has a national contract with the International Health Terminology Standards Development Organisation. That would be the major reason why the CDM adopted SNOMED/ICD.


The work you have done on modeling family history is great! In order to move this idea to fruition, we need to put it through the Themis WG. Would you be able to create a GitHub issue here? Then the Themis WG will prioritize the issue and invite you to discuss the issue with family history data as it is now, your study results, and your suggestion on how to improve family history data modeling in the CDM.

Tagging @Alexdavv @mdewilde @Eduard_Korchmar @Christian_Reich seems you all have an interest

I think we would like to see how we can solve the post-coordination or interface terminology problem generically. IMO users would be able to use IMO in their implementations and others would be able to use their own level of specificity. The CIEL terminology designed for LMICs is one example that is open. Having data stored at the highest level of specificity but queryable/analyzable at the lowest common level of specificity (using SNOMED, for example) would still work for global network studies.