Vocabulathon 2025: Precise mapping

This is a thread for the precise-mapping subgroup of the Vocabulathon.

Let’s prepare before the Vocabulathon by doing some work offline, so that we get a productive 4 hours in the meeting.
If you’re interested in solving this problem, please answer the following:

  1. Please outline your use cases and the problems you see with non-precise mappings.
    How do they affect your phenotypes and your research in general?
    Which vocabularies are you working with?

  2. If you have ideas on how to solve this problem, please share them.

  3. How can you participate in the discussion (only online, or also in-person during the Symposium)?

By the end of the meeting on October 7th, we will know:

  • what kind of pain this problem causes for different organizations
  • possible solutions to this problem; hopefully we can choose one to bring to the OHDSI steering committee.

Tagging @katy-sadowski @Andy_Kanter who already showed their interest,

@Gowtham_Rao @Chris_Knoll @Christian_Reich @MPhilofsky @abedtash_hamed who actively participated in the related discussion a while ago.

@aostropolets @Vlad_Korsik @zhuk @m-khitrun @Polina_Talapova @Eduard_Korchmar as the vocabulary team

I’ll start by answering my own questions:
Use cases

A couple of examples:

  1. Spotting in first-trimester pregnancy.
    When a source code is mapped to 2 concepts, one concept is chosen as the index event and the other as an inclusion criterion occurring on the same day. That’s not too bad when we look at a single concept.
  2. But if we want to build a complicated phenotype definition, such as ‘Any malignancy excluding non-melanoma skin cancer’, it gets harder.
  • We can’t simply exclude all descendants of ‘disease in remission’, because not every cancer in remission is mapped to a code that has ‘disease in remission’ as a parent, so I listed the source codes instead.
  • We can’t easily exclude non-melanoma skin cancer, because it’s mapped uphill to malignant neoplasm.
  3. We also can’t exclude Migraine with cerebral infarction from the cerebral infarction phenotype, because it’s mapped to both Cerebral infarction and Migraine with aura. (Migraine with infarction is a confusing condition, and clinicians suggested removing it.)
    In theory we can say ‘no migraine on the index date’, but that becomes too complicated for users, as they can’t track the resulting set of source concepts included.
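
To make the tracking problem concrete, here is a minimal sketch in Python. The source codes and mappings are made-up labels for illustration, not real vocabulary content, and the helper `source_codes_for` is hypothetical. It works at the code level only and ignores the same-day event logic a real cohort definition would need.

```python
# Toy 'Maps to' table: source code -> set of standard concepts.
# All names here are illustrative assumptions, not real vocabulary entries.
MAPS_TO = {
    "SRC_cerebral_infarction": {"Cerebral infarction"},
    "SRC_migraine_with_aura": {"Migraine with aura"},
    "SRC_migraine_with_cerebral_infarction": {
        "Cerebral infarction",
        "Migraine with aura",
    },
}


def source_codes_for(included, excluded=frozenset()):
    """Return source codes with a target in `included` and none in `excluded`."""
    included, excluded = set(included), set(excluded)
    return sorted(
        src
        for src, targets in MAPS_TO.items()
        if targets & included and not targets & excluded
    )


# A concept set on 'Cerebral infarction' silently pulls in the migraine code.
naive = source_codes_for({"Cerebral infarction"})

# Excluding 'Migraine with aura' drops it, but only as a side effect of the
# double mapping, which is hard for a user to see from the concept set alone.
strict = source_codes_for(
    {"Cerebral infarction"}, excluded={"Migraine with aura"}
)

print(naive)
print(strict)
```

Listing the resolved source codes like this is roughly what a user would need in order to check which codes a phenotype really includes.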

How to solve it
The idea I like the most: make ICD10CM concepts standard if they have no SNOMED equivalent and represent a distinct clinical case, which means that ‘other’ and ‘unspecified’ terms will not become standard.
These concepts would then get an Is_a relationship to the concepts they currently have a Maps_to relationship to.
All source concepts that are mapped to several concepts and have a distinct meaning would become standard, as would concepts that are mapped uphill. We could detect concepts with uphill mappings using an LLM.

The other ICD ontologies could then be mapped to ICD10CM or SNOMED.
Note: I mostly work with US data, so I might be biased, and I’m open to other candidates for the standard terminologies.
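
As a starting point for the candidate list, source concepts with more than one ‘Maps to’ target can be found with a plain query. Below is a rough sketch using Python’s built-in `sqlite3` and an in-memory database. The table and column names follow the OMOP `concept` and `concept_relationship` tables, trimmed to the columns used here; the concept IDs and names are invented for illustration. Uphill mappings cannot be found this way, since they have a single target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down versions of the OMOP vocabulary tables (real ones have more columns).
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    """
)

# Invented IDs and names, for illustration only.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (1, "Source: migraine with cerebral infarction", "ICD10CM", None),
        (2, "Source: cerebral infarction", "ICD10CM", None),
        (10, "Cerebral infarction", "SNOMED", "S"),
        (11, "Migraine with aura", "SNOMED", "S"),
    ],
)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [(1, 10, "Maps to"), (1, 11, "Maps to"), (2, 10, "Maps to")],
)

# ICD10CM concepts that map to more than one target concept.
QUERY = """
SELECT c.concept_id, c.concept_name, COUNT(DISTINCT cr.concept_id_2) AS n_targets
FROM concept c
JOIN concept_relationship cr ON cr.concept_id_1 = c.concept_id
WHERE cr.relationship_id = 'Maps to'
  AND c.vocabulary_id = 'ICD10CM'
GROUP BY c.concept_id, c.concept_name
HAVING COUNT(DISTINCT cr.concept_id_2) > 1
ORDER BY c.concept_id
"""
multi_target = conn.execute(QUERY).fetchall()

for concept_id, name, n_targets in multi_target:
    print(concept_id, name, n_targets)
```

Whether each of these concepts has a ‘distinct meaning’ would still need clinical or LLM-assisted review; the query only narrows down the list.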

@Dymshyts:

(You are hyperlinking to epi.jnj.com/atlas. We cannot see that.)

I’m not sure I understand your use cases.

  1. Spotting in first trimester pregnancy: Cohorts need to be built with two separate criteria at any rate, because the data could contain spotting and pregnancy separately. Not sure we need this combo concept at all.

  2. Malignancy except non-melanoma skin cancer: This is a combination concept that isn’t even properly defined, as “non-melanoma skin cancer” is not a thing. It is the same problem as “NOS”. Why can’t we build a cohort with “malignant neoplasm” and descendants, excluding “basal cell carcinoma of skin” and descendants? As in 1, we have to do that anyway.

  • Remission is an Episode. It should not be used as an attribute of a disease concept, because, as you said, there is no way we will ever have all cancers pre-coordinated with “in remission” or “in progression”. These are so-called “Disease Dynamic” Episodes.
  3. Migraine with infarction: Again, the separate concepts need to be included and excluded anyway, because the data might contain them separately.

Bottom line: You are providing several categories of problems with mapping of complex concepts:

  • AND-combos (spotting and pregnancy): just split them up and create separate inclusion criteria.
  • AND NOT-combos (non-melanoma skin cancer): do they actually exist as concepts? If they do, no mapping will fix that.
  • Combinations of attributes that live in different domains: These are a problem if there is no way to link them (a link mechanism we have put in place for cancer). If there is no link mechanism, the only solution I see is an OMOP (or SNOMED) Extension.

Thanks, @Christian_Reich.
I fixed the link.
‘Cancer excluding non-melanoma skin cancer’ wasn’t a concept but a phenotype we had.
I probably need to find better examples where one clinical idea is mapped to several concepts, or to a single concept with a loss of significant information.

A related note (not a full reply to your thread):

SNOMED CT at some point concluded that, in addition to the terminology, there is a need for a grammar for ‘expressions’.

See Compositional Grammar - Specification and Guide (SNOMED Confluence).

Examples:
https://confluence.ihtsdotools.org/display/DOCSCG/6.5+Expression+With+Nested+Refinements

You then need a reasoner to possibly conclude that your expression is the same as a formal concept (or a descendant of it).

Thanks for starting this thread, @Dymshyts ! Count me in.

My use case (shared by my team and by Boehringer): I want to create a concept set using standard concepts for a given condition. However, there are source concepts mapped directly to the “root” standard concept for that condition which I do not want to include in my concept set. This often occurs when specific ICD10-CM codes are mapped to a less-specific standard concept. In this case we are forced to create a source concept set for that condition :pensive: I can compile a list of specific examples ahead of the Symposium, if it’d be useful.
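
A rough sketch of how such examples could be collected, assuming a flat list of (source code, standard concept) mappings and a known set of descendants for the root concept. Every name below is a made-up placeholder, and `split_by_target` is a hypothetical helper, not part of any OHDSI tool.

```python
ROOT = "Malignant neoplasm"

# Hypothetical hierarchy: descendants of the root standard concept.
DESCENDANTS = {
    ROOT: {"Malignant melanoma of skin", "Basal cell carcinoma of skin"},
}

# Hypothetical (source code, standard concept) mappings.
MAPS_TO = [
    ("SRC_unspecified_malignancy", ROOT),
    ("SRC_specific_skin_malignancy", ROOT),  # specific code mapped uphill
    ("SRC_melanoma", "Malignant melanoma of skin"),
]


def split_by_target(root):
    """Split source codes into those mapped to the root itself and those
    mapped to one of its descendants."""
    direct = sorted(src for src, target in MAPS_TO if target == root)
    via_descendant = sorted(
        src for src, target in MAPS_TO if target in DESCENDANTS[root]
    )
    return direct, via_descendant


direct, via_descendant = split_by_target(ROOT)

# Codes in `direct` are the ones to review: some are genuinely unspecific,
# others are specific codes that lost detail in the mapping.
print(direct)
print(via_descendant)
```

Codes that land in `direct` cannot be separated from each other with standard concepts alone, which is what forces the source concept set.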

Solution ideas: I like the idea of using LLMs to evaluate existing mappings and propose corrections and/or new extension concepts as applicable. The most recent GenAI Workgroup meeting discussed this use case:

I will be happy to join in person at the Symposium (and hopefully @Ajit_Londhe @mdlavallee92 and others from our team can too!).
