
Metric for evaluating CDM mapping/Vocabulary

I am working with other members of the Dentistry WG to assess how well dental concepts map to the OMOP-CDM. We are evaluating the current state of dental vocabularies and the ability of the OMOP-CDM to accommodate dental use cases by mapping a dental use case to the CDM. Our goal is to judge whether a dental concept maps based on two questions: is there a vocabulary term that adequately captures the meaning of the concept, and can the OMOP-CDM correctly capture it?

Is there a metric that can be used to measure how well a concept maps to the OMOP-CDM? When I did this before, I used a simple system of YES, NO, or MAYBE. YES meant that the concept had a definitive, discrete vocabulary term and could clearly be mapped to the OMOP-CDM. NO meant that either there was no term to describe the concept or the concept could not be mapped to the CDM at all. MAYBE meant that a vocabulary term existed but was ambiguous, that multiple terms could plausibly be used, or that it was unclear whether the concept could be mapped to the CDM.
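To make the YES/NO/MAYBE scheme concrete, here is a minimal Python sketch that scores the two questions (vocabulary term, CDM fit) separately and then combines them. The enum names and the rule that a hard failure on either axis always wins are my own reading of the description, not an established OHDSI metric, and the statuses assigned to the two dental examples are illustrative assumptions.

```python
from enum import Enum


class VocabStatus(Enum):
    """Is there a vocabulary term for the source concept? (hypothetical labels)"""
    DEFINITIVE = "definitive"  # one clear, discrete term
    AMBIGUOUS = "ambiguous"    # a term exists but is ambiguous, or several could apply
    MISSING = "missing"        # no term describes the concept


class CdmStatus(Enum):
    """Can the OMOP-CDM hold the concept? (hypothetical labels)"""
    CLEAR = "clear"            # clearly fits the CDM
    UNCLEAR = "unclear"        # unclear whether the CDM can hold it
    IMPOSSIBLE = "impossible"  # the CDM cannot represent it


def rate_concept(vocab: VocabStatus, cdm: CdmStatus) -> str:
    """Combine the two assessments into a YES / NO / MAYBE rating."""
    # A hard failure on either axis makes the concept a NO.
    if vocab is VocabStatus.MISSING or cdm is CdmStatus.IMPOSSIBLE:
        return "NO"
    # YES requires a definitive term AND a clear place in the CDM.
    if vocab is VocabStatus.DEFINITIVE and cdm is CdmStatus.CLEAR:
        return "YES"
    # Everything else (ambiguous term or unclear CDM fit) is a MAYBE.
    return "MAYBE"


# Caries on tooth #12 (distal-occlusal): the CDM cannot hold tooth/surface detail,
# so the rating is NO whatever the vocabulary status is.
print({v.name: rate_concept(v, CdmStatus.IMPOSSIBLE) for v in VocabStatus})
# {'DEFINITIVE': 'NO', 'AMBIGUOUS': 'NO', 'MISSING': 'NO'}

# Periodic oral evaluation: terms exist (CDT D0120, SNOMED 51733004) but which one to
# use is unsettled; the CDM fit is *assumed* clear here for illustration.
print(rate_concept(VocabStatus.AMBIGUOUS, CdmStatus.CLEAR))  # MAYBE
```

Keeping the two axes separate means the same data could later feed a finer-grained score without re-reviewing every concept.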

For example, a dental patient has dental caries on tooth #12 (the upper left first bicuspid) on the distal-occlusal surface. The CDM does not have a way to capture individual teeth or tooth surfaces, so this concept exposes a limitation of the CDM. In my old system, this would be a NO.

Periodic oral evaluation (an annual dental exam) is usually captured in the electronic record as a CDT billing code (D0120). The SNOMED term (51733004) is more descriptive but is rarely captured. A periodic oral evaluation includes an extraoral evaluation (head and neck exam), an evaluation of the temporomandibular joint, a hard tissue evaluation (teeth), and a soft tissue evaluation (periodontium). Vocabulary terms exist that could describe this concept, but the dental community needs more precise definitions of which term should be used in different phenotypes and captured in the electronic record. In my old system, this would be a MAYBE.

Is there a more formalized or better metric for measuring mapping/vocab for a given concept?

The Dentistry WG is currently working on submissions to the OHDSI Global Symposium, and this is one of our abstracts. If discussing in the forums is not enough fun and excitement for you, please join us on Thursdays at 7 PM ET.

That’s an interesting question and very timely. I think it has two parts: mapping precision (how well the target concept in the Vocabs represents the source concept) and mapping confidence (how sure you are about your mappings).

Mapping precision
On the one hand, a common approach in the community has been a binary variable: a concept is either mapped or not mapped (and "mapped" covers both 1:1 and 1:many matches of varying precision).

On the other hand, there is growing interest in storing more detail about the precision of a mapping (e.g., whether it's a strict match or an uphill match, where a more granular term is mapped to a broader one). I know @Rijnbeek's team stores that information in their ETLs in some form, and @Polina_Talapova has some knowledge of this through her work with Tufts and SSSOM. These details are categorical variables.

In general, we as a community need to develop a better classification of mapping types (whether Maps to relationships in the vocabularies or mappings in your local ETL). It could be YES (can be mapped) / NO (cannot be mapped), with YES having subcategories (strict, uphill, something else) or a score. We will need to decide.
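One way to store the "YES with subcategories" idea is sketched here in Python. It keeps the binary mapped/not-mapped view and adds a categorical match type. The category names and record fields are hypothetical, and translating them to SKOS predicates (skos:exactMatch for strict, skos:broadMatch for uphill) is my assumption about how they would line up with the predicate_id column of an SSSOM mapping file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchType(Enum):
    """Hypothetical precision subcategories for a concept that IS mapped."""
    STRICT = "strict"  # target means the same as the source
    UPHILL = "uphill"  # granular source mapped to a broader target


# Assumed correspondence to SKOS mapping predicates (as used in SSSOM predicate_id).
# "A skos:broadMatch B" states that B is broader than A.
SKOS_PREDICATE = {
    MatchType.STRICT: "skos:exactMatch",
    MatchType.UPHILL: "skos:broadMatch",
}


@dataclass
class MappingRecord:
    source: str
    target: Optional[str]  # None when no suitable target exists
    match_type: Optional[MatchType] = None

    def __post_init__(self) -> None:
        # A match type only makes sense when there is a target, and vice versa.
        if (self.target is None) != (self.match_type is None):
            raise ValueError("target and match_type must both be set or both be None")

    @property
    def mapped(self) -> bool:
        """The binary YES/NO view."""
        return self.target is not None

    def predicate(self) -> Optional[str]:
        """SKOS predicate for mapped records, None for unmapped ones."""
        return SKOS_PREDICATE[self.match_type] if self.match_type is not None else None


# Illustrative records (labels are hypothetical, not real concept IDs).
caries = MappingRecord("caries, tooth #12, distal-occlusal", "dental caries", MatchType.UPHILL)
unmapped = MappingRecord("tooth surface: distal-occlusal", None)
print(caries.mapped, caries.predicate())      # True skos:broadMatch
print(unmapped.mapped, unmapped.predicate())  # False None
```

New subcategories ("something else") would only need a new enum member and, if desired, a predicate entry.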

Mapping confidence

Here, you can think of a continuous variable. For example, if you use cosine similarity as a metric, that number can be stored as a proxy. Or you can give extra points if the mapping was reviewed by clinicians/OMOP experts. An example is the confidence field in the SSSOM standard.
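As a rough illustration of the continuous approach, this Python sketch computes the cosine similarity between two embedding vectors and adds a fixed bonus per expert review, clamped to [0, 1] (the range SSSOM uses for its confidence field). The vectors would come from whatever embedding model you use; the toy vectors and the 0.1 bonus per review are arbitrary assumptions, not a recommended weighting.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def mapping_confidence(similarity: float, reviewed_by: int = 0,
                       bonus_per_review: float = 0.1) -> float:
    """Similarity plus a (hypothetical) bonus per expert review, clamped to [0, 1]."""
    score = similarity + reviewed_by * bonus_per_review
    return max(0.0, min(1.0, score))


# Toy embeddings for a source term and a candidate target concept.
sim = cosine_similarity([3.0, 4.0], [4.0, 3.0])
print(sim)                                     # 0.96
print(mapping_confidence(sim))                 # 0.96
print(mapping_confidence(sim, reviewed_by=1))  # 1.0
```

Clamping keeps the stored value in the SSSOM range even when review bonuses push the raw score above 1 or a similarity is negative.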

Final thought
This post ties in nicely with the discussions about the metadata we want to store, which have come up here and there. As we roll out the community contribution guidelines, we want to have a convo about this exact topic and how we want to capture precision, confidence, and other attributes. If you aren't in the Teams Vocabulary sub-group, please sign up; we will post the announcement and the invite for the Vocab WG call dedicated to these topics there.
