
Mapping "reason for visit" to SNOMED

Our EMR has valuable reason for visit information that we want to map to SNOMED concepts.

These data capture patients’ presenting problems/symptoms at a visit, not the diagnoses they receive after investigating them.

There are 2 ways in which mapping to SNOMED is different for reasons than for diagnoses:

  1. Diagnoses get mapped to SNOMED by linking to a different Masterfile than reasons link to
  2. The Masterfile that diagnoses link to always contains a SNOMED concept ID. The Masterfile that reasons link to only contains a SNOMED concept in about 5% of cases.

This is how the mapping of reasons looks within our EMR. To protect the EMR vendor’s IP, the values and field names are altered slightly.

ReasID     ReasName       MapID     VendorConceptID   ConceptName
160247     GI PROBLEM     1182209   Vendor#0217       DX/PROBLEMS - GASTROINTESTINAL - GI ILLNESS
160270     HEMATURIA      1182214   Vendor#RSGU0008   SYMP - GENITOURINARY - URINARY SYMPTOMS - HEMATURIA
160920     TOXIDROME      1182211   Vendor#0299       PROB - TOXIDROME

The value of VendorConceptID in the Masterfile for diagnoses is always a SNOMED concept. In the Masterfile that reasons link to, the value of VendorConceptID is a SNOMED concept in only 88 of 1,700+ distinct values. We’ve tried linking reasons to the diagnosis Masterfile, but we get mismatches like the ones below.

ReasID  ReasName        MapID    VendorConceptID       ConceptName
603     LEG NUMBNESS    120937   SNOMED#111798006      GASTROINTESTINAL ANTHRAX
604     OMT EVAL        120938   SNOMED#491000119101   BACTEREMIA CAUSED BY GRAM-POSITIVE BACTERIA
604     OMT EVAL        120938   SNOMED#128944008      BACTERIAL INFECTION DUE TO BACILLUS
32      DEPRESSION      120942   SNOMED#398523009      FOODBORNE BOTULISM

Here are some examples of reasons for visit that do not map to a SNOMED concept.

ReasID  ReasName
29      COUGH
633     EAR INFECTIONS
632     CHOLELITHIASIS
622     SHOULDER WEAKNESS
656     LEFT LEG PAIN
586     INCISION CARE
589     BACK PAIN WITH REFERRED PAIN
588     NECK PAIN WITH RADICULAR SYMPTOMS

Any help with how to do this - short of powering through 1,700 string searches and adjudication among potential SNOMED matches - would be greatly appreciated.

@Andrew:

If you have no reliable code to hang a link on, I am afraid all you’ve got is the description, or ReasName. In your example, the VendorConceptId is what it says in ConceptName. Maybe the NLP people have something to do this more efficiently, but I doubt it.

I agree with Christian. Based on your examples, it looks like this field is not using any controlled vocabulary – new strings are added w/o any tie to a coding system. If so, then you have it exactly right: “powering through 1,700 string searches and adjudication among potential SNOMED matches”. Of course, you should do a distribution of counts over those 1,700 strings, perhaps pick the ones that cover 90% of the uses, and just not bother with the very long tail that the last 10% will represent. I bet a small fraction of the 1,700 strings are in the top 80%-90%.
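The counts-and-coverage idea above can be sketched in a few lines of Python. The reason strings and counts here are made up for illustration; only the ranking logic matters:

```python
from collections import Counter

def coverage_cutoff(reason_counts, target=0.90):
    """Number of distinct strings (most frequent first) needed to
    cover `target` of all reason-for-visit occurrences."""
    total = sum(reason_counts.values())
    covered = 0
    ranked = sorted(reason_counts.items(), key=lambda kv: kv[1], reverse=True)
    for i, (_, n) in enumerate(ranked, start=1):
        covered += n
        if covered / total >= target:
            return i
    return len(reason_counts)

# Made-up counts for illustration:
counts = Counter({"COUGH": 500, "BACK PAIN": 300, "LEG NUMBNESS": 50,
                  "OMT EVAL": 30, "TOXIDROME": 5})
print(coverage_cutoff(counts, 0.90))  # -> 2: two strings cover 90% here
```

Running this over the real 1,700 strings would show immediately whether a small head of the distribution carries most of the volume.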

Chris Knoll posted his advice on how to do this, which was contrary to what I was telling my team to do (use the source_to_concept_map table). But Chris’ approach is much cleaner, so I had to eat crow with the Colorado Health Data Compass team… Chris’ link below.

Thanks very much @mgkahn and @Christian_Reich
We will look at the distribution and guesstimate the work for mapping the ones covering 85%.
I’ll also wait a bit in hopes that someone else is making progress with mapping their own reasons data and wants to join forces. My guess is that there will be significant overlap across institutions. If we can’t get a leg up or find a partner, this will go on the back burner till we’re done with the easier bits of the ETL and we can free up resources for wish list items we couldn’t get to.

@Andrew:

Two things:

We can probably ask around for partners. But if you could make your mapping public, it would help the next guy, even if you end up not getting any help.

The 85% solution: this is what people often do, but I am not so sure it’s a good approach, because the common problems are not necessarily the important ones. Often, it’s the other way around. Nobody wants to study “Headache” or “Fatigue”, which pretty much every other patient experiences. Not sure how to address this, though.

@Christian_Reich We would certainly make the mappings public. We’re eager to find ways to pay back all the excellent support we’ve received. If you could ask around for partners that would be terrific. Thanks.

Interesting point about common cases being less likely to be of interest. The reverse probably isn’t true though. I.e. most rare cases probably aren’t more likely to be of interest. I’m not sure how to solve that one either.
We wouldn’t be stuck with the initial 85%, though, and could extend the mapping to new cases as needs arise. So I can see the appeal of that approach.

@Andrew,
I’m not sure if you are aware, but there’s an OHDSI tool to assist with source code mapping:

Essentially, you create a CSV of your 1,700 strings, run it through Usagi, and it will try to match each one to the appropriate concept in the vocabulary. You can choose which domains to look in (for example, if you know those strings are conditions, you can ask Usagi to only look for things in the Condition domain). It might be worth playing around with that to see if it solves your problem. Dealing with 1,700 strings is a big job (but actually quite manageable in Usagi).
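Preparing that input file is trivial. A minimal sketch, assuming the reasons have already been extracted as (ReasID, ReasName, usage count) triples; the column names and counts here are illustrative, not mandated by Usagi:

```python
import csv

# Made-up (ReasID, ReasName, usage count) triples for illustration.
reasons = [
    (29, "COUGH", 512),
    (633, "EAR INFECTIONS", 87),
    (656, "LEFT LEG PAIN", 43),
]

# One row per distinct reason string; the frequency column lets
# Usagi sort the work queue by impact.
with open("reasons_for_usagi.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source_code", "source_name", "frequency"])
    writer.writerows(reasons)
```

In Usagi you then point the import dialog at this file and tell it which column holds the source code, name, and frequency.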

-Chris

One other note: from your examples above, some of those might just be a simple observation, not an actual diagnosis. I believe you might find “shoulder weakness” as an observation, but after looking in Atlas for shoulder weakness, I found:
http://www.ohdsi.org/web/atlas/#/concept/4093200
Which is “Shoulder girdle weakness”. Maybe that’s the concept you’d want to map to in your source_to_concept_map table.

If so, that’s 1 down and 1699 to go!

-Chris

Did another search. I was really trying to find a case of an observation that you could map to, but I found that LEFT LEG PAIN would map to:
http://www.ohdsi.org/web/atlas/#/concept/4117695
Which is the SNOMED concept for “Pain in left leg”.

1698 to go!

@Chris_Knoll
Thanks very much Chris. I’m aware of Usagi but haven’t tried it yet. It will be the next step in figuring out the size of the job. The distribution is pretty flat, not Zipfian as I’d hoped, so covering 85% will take mapping several hundred strings.
Not all are diagnoses. The other large class is visit type, e.g. “medication refill”.
We’ve been looking for a general mapping of our EMR vendor’s concept IDs to SNOMED or other standard vocabularies. There may be solutions we can’t afford, like Intelligent Medical Objects; we’re not sure.

@Andrew - if it is helpful there is detailed information on USAGI here:
http://www.ohdsi.org/web/wiki/doku.php?id=documentation:software:usagi

Python has a library called fuzzywuzzy – an API for comparing string similarity. This can make the task of manual labeling much quicker, particularly if you filter out the known correct answers as you go and automatically label very close matches to strings that are already manually labeled.
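fuzzywuzzy needs a separate install, but the workflow can be sketched with the standard library's difflib. The SNOMED term list below is a tiny made-up sample, and the 0.5 cutoff is an arbitrary starting point:

```python
import difflib

# Tiny illustrative sample of SNOMED descriptions.
snomed_terms = ["Pain in left leg", "Shoulder girdle weakness",
                "Hematuria", "Cholelithiasis", "Cough"]

def best_match(source_name, candidates, cutoff=0.5):
    """Return (best candidate, similarity ratio) or None below cutoff."""
    lowered = [c.lower() for c in candidates]
    hits = difflib.get_close_matches(source_name.lower(), lowered,
                                     n=1, cutoff=cutoff)
    if not hits:
        return None
    idx = lowered.index(hits[0])
    ratio = difflib.SequenceMatcher(None, source_name.lower(), hits[0]).ratio()
    return candidates[idx], round(ratio, 2)

print(best_match("LEFT LEG PAIN", snomed_terms))
# -> ('Pain in left leg', 0.55)
```

Note how low the character-level score is for a correct match with reordered words; a token-based scorer like fuzzywuzzy's token_sort_ratio handles that reordering much better than a plain character ratio.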

@ericaVoss and @stuartreynolds
Thanks to you both. We’re prepping the data for Usagi. There are more kinds of data than we’d thought originally, so there’s a categorization task prior to mapping. If we get discouraged by the lift after trying Usagi on one of the larger categories, we’ll think about fuzzywuzzy … because once you’ve been exposed to the name, it’s hard not to think about fuzzywuzzy.

I’d beware of simple string proximity: Usagi will find things like ‘cancer’ mapping to ‘primary malignant neoplasm’, for example. Sometimes the things you map to don’t actually sound the same as the standard concept in the CDM.
