
Mapping Concepts in Usagi - SNOMED Vocabulary

(Kathleen Lee) #1

Hello! I’m working on a data mapping project and I’m wondering how exactly Usagi maps SNOMED concepts to the study concepts because I’m getting suggestions that have a fairly high match score but are in no way related.

For example, “NSAIDs” was mapped to “AIDS” with a 0.73 match score. I understand that this comes from text similarity, but even though my import file has a “SOURCE_DESCRIPTION” field, it doesn’t seem to help in finding a better or more appropriate match.

Is there a way to increase the accuracy of the mapping suggestions on Usagi? Every time I import codes I have SOURCE_CODE and SOURCE_NAME fields, with the latter being pretty informative about the variable description. If there is a way I should re-structure my dataset or anything I’m open to suggestions.


(Christian Reich) #2


You are mapping NSAIDs to what?

SNOMED is generally used to encode Conditions (diagnoses, symptoms etc.) and Procedures. NSAID is a drug class, so it shouldn’t be mapped to SNOMED.

Let us know.

(Vikram) #3

Hi Kathleen Lee,

I am facing a similar issue: the match score is not representative of how well the SOURCE_NAME actually matches. Some of the matches are irrelevant and yet have a high score.

As Christian pointed out, do we need to map our source concepts to a specific vocabulary?

I can see that Usagi uses cosine similarity for the text matching.

I would like to understand why cosine similarity is used instead of other string matching algorithms. What advantage or metric was the basis for choosing this text similarity measure?


(Roger Carlson) #4

When you import your codes (File>ImportCodes), you can use Filters to narrow your matching to specific Concepts, Vocabularies, and/or Domains (or any combination thereof). NSAIDs are medications, so they should probably be filtered by the Drug Domain and the RxNorm vocabulary.

Note that simply selecting a value in the dropdown does not check the Filter box, so you have to do both.

That said, mapping all your drugs via Usagi is a major undertaking, and one you don’t need to take on if your source already has some other non-standard coding like NDC. In that case you can use the built-in OMOP hierarchies to do the mappings.
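One reading of "built-in" mappings here is the "Maps to" relationships that ship with the OMOP vocabulary tables (`concept` and `concept_relationship`). The sketch below illustrates that lookup against an in-memory SQLite database. The tables are simplified to a few columns, the rows are made up for illustration (they are not real concept IDs or NDC codes), and `map_source_code` is a hypothetical helper name.

```python
import sqlite3

# In-memory toy database with a simplified subset of the OMOP vocabulary columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    domain_id        TEXT,
    vocabulary_id    TEXT,
    concept_code     TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
-- Made-up rows for illustration only; not real OMOP concept IDs or codes.
INSERT INTO concept VALUES
    (1, 'Example drug 200 MG tablet [package]', 'Drug', 'NDC', '00000000001', NULL),
    (2, 'Example drug 200 MG Oral Tablet', 'Drug', 'RxNorm', '999999', 'S');
INSERT INTO concept_relationship VALUES (1, 2, 'Maps to');
""")


def map_source_code(conn, vocabulary_id, source_code):
    """Follow 'Maps to' from a source code to its standard concept(s)."""
    query = """
        SELECT target.concept_id, target.concept_name, target.vocabulary_id
        FROM concept AS source
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = source.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.vocabulary_id = ?
          AND source.concept_code = ?
    """
    return conn.execute(query, (vocabulary_id, source_code)).fetchall()


print(map_source_code(conn, "NDC", "00000000001"))
```

Against a real vocabulary download the same join would run on the full tables, so no text matching is needed for codes that already exist as source concepts.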

(Chris Knoll) #5

I’m sorry, I don’t have the details of why one algorithm was chosen over another. @schuemie should be able to provide those details, and possibly comment on the ability to swap in different matching algorithms based on user preferences.

(Martijn Schuemie) #6

For Usagi we needed a matching algorithm that is relatively fast, since for each search we compare against millions of terms, and we typically do many searches because we have many source codes. For that reason I chose Lucene, and also because in my experience in information retrieval it is hard to beat cosine similarity consistently (you may find some examples where something else works better, but that always tends to come with poorer performance for other search terms).
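To make the measure concrete, here is a minimal sketch of cosine similarity over plain word-count vectors. It is not Usagi's or Lucene's implementation (which involves its own tokenization and term weighting); the function names and example strings are assumptions chosen for illustration. It does show how a short candidate that shares a single word with the source name can still get a fairly high score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


def rank(source_name, candidates):
    """Return (score, candidate) pairs, best match first."""
    scored = [(cosine_similarity(source_name, c), c) for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, name in rank("chronic pain", ["pain", "chronic kidney disease", "fever"]):
        print(f"{score:.2f}  {name}")
```

In this toy setup "chronic pain" scores about 0.71 against "pain" but only about 0.41 against "chronic kidney disease", because the score depends on the share of overlapping terms relative to the length of both strings, not on clinical meaning.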

@vkramdev: Could you give some examples of source names that are returning matches that are irrelevant and yet have a high score?