
Usagi

Friends:

Can I ask for a favor: Let’s not have different mappings in parallel. That will derail us, because everybody will have a different reality when querying data. Instead, let’s find these systematic problems and fix them.

We have looked at the “with” and “without” codes extensively. We don’t want to do a plain mapping to these in SNOMED, because they are often pretty “dyssocial”, which means they either have no hierarchical relationships or don’t have the right ones. Instead, we map to the actual conditions. If something isn’t mentioned, then we won’t mention it either.

I would also advise against the equivalence map. It is only a partial map (we used it as input to our map), and it only covers equivalents. Many ICD9 codes are complex, and the equivalent concepts actually have problems: they sound the same, but they have very different children.

Altogether: I think USAGI is a great tool for efficient mapping of local codes. For mapping hierarchies to each other it is too simple.

Bill: Can you give me examples of with/without mappings you don’t like, so we can decipher them?

C

You can find the raw IHTSDO file in the international release; in the USA you can get it from: http://download.nlm.nih.gov/umls/kss/IHTSDO20140731/SnomedCT_Release_INT_20140731.zip (You’ll need a free UMLS license if you don’t have one.)

Alternatively, you can browse the same map in Snow Owl (under Mappings / ICD-9-CM equivalence complex map reference set), which you can get here: http://b2i.sg/download/

I’m not sure what the difference is between the international and US mappings; we have only looked at the former, which is an equivalence map.

Brandon

Hi Bill,

that specific example has the following explanation:

The first match has a synonym “Open wound of face, unspecified site, complicated”, where the “unspecified site” matches part of your search string. Usagi (or actually the Lucene search engine) does not understand semantics; it just attributes a higher weight to matching “unspecified site” than it does to matching “without” (which is a very frequent word and therefore gets a low weight).

I’m not sure what to do about this. This is just the way the algorithm works. We could hack in rules, but that would make the behavior very unpredictable.

May be off topic, but Lucene indices without negation annotations are dicey. Negex can be used to create these annotations and lower the weight on that basis. But like everything, there’s more work involved. My 2 cents: For term search, as long as a human is picking something from a list, it’s okay to have the occasional wrong choice appear up top so that you don’t accidentally exclude stuff.
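
To make the idea concrete, here is a minimal sketch of lowering a score when the negation status of the search string and the candidate disagree. It is a much simplified trigger-word check inspired by NegEx, not the NegEx algorithm itself; the trigger list, the function names and the penalty value are all assumptions for illustration, and nothing here is part of Usagi or Lucene.

```python
import re

# Hypothetical, deliberately tiny trigger list; a real negation lexicon is much larger.
NEGATION_TRIGGERS = ("without", "no", "not", "absence of")

_TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in NEGATION_TRIGGERS) + r")\b",
    re.IGNORECASE,
)


def has_negation(term):
    """Return True if the term contains one of the trigger words as a whole word."""
    return _TRIGGER_RE.search(term) is not None


def adjust_score(query, candidate, score, penalty=0.5):
    """Down-weight (never drop) a candidate whose negation status differs from the query."""
    if has_negation(query) != has_negation(candidate):
        return score * penalty
    return score
```

Because the candidate is only down-weighted and never removed, a human reviewer still sees it further down the list, which fits the point about not accidentally excluding stuff.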

Several years of participating in TREC (Text REtrieval Conference) has given me immense respect for pure TF * IDF with cosine matching (basically what Usagi does): any modifications may solve some specific issues, but will bite you in other situations and will often reduce overall performance.
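
For readers who have not seen it, the weighting effect described in this thread can be reproduced with a small sketch of TF * IDF with cosine matching. This is not Lucene's exact scoring formula and not Usagi's code: the smoothed IDF, the toy corpus (only the first entry comes from this thread) and the query string are assumptions chosen to illustrate why a frequent word like “without” counts for less than a rare phrase like “unspecified site”.

```python
import math
from collections import Counter

# Toy index: the first entry is the synonym quoted in this thread,
# the others are made-up term names for illustration only.
CORPUS = [
    "Open wound of face, unspecified site, complicated",
    "Open wound of face without complication",
    "Open wound of hand without complication",
    "Fracture of rib without complication",
    "Burn of arm without complication",
]
# Hypothetical search string; the original one is not quoted in the thread.
QUERY = "Open wound of face, unspecified site, without mention of complication"


def tokenize(text):
    return text.lower().replace(",", " ").split()


def idf_weights(documents):
    """Smoothed inverse document frequency: rare words get high weights."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Term frequency times IDF; words missing from the index get weight 0."""
    tf = Counter(tokenize(text))
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, best match first."""
    idf = idf_weights(documents)
    q = tfidf_vector(query, idf)
    return sorted(((cosine(q, tfidf_vector(d, idf)), d) for d in documents), reverse=True)
```

With this toy data, `rank(QUERY, CORPUS)` puts the “complicated” synonym ahead of “Open wound of face without complication”, because “unspecified” and “site” each occur in one entry while “without” occurs in four.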

I want to use Usagi to build a mapping from ICD10CM to SNOMED. The problem is that after downloading and unzipping the SNOMED vocabulary v5.0 from ATHENA and then building the index from it, Usagi shows “Building index. This will take a while… Sorting vocabulary files.” and then stops responding. I waited about one day.
Please help with this.

Are Usagi and the ATHENA files all on a local drive? (Running from a network drive would take forever)

The Vocabulary already has an excellent ICD10 to SNOMED map, so why not use that one?

@schuemie:

@Dymshyts is the source of the “excellent ICD10 to SNOMED” mapping. 🙂 He is now building the next one. However, ICD10 and ICD10CM have subtle differences, even when the code is the same. I know. It sucks. Welcome to our world.

Thanks for your answer. The problem was that I was using an old version. I have downloaded the latest one, and it works properly.

I like how it works for mapping to SNOMED. Now I want to map to ICD10. It doesn’t produce anything: every target concept comes back as ‘0’. @Christian_Reich suggested that the problem could be that ICD10 concepts are not defined as Standard, so I set standard_concept = ‘S’ in the ICD10 concept file. But it still doesn’t work. Is the problem the empty concept_synonym table for ICD10, or is there some other reason?

The reason is that I had hard-coded the list of allowed vocabularies. Please try this new version, where the list of vocabularies is derived from the vocab files instead. You’ll need to rebuild the index.

@schuemie:

Instead of limiting by vocabulary, you may want to limit it by Standard Concept (concept_level>0 or standard_concept = ‘S’). That way, you don’t have to worry about it.

Dmitry had this nice idea of “abusing” Usagi by declaring all ICD-10 concepts to be Standard, in order to get his ICD10CM to SNOMED mapping jump-started. For that, we have to figure out which ICD10 concepts are not identical to the ICD10CM concepts of the same concept_code. Unfortunately, that happens.
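
A first pass at finding those non-identical concepts could look like the sketch below. It assumes you have already loaded each vocabulary into a dictionary from concept_code to concept_name; the function name and the placeholder codes in the usage note are hypothetical. Comparing names only flags wording differences, so codes with the same name but a different meaning would still need manual review.

```python
def differing_codes(icd10, icd10cm):
    """Return the codes present in both vocabularies whose names differ.

    Both arguments map concept_code -> concept_name. Names are compared
    case-insensitively and with whitespace collapsed.
    """
    def normalize(name):
        return " ".join(name.lower().split())

    shared = icd10.keys() & icd10cm.keys()
    return sorted(
        code for code in shared
        if normalize(icd10[code]) != normalize(icd10cm[code])
    )
```

For example, with the placeholder inputs `{"X00": "Example condition", "X01": "Example condition, unspecified"}` and `{"X00": "example condition", "X01": "Example condition, other"}`, only `"X01"` is reported.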

Yes, that is what Usagi now does. It ignores everything except standard concepts, and builds a list of all vocabulary IDs that the standard concepts have. That is the list you can choose from. If Dmitry declares ICD-10 codes to be standard, it should work. Let me know if it doesn’t.
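
As an illustration of that behavior (a sketch only, not Usagi's actual code, which is written in Java), the list of selectable vocabularies can be derived from the concept file like this. It assumes a tab-delimited file with a header row containing vocabulary_id and standard_concept columns; the function name is hypothetical.

```python
import csv
import io


def standard_vocabularies(lines, delimiter="\t"):
    """Collect the vocabulary IDs that have at least one Standard concept."""
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    # Header names are lower-cased so STANDARD_CONCEPT and standard_concept both work.
    header = [name.strip().lower() for name in next(reader)]
    vocab_col = header.index("vocabulary_id")
    std_col = header.index("standard_concept")
    vocabularies = set()
    for row in reader:
        if len(row) <= max(vocab_col, std_col):
            continue  # skip short or malformed rows
        if row[std_col].strip() == "S":
            vocabularies.add(row[vocab_col])
    return sorted(vocabularies)


# Made-up rows for illustration; the concept IDs and names are placeholders.
SAMPLE = (
    "concept_id\tconcept_name\tvocabulary_id\tstandard_concept\n"
    "1\tExample disorder\tSNOMED\tS\n"
    "2\tExample code\tICD10\t\n"
)
```

With the sample above, `standard_vocabularies(io.StringIO(SAMPLE))` contains only SNOMED. ICD10 would appear in the list once at least one of its rows carries standard_concept = ‘S’.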

I didn’t find a way to change the vocabulary definition (standard or not) in Usagi. So I manually changed the ICD10 concept.csv file: I set vocabulary_id = ‘SNOMED’ and STANDARD_CONCEPT = ‘S’. Now it works. But then of course I will have to change ‘SNOMED’ back to ‘ICD10’ to avoid confusion.

@Dymshyts:

There is no such thing as a Standard Vocabulary. Only the concepts can be Standard or Non-Standard. However, some vocabularies have a lot of Standard Concepts, for some of them all Concepts are Standard, and some vocabularies have no Standard Concepts at all.

But it started working only after I changed vocabulary_id from ‘ICD10’ to ‘SNOMED’.
