USAGI experience

Mary_Regina_Boland · May 8, 2019, 5:51pm

A lot of this depends on how much information you have a priori…my point is that we don’t have detailed information on drugs unless we manually review each drug name and that is not feasible for a dataset containing 700k unique drugs

jliddil1 · May 8, 2019, 7:06pm

This appears to be sold in the US. It is in RxNorm and has FDA approval:
https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=overview.process&ApplNo=021840

This is a complex example, as it is a birth control product composed of two different tablets with distinct formulations.

Christian_Reich · May 8, 2019, 7:18pm

Nope. It’s not. What’s sold in the US is a pack of 84 Estradiol/Norgestrel tablets and 7 Estradiol tablets. The “ugly” shortcut only mentions the former. Outside the US, they seem to sell the progesterone part of the contraceptive only, and that’s the one @schuemie is so happy Usagi picked up.

I strongly suggest involving the Vocab team (@aostropolets and @dimshitc) in this. They know the pitfalls inside out.

dramacloak · May 8, 2019, 7:26pm

Is there a place for natural language processing to assist with mapping the 700k entries?

jliddil1 · May 8, 2019, 7:27pm

I thought the original poster referred to Seasonique.
From what I see, in Europe it is the same as in the US.

Regardless, this points to the complexity of mapping a brand name product that may be composed of more than one drug/chemical.

aostropolets · May 8, 2019, 8:19pm

Usagi uses NLP. MetaMap also uses NLP and sometimes performs better than Usagi.
I advocate for and use only one approach, which is a systematic one. If you don’t care about precision, map to ingredient (following the links from a Brand Name to an Ingredient or simply mapping to Ingredient when you have a generic drug in your source data). If you do - break them down into the attributes, map the attributes and then we will show you the tricks of the trade. Anyway, happy to help.
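
A minimal sketch of the ingredient-level route described above, assuming the vocabulary has been loaded into simple in-memory structures. The concept ids, names, and rows are made up for illustration, and the relationship name `Brand name of` is an assumption about how the Brand Name to Ingredient link is labelled in your vocabulary download; check it against your own CONCEPT_RELATIONSHIP data.

```python
# Toy stand-ins for the CONCEPT and CONCEPT_RELATIONSHIP tables.
# Every id, name, and row here is hypothetical.
CONCEPTS = {
    1: {"name": "examplebrand", "concept_class_id": "Brand Name"},
    2: {"name": "ingredient a", "concept_class_id": "Ingredient"},
    3: {"name": "ingredient b", "concept_class_id": "Ingredient"},
    4: {"name": "aspirin", "concept_class_id": "Ingredient"},
}

# (concept_id_1, relationship_id, concept_id_2); relationship name is assumed.
RELATIONSHIPS = [
    (1, "Brand name of", 2),
    (1, "Brand name of", 3),
]


def map_to_ingredients(source_name):
    """Return ingredient concept ids for a brand name or a generic name."""
    wanted = source_name.strip().lower()
    for concept_id, concept in CONCEPTS.items():
        if concept["name"] != wanted:
            continue
        # Generic drug in the source data: map straight to the Ingredient.
        if concept["concept_class_id"] == "Ingredient":
            return [concept_id]
        # Brand name: follow the links to its Ingredient concepts.
        if concept["concept_class_id"] == "Brand Name":
            return [
                target
                for source, relationship, target in RELATIONSHIPS
                if source == concept_id
                and relationship == "Brand name of"
                and CONCEPTS[target]["concept_class_id"] == "Ingredient"
            ]
    return []  # nothing matched; needs manual review
```

A combination brand comes back as several ingredients, which is exactly the precision you give up with this route: strength, form, and pack structure are lost.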

jposada · May 8, 2019, 11:07pm

What is your opinion regarding this service?

https://mor.nlm.nih.gov/RxMix/

And then using Athena to map to standard concepts

Mary_Regina_Boland · May 9, 2019, 1:33am

Thanks @jposada, I also believe RxMix is the way to go.
It appears scalable and not too slow. Also, RxNorm codes are a lot easier to map to the OHDSI CDM.

Regarding USAGI’s use of NLP, I can’t comment directly as I did not create the tool, but it appears to perform something closer to string-search matching (a ‘bag of words’ model): the hope is that a word such as ‘seasonalique’ is contained within the full name of some drug in a terminology that is part of the CDM, and it then matches that drug name to the term with some scoring criteria. Either way, I don’t have any direct complaints against USAGI except for the time it takes to match one drug name to the CDM; it’s just not scalable for a large corpus. It also appears to require a lot of a priori parsing of drugs from their dosages, etc.
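
To make the ‘bag of words’ idea concrete, here is a small sketch of token-overlap scoring. It is not Usagi’s actual implementation or scoring formula; it uses plain Jaccard overlap, and the function names and example terms are made up for illustration.

```python
import re


def tokens(text):
    """Lowercase a term and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(source, candidate):
    """Jaccard overlap between the word sets of two terms (0.0 to 1.0)."""
    s, c = tokens(source), tokens(candidate)
    if not s or not c:
        return 0.0
    return len(s & c) / len(s | c)


def best_matches(source, vocabulary, top_n=3):
    """Rank vocabulary terms by their overlap with the source term."""
    ranked = sorted(vocabulary, key=lambda name: score(source, name), reverse=True)
    return [(name, round(score(source, name), 2)) for name in ranked[:top_n]]
```

Word order is ignored, so a misspelled or abbreviated source term shares no token with the right target and scores zero. That is one reason source strings usually need cleaning before this kind of matching.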

Using something like RxMix to map drug names to RxNorm, and then using RxNorm to map to the OHDSI CDM, appears to be an efficient approach if one has a large data dictionary of medications, as we do. I had hoped to stay within the OHDSI framework for utilizing tools, but that won’t work for us given the time constraint on running so many medications through USAGI.

I’m curious about others’ approaches to this problem. Does everyone use RxMix? Does everyone already have RxNorm codes from the get-go?

If you have used USAGI, how large was the terminology that you used (5k, 7k, 100k)? I would love to hear others’ experiences with mapping local drug names to the OHDSI CDM using either OHDSI tools like USAGI or other tools.
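
The second half of that route (RxNorm code to a standard concept) can be sketched as a lookup on the CONCEPT table, where the RxNorm code is stored in `concept_code` under `vocabulary_id = 'RxNorm'`. The sketch below uses an in-memory SQLite table with only the columns it needs; the rows and ids are placeholders, not real vocabulary content.

```python
import sqlite3


def build_demo_vocab():
    """Create a tiny stand-in for the CONCEPT table (placeholder rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, "
        "vocabulary_id TEXT, concept_code TEXT, standard_concept TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
        [
            (9000001, "example ingredient", "RxNorm", "111", "S"),
            (9000002, "example retired drug", "RxNorm", "222", None),
        ],
    )
    return conn


def rxcui_to_standard_concept(conn, rxcui):
    """Return the standard concept_id for an RxNorm code, or None."""
    row = conn.execute(
        "SELECT concept_id FROM concept "
        "WHERE vocabulary_id = 'RxNorm' AND concept_code = ? "
        "AND standard_concept = 'S'",
        (str(rxcui),),
    ).fetchone()
    return row[0] if row else None
```

Codes that are present but no longer standard return `None` here; in a real vocabulary they would need an extra hop through the `Maps to` rows in CONCEPT_RELATIONSHIP, which this sketch leaves out.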

jliddil1 · May 9, 2019, 3:24am

Maybe a silly idea: can one run parallel Usagi jobs against RxNorm?

MaximMoinat · May 9, 2019, 8:56am

Hi Mary, we have done something similar for a corpus of (just) 4750 Danish drug codes to the OMOP vocabulary. Our methods and results have been published at the 2016 OHDSI Symposium. As mentioned before, this approach makes use of the fact that the drug codes had, besides the name, attributes like ‘ATC code’, ‘dose form’, ‘numeric strength’ and ‘unit’. This was all written in plain SQL. Usagi was used to map the Danish dose form and the units to standard concepts.

Hope this helps.

Christian_Reich · May 9, 2019, 10:50am

If by “drug name” you mean the terms in your list, you will get a horrid mix of brand names, ingredients, forms, strengths, and other jargon, all in abbreviations to save screen space. We know this. There are also active compounds and inactive compounds (excipients, which RxNorm ignores). This is compounded by the problem that many drugs (roughly 50%) are fixed combination drugs, and the combos are not fully specified. Finally, you get the packs, like in your example, which are combinations of drugs.

Correct, the EMA approved the same thing as the FDA. Debugging this, it looks like the “incomplete” version came from the German drug repository GGR: 36286712 “Seasonique filmomh. tabl. 1x91 (84x+7x), pack content #1”, where they had separate codes for the estrogen/progesterone combo (mpp3272200-1) and the single estrogen piece 36288444 “Seasonique filmomh. tabl. 1x91 (84x+7x), pack content #2” (mpp3272200-2). Both were updated last year, changing their Concept Class from “Drug Product” to “Med Product Pack”, meaning they no longer exist as individual components in the data. We still need to teach RxNorm Extension how to process packs in GGR better.

As @aostropolets says: cut them into pieces as best as you can and then have the drug matching process deal with the components. RxNav or RxMix can help you with that. Alternatively, give your list to the vocab team and say please please.
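
As a rough illustration of “cutting them into pieces”, the sketch below splits a source string into name, strength, unit, and dose form with a regular expression. The unit and form lists are deliberately tiny and hypothetical, and real source strings (abbreviations, combinations, packs) will need far more rules than this.

```python
import re

# Hypothetical, incomplete lists; extend them for your own source data.
STRENGTH = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mcg|mg|g|ml|iu|%)(?![a-z])"
)
FORMS = ("tablet", "capsule", "solution", "cream", "injection")


def split_drug_string(text):
    """Split a drug string into name, strength, unit, and dose form."""
    lowered = text.lower()
    match = STRENGTH.search(lowered)
    # First known dose form word found in the string, singular or plural.
    form = next(
        (f for f in FORMS if re.search(rf"\b{f}s?\b", lowered)), None
    )
    # Treat everything before the first strength as the drug name.
    name_part = lowered[: match.start()] if match else lowered
    return {
        "name": name_part.strip(" ,"),
        "strength": float(match.group("value")) if match else None,
        "unit": match.group("unit") if match else None,
        "form": form,
    }
```

Each piece can then be mapped on its own (name to Ingredient or Brand Name, unit and form to their standard concepts), leaving the strings that fail to parse for manual review.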

We realize this is not easy. We want to roll out a “Vocabulary Clinic” especially for these drug lists. It will include a fully self-service mode that guides you through a step-by-step process to get this done right.

jliddil1 · May 9, 2019, 4:42pm

As with the NGS moving target, new drugs are approved all the time, as well as new indications and new dosages for a given disease. Then we have the issue of drugs being given different brand names around the world. Then you also have drugs whose approval is withdrawn by a given regulatory authority. (https://www.pmlive.com/pharma_news/lilly_to_withdraw_lartruvo,_a_first_for_accelerated_approval_failures_1285829)

So perhaps we can map the majority of drugs/chemicals, and then there will be the regular maintenance task for new drug approvals, data entry errors, etc.

aostropolets · May 10, 2019, 12:53am

Around the world? Do you know that we have a vocabulary for such things?

jliddil1 · May 10, 2019, 3:20pm

You don’t say.

aostropolets · May 14, 2019, 6:52am

I just feel that I’ve been advertising RxNorm Extension for ages. Long story short: this is a standard drug vocabulary for international drugs that follows the structure and the logic of RxNorm. Christian provided the link to the documentation that exists right now, but it will definitely be improved.
The approach is pretty close to what @MaximMoinat presented at the Symposium, with one exception: it is highly standardized. Once you get your attributes ready, the script will deal with close dosages, different forms, ambiguous ingredients and so on and so forth.
Again, you decide what level of precision and accuracy you need.

MPhilofsky · May 15, 2019, 5:16pm

@Mary_Regina_Boland,

At Colorado we have used Usagi for larger volumes of string terms (>300). I prefer the Athena interface for <300 string terms (vaccines, provider specialty, vital signs, etc.). I also use RxNav to verify a drug’s validity period. We have mapped ~12,000 strings from all domains. The drug domain has the largest number of custom mappings. Mappings have been done domain by domain over the past couple of years.

dkendrick · February 24, 2020, 7:50pm

I am using Usagi to develop a code normalization process. The tool is fantastic and the matching scores are very helpful.
However, I have many thousands of items to code, and it would be very helpful to be able to multi-select the rows for mapping so that all similar rows get assigned the same selected code. Is this possible to add?

MaximMoinat · February 26, 2020, 3:20pm

Hi David. Thanks for your feedback. Usagi only supports approving multiple codes at once. I don’t think there ever was a use case to map multiple terms at once to the same target concept. In most cases the source terms are distinct enough to require one-by-one evaluation and assignment.

Please add a feature request to the Usagi GitHub issues: https://github.com/OHDSI/Usagi/issues. If possible, also include your particular use case or an example. We are currently the maintainers of the code base and are in the process of building a road map for Usagi.

zhaoyunfei · March 12, 2021, 6:47am

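Hello, Professor! I want to use USAGI to normalize the drug names in FAERS. When I opened USAGI to build the index, I could select only one csv file, but the Athena download I have contains 6 domains. I looked for information on the official website, but it all seems to miss my simple question. Thank you!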

MaximMoinat · March 16, 2021, 5:04pm

Hi @zhaoyunfei. With a simple Google translate, I think I understood your question about building the vocabulary index with Usagi.

The input Usagi needs is the folder with the six csv files (unzipped) that you get from Athena. So you select one path: the folder.