
USAGI experience

(Vojtech Huser) #1

The Banda paper on FAERS used USAGI for mapping to concepts and I was inspired to try it on a different dataset.

1. My question is: how long does it typically take Usagi to produce mappings for 10k rows of data? (My code file had 157k entries, and I had to kill the tool after a few hours of waiting.)

2. Does it make a difference if Usagi is given 1 GB of memory vs. 10 GB?

Here is what I did (on Windows):

  1. download usagi.jar
  2. get vocabulary files (if you don’t have them) (Athena plain download is fine)
  3. make a folder for Usagi (it will generate index subfolders there)
  4. only now run usagi for the first time
  5. give it more memory via a .bat file, e.g.: java -Xmx5000m -jar Usagi_v0.3.3.jar

(Chris Knoll) #2

I’m doing a Usagi batch now and it takes about 10 minutes, but that’s for only 1k rows of data. I selected the top 1k source codes that appear most frequently. 157k rows? That’s going to take a long time.

Do you realize that after Usagi attempts to map those 157k rows, you’ll need to manually review all 157k rows to ‘agree’ with the mappings that Usagi came up with? So I’d start with the top 1k codes, work through that list, and then move on to the next batch until you either run out of money or account for the vast majority of codes you find in your data source.
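A minimal sketch of that frequency-based selection: count how often each raw term appears, then keep only the most common ones as a Usagi input. The sample terms, the generated `SRC…` codes, and the column order are all illustrative; in practice you would read from your own file and match the columns to your Usagi import mapping.

```python
from collections import Counter

# Hypothetical sample of raw source terms as they appear row-by-row in
# the dataset; in practice you would read these from your source file.
raw_terms = [
    "headache", "nausea", "headache", "dizziness",
    "headache", "nausea", "rash",
]

TOP_N = 3  # Chris suggests starting with the top 1k codes

# Count term frequencies and keep only the most common ones.
counts = Counter(raw_terms)
top_terms = counts.most_common(TOP_N)

# Rows for a Usagi input file: a generated source code, the term, and
# its frequency (the column names are up to you at import time).
usagi_rows = [
    (f"SRC{i:04d}", term, n)
    for i, (term, n) in enumerate(top_terms, start=1)
]
print(usagi_rows)
```

Once the first batch is reviewed, the next `most_common` slice becomes the next batch.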

(Martijn Schuemie) #3

157k rows is quite a lot, but eventually it will finish (you can let your computer run overnight?). Usagi is performing quite a feat, doing fuzzy string matching between your entry terms and millions of terms in the vocab. Giving it more memory will help a bit, but I’m not expecting much improvement above 5GB.

It helps a lot if you can narrow it down for Usagi. Make sure you filter by domain if possible and perhaps by concept_class_id.

(Vojtech Huser) #4

Thank you for the reply and the hints.
I reduced the input to 33k rows in my next try and left the computer running overnight. It did finish.

For the sake of others: batch-approving mappings with score = 1.0 is what I hope to use in phase 1 of my mapping.
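That phase-1 split can be sketched as a simple filter over the Usagi results. The rows below mimic an export; the field names (`matchScore`, `sourceName`, `targetConceptId`) and the concept IDs are placeholders and should be checked against the actual export of your Usagi version.

```python
# Hypothetical rows mimicking a Usagi export; the real export has more
# columns, and the exact header names should be verified.
rows = [
    {"sourceName": "headache",  "matchScore": 1.0,  "targetConceptId": 1001},
    {"sourceName": "nausea",    "matchScore": 0.82, "targetConceptId": 1002},
    {"sourceName": "dizzyness", "matchScore": 0.61, "targetConceptId": 1003},
]

# Phase 1: accept only exact matches (score == 1.0); everything else
# goes to manual review.
auto_approved = [r for r in rows if r["matchScore"] == 1.0]
needs_review  = [r for r in rows if r["matchScore"] < 1.0]
```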

(Ignore the duplicates in the screenshot; that is fixed now.)
Interestingly, my input data does not have source codes; I had to generate them. I am not sure whether Usagi requires them.

(Hajarhomayouni) #5

Hi all,

I have a question about USAGI: has it been integrated into Atlas, or is there any plan for such an integration?

Thank you.

(Martijn Schuemie) #6

Hi @hajarhomayouni,

No, Usagi hasn’t been integrated into ATLAS.

No, there are no plans to do so.


(Vojtech Huser) #7

It is great to see a new feature in USAGI. Thank you for this addition, @schuemie. We are using it on a large set of free-text adverse event terms.

(Mary Regina Boland) #8

@schuemie: Is there a backend script we can run to map free-text medication names using USAGI? We have close to a million medication names that we would like to map (we will perform a manual review of a subset of the file afterwards). Is it possible to run USAGI without the interface? The UI is currently too slow and only handles about 10k at a time. Any help would be awesome!

(Anna Ostropolets) #9

Do those medications have separate columns for ingredients/forms/dosages, or any kind of standardized name? If yes, I’d suggest using those attributes and running the scripts that are used for the OHDSI drug vocabs. That would be more reliable for dosages and for ingredients with similar names, which required a lot of manual review when we tested USAGI on drugs.

(Mary Regina Boland) #10

The main issue is running a large set of medications at once: the Java program only appears to handle about 10k concepts at a time, and each run takes about 2 hours. If there were a script that could be run in, say, R, we might be able to run it on a large server. But the USAGI UI doesn’t appear to be amenable to large-scale concept mapping.
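One workaround within the 10k-at-a-time limit is to pre-split the input into fixed-size batches and run each through Usagi separately. A minimal chunking sketch, with placeholder names standing in for the real medication list:

```python
# Split a long list of medication names into batches of at most
# CHUNK_SIZE, one batch per Usagi run. The names here are placeholders;
# in practice you would read them from your source file and write each
# chunk out as its own Usagi input CSV.
CHUNK_SIZE = 10_000

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

med_names = [f"drug {i}" for i in range(25_000)]  # stand-in for ~1M names
batches = list(chunks(med_names, CHUNK_SIZE))
```

For ~1M names that still means about 100 runs, which is why the component approach below scales better.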

(Martijn Schuemie) #11

Yes, Usagi certainly has its limits, and I can’t think of an easy way to scale it. It does use multi-threading, so having more cores available should speed things up.

To Anna’s point: when mapping drugs, the preferred approach is to not import all individual drug codes (e.g. ‘acetaminophen 100mg oral tablet (Aleve)’). Instead, break each one up into its separate components. For example, create a list of unique generic names (e.g. ‘acetaminophen’) and map those. I would be surprised if you have > 5,000 of those in your data. Once you’ve mapped all components separately, you can use some logic to put them together to create a much higher quality mapping than what could be achieved by mapping each drug code individually.
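A rough sketch of that component split: pull ingredient, strength, and form out of each free-text drug string, then map only the (much smaller) set of unique ingredients with Usagi. The regex and example strings are illustrative, not a robust drug-name parser; real data will need more patterns and a manual-review bucket for whatever fails to parse.

```python
import re

# Illustrative pattern: "<ingredient> <strength><unit> <form>".
PATTERN = re.compile(
    r"^(?P<ingredient>[a-z ]+?)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml))\s+"
    r"(?P<form>.+)$",
    re.IGNORECASE,
)

def split_components(name):
    m = PATTERN.match(name.strip())
    if not m:
        return None  # send unparsable names to manual review
    return (m.group("ingredient").strip().lower(),
            m.group("strength"), m.group("form"))

drugs = [
    "Acetaminophen 100mg oral tablet",
    "acetaminophen 500 mg oral tablet",
    "Ibuprofen 200mg capsule",
]
parsed = [split_components(d) for d in drugs]

# Three drug strings collapse to two ingredients to map in Usagi.
unique_ingredients = sorted({p[0] for p in parsed if p})
```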

(Mary Regina Boland) #12

So basically what you are saying is that before one uses USAGI, the following steps are required a priori:

  1. parsing of drug dosage levels and removal of root drug stem name
  2. mapping of drug stem names to their corresponding generic drug names for all possible ingredients in the drug
  3. use the generic drug names for input into USAGI

Therefore, USAGI doesn’t map raw drug names to the OHDSI terminologies - other systems such as UMLS, etc. would be required first. Is that a correct assessment?

(Martijn Schuemie) #13

I’m not sure what you mean by ‘drug stem name’, why you would need to do step 2, and where UMLS would come into play, so I’m not sure it is a correct assessment :wink:

Do you maybe have 1 or 2 example entries that you’re trying to map? That might make it easier to explain.