
Usagi

@schuemie

I loved the demo of Usagi at the NY F2F. Any update on a potential release date for the tool? My team is starting several ETL processes and I think this tool would be a valuable addition.

Thanks,
Bill

I believe @ericaVoss would also be able to point you to all things Usagi.

The Usagi release is currently waiting for two things:

  1. Documentation, which @ericaVoss has graciously offered to help write
  2. The release of the Vocab V5.

Let me check with @Christian_Reich and his team on a release date for Vocab V5.

Can someone please briefly describe what Usagi is, and on which day (and in which session, so it can be found in the recording) it was demoed?

(for people who were not at the F2F meeting)

I am planning on finishing up the documentation next week and will post it to the Wiki when I do. But I will paste the introduction here:

1. Introduction
Usagi is a software tool created by the Observational Health Data Sciences and Informatics (OHDSI) team and is used to help in the process of mapping codes from a source system into terminologies, preferably standard ones, stored in the Observational Medical Outcomes Partnership (OMOP) Vocabulary ([http://www.ohdsi.org/data-standardization/vocabulary-resources/]). The word Usagi is Japanese for rabbit, and the tool was named after the first mapping exercise it was used for: mapping source codes used in a Japanese dataset into OMOP Vocabulary concepts.

Mapping source codes into the OMOP Vocabulary is valuable for two main reasons:

  1. When converting a raw dataset into the OMOP Common Data Model (CDM) ([http://www.ohdsi.org/data-standardization/the-common-data-model/]), translating source-specific codes into standard concepts (e.g. RxNorm or SNOMED) translates the source data into a "common language" that other CDMs follow.
  2. Having source codes tied to OMOP Vocabulary concepts allows a researcher to find relevant source codes by leveraging the classification terminologies in the OMOP Vocabulary (e.g. find all antipsychotic medications or find all condition codes related to heart failure).

1.1. Scope and Purpose

A file of source codes from a dataset that need to be mapped is loaded into Usagi (if the codes are not in English, additional translation rows are needed). An adapted version of Apache Lucene ([http://lucene.apache.org/]) is used to connect source codes to OMOP Vocabulary concepts. However, these code connections need to be manually reviewed, and Usagi provides an interface to facilitate that.

Usagi does not currently translate non-English codes to English. We suggest using Google Translate ([https://translate.google.com/]).
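
To give a feel for what term-based matching of source codes to concept names involves, here is a minimal sketch. It is not Usagi's algorithm: Usagi relies on an adapted Apache Lucene, while this sketch swaps in a plain Jaccard token overlap. The function names and the candidate list are hypothetical and only serve the illustration.

```python
import re

def tokenize(text):
    """Lower-case a term and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_candidates(source_term, concept_names):
    """Return (score, concept_name) pairs, best match first."""
    source_tokens = tokenize(source_term)
    scored = [(jaccard(source_tokens, tokenize(name)), name) for name in concept_names]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical candidate list; a real run would draw names from the vocabulary files.
candidates = ["Inspiratory time", "Expiratory time", "Inspiratory capacity"]
ranking = rank_candidates("INSPIRATORY TIME", candidates)
for score, name in ranking:
    print(f"{score:.2f}  {name}")
```

As in Usagi, a ranking like this only proposes candidates; each proposed mapping still needs manual review.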

@ericaVoss

My team is moving forward on CDM v5 implementations at 2-3 locations. I happily volunteer to beta test the documentation against some real-world ETL if you're interested.

Bill

@wstephens Sounds like a plan! I should have the documentation done next week some time.

But I do think @schuemie needs to get it updated with the latest Vocab5 from Christian / Lee. There is a version of Vocab5 released but maybe Martijn is waiting for the newest update?

Bill, I just created a [new release of Usagi](https://github.com/OHDSI/Usagi/releases/tag/v0.2.0).

And Erica has posted the manual in our Wiki.

I would have made this version 1.0.0, but I only want to do that when the Vocab V5 is officially released. You'll need the Vocab V5 CSV files to start using Usagi. Ping @Christian_Reich if you don't already have them.

Let me know if you want to give it a try, and if you have any issues.

Cheers,
Martijn

Excellent! Pulling it now.


OK, I'm running through a mapping exercise using Usagi. Some initial thoughts...

Convenience:
It would be great to be able to select a group of unapproved matches in the Overview Table and approve all with a single "approve all" click. I had a bunch of Match Score = 1.0, but had to iterate through all.

Issue:
When attempting to conceptually map "INSPIRATORY TIME" from Cerner to CDM v5 using SNOMED, I expected to find SNOMED code 250819002 as a mappable option. This entry is in the concept.csv file that I loaded into the Lucene index (4353947,Inspiratory time,Observation,SNOMED,Observable Entity, S, 250819002,19700101,20991231,). However, I cannot seem to find a way to get this value as a mappable option through any combination of Search or Filters.
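
A quick way to confirm that a concept really is in the loaded file, independent of the Lucene index, is to scan the CSV directly. The sketch below is only an illustration: the function name is hypothetical, and the column order is an assumption based on the OMOP CDM v5 CONCEPT table, so check it against the header of your own concept.csv.

```python
import csv
import io

# Column order assumed from the OMOP CDM v5 CONCEPT table; verify against your file.
COLUMNS = ["concept_id", "concept_name", "domain_id", "vocabulary_id",
           "concept_class_id", "standard_concept", "concept_code",
           "valid_start_date", "valid_end_date", "invalid_reason"]

def find_concepts(lines, vocabulary_id, concept_code):
    """Return all rows whose vocabulary and source concept code match."""
    # skipinitialspace drops the blank that follows some commas in the sample row.
    reader = csv.DictReader(lines, fieldnames=COLUMNS, skipinitialspace=True)
    return [row for row in reader
            if row["vocabulary_id"] == vocabulary_id
            and row["concept_code"] == concept_code]

# The row quoted in the post above, used here instead of the real file.
sample = io.StringIO(
    "4353947,Inspiratory time,Observation,SNOMED,Observable Entity, S, "
    "250819002,19700101,20991231,\n"
)
matches = find_concepts(sample, "SNOMED", "250819002")
for row in matches:
    print(row["concept_id"], row["concept_name"], row["standard_concept"])
```

For the real file, pass an open file handle instead of the in-memory sample, for example `open("concept.csv", newline="", encoding="iso-8859-1")`; both the path and the encoding are assumptions to adapt to your copy of the vocabulary.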

The first is easy: once you've selected all the matches you want to approve, you can go to Edit --> Approve selected and all selected items will be approved. I guess we need to add that to the Wiki.

The issue is harder: I'm unable to reproduce this. If I type in the manual query 'inspiratory time', that SNOMED concept is the first that pops up. Have you unchecked all filters? Can you tell me which version of the CSV files you used? I'm on Vocabulary5.0-20141013.

I'm using the same Vocab version. I'm going to try to reload the index.

  • Query: when selecting the "Query" radio button / text box it would be nice for the selected source term to auto populate in the query text box rather than the previous query text remaining.
  • When selecting a match from the results pane and clicking the "Add Concept" button for an entry that mapped to concept "0", it would be great for the "0" concept to be automatically removed from the Target Concepts list.
  • Multiple mapped concepts: it's possible to map multiple concepts to a single source term in the Target Concepts window. Shouldn't this be limited to a single concept mapping?

Bill

  • I agree keeping the text from the previous search is probably not what you want, but I'm not sure if we should start with the source term. Let me think about it.

  • I recommend you use the 'Replace concept' button instead, which does exactly what you want.

  • We did this deliberately, since sometimes there's no avoiding mapping to multiple concepts. We see for example source codes like 'Disease A and B', whereas the target vocabulary only has concepts for A and B separately. That being said, this feature should be used only as an absolute last resort. I think I will add a warning popup if somebody approves a code that maps to more than one concept, and will make sure to mention this in the Wiki as well.

Another issue:

it appears that there may be a character set issue. The Results pane is having issues displaying some characters, but only in the ā€œConcept nameā€ column:

  • meniereā€™s disease: Vestibular active Mļæ½niļæ½reā€™s disease
  • SJOGRENS SYNDROME: Sjļæ½grenā€™s syndrome

Bill

The Vocab files are using ISO-8859-1, but I was expecting UTF-8. In the next release of Usagi Iā€™ll use ISO-8859-1.
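
The mismatch can be reproduced in a few lines. This is a minimal sketch of the general effect, not Usagi's own file-reading code: text stored as ISO-8859-1 and decoded as UTF-8 turns each accented character into the replacement character U+FFFD, which is the symptom reported above.

```python
# "Ménière's disease", written with escapes so the source stays pure ASCII.
name = "M\u00e9ni\u00e8re's disease"

# Bytes as they would sit in a file saved as ISO-8859-1.
raw = name.encode("iso-8859-1")

# Reading those bytes as UTF-8: each accented byte is invalid and gets replaced.
garbled = raw.decode("utf-8", errors="replace")

# Reading them with the encoding the file actually uses restores the name.
correct = raw.decode("iso-8859-1")

print(garbled)
print(correct)
```

When reading a file rather than raw bytes, the same choice is made through the `encoding` argument of `open`.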

Thanks!

Does OHDSI have a character set standard? I also assumed UTF-8 as the default because we always use that.

Bill

Opened a discussion on that topic here.

Martijn,

I'm working through a mapping from ICD9 to SNOMED conditions. I'm seeing quite a few incorrect matches between terms that contain "with" and terms that contain "without". I've hit this about 50 times on this run through 1600 ICD9 terms.

input: "Open wound of face, unspecified site, without mention of complication"
match (0.82 score): "Open wound of face with complication"
should be (0.75 score): "Open wound of face without complication"

thoughts?

Bill
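
One possible aid during review, sketched here as a separate post-processing step and not as a Usagi feature, is to flag suggested matches where the source term says "without" and the target says "with", or the reverse. The function names are hypothetical.

```python
import re

def qualifier(term):
    """Return 'without', 'with', or None, based on whole words in the term.

    If both words occur, 'without' takes precedence.
    """
    words = re.findall(r"[a-z]+", term.lower())
    if "without" in words:
        return "without"
    if "with" in words:
        return "with"
    return None

def qualifiers_conflict(source_term, target_term):
    """True when one term says 'with' and the other says 'without'."""
    source_q, target_q = qualifier(source_term), qualifier(target_term)
    return source_q is not None and target_q is not None and source_q != target_q

source = "Open wound of face, unspecified site, without mention of complication"
suggested = "Open wound of face with complication"
expected = "Open wound of face without complication"

print(qualifiers_conflict(source, suggested))  # flagged for manual review
print(qualifiers_conflict(source, expected))   # not flagged
```

A check like this only highlights candidates for a closer look; it does not change the match scores themselves.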

Hi Bill,

I suspect you are already aware of this, but in case not: You can get complete ICD-9 to SNOMED CT mappings from:

Brandon

Brandon,

I have the NLM mapping pulled up while running through this process with Usagi.

I'm not familiar with the IHTSDO mapping and haven't found it yet (that site loves their PDFs...)

Bill
