I loved the demo of Usagi at the NY F2F. Any update on a potential release date for the tool? My team is starting several ETL processes and I think this tool would be a valuable addition.
Thanks,
Bill
The Usagi release is currently waiting for two things:
Let me check with @Christian_Reich and his team on a release date for Vocab V5.
Can someone please briefly describe what Usagi is, and on which day (and in which session, so it can be found in the recording) it was demoed?
(for people who were not at the F2F meeting)
I am planning on finishing up the documentation next week and will post it to the Wiki when I do. In the meantime, I will paste the introduction here:
1. Introduction
Usagi is a software tool created by the Observational Health Data Sciences and Informatics (OHDSI) team and is used to help in the process of mapping codes from a source system into terminologies, preferably standard ones, stored in the Observational Medical Outcomes Partnership (OMOP) Vocabulary ([http://www.ohdsi.org/data-standardization/vocabulary-resources/]). The word Usagi is Japanese for rabbit, and the tool was named after the first mapping exercise it was used for: mapping source codes used in a Japanese dataset into OMOP Vocabulary concepts.
Mapping source codes into the OMOP Vocabulary is valuable for two main reasons:
- When converting a raw dataset into the OMOP Common Data Model (CDM) ([http://www.ohdsi.org/data-standardization/the-common-data-model/]), translating source-specific codes into standard concepts (e.g. RxNorm or SNOMED) translates the source data into a "common language" that other CDMs follow.
- Having source codes tied to OMOP Vocabulary concepts allows a researcher to find relevant source codes through the classification terminologies in the OMOP Vocabulary (e.g. find all antipsychotic medications or find all condition codes related to heart failure).
1.1. Scope and Purpose
A source code file from a dataset that needs to be mapped is loaded into Usagi (if the codes are not in English, additional translation rows are needed). An adapted version of Apache Lucene ([http://lucene.apache.org/]) is used to connect source codes to OMOP Vocabulary concepts. However, these code connections need to be manually reviewed, and Usagi provides an interface to facilitate that.
Usagi does not currently translate non-English codes to English. We suggest using Google Translate ([https://translate.google.com/]).
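To make the matching step described above more concrete, here is a minimal sketch of term-similarity ranking in plain Python. It is an illustration only, not Usagi's implementation: Usagi uses an adapted Apache Lucene index, whereas this sketch assumes a simple TF-IDF weighting with cosine similarity. The function names (`tokenize`, `rank_concepts`) and the sample concept names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_concepts(source_term, concept_names):
    """Rank candidate concept names by TF-IDF cosine similarity to source_term.

    Assumption: this stands in for a Lucene-style text match; the real
    scoring in Usagi may differ.
    """
    docs = [tokenize(name) for name in concept_names]
    n_docs = len(docs)
    # Document frequency: in how many concept names each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))

    def idf(tok):
        # Smoothed inverse document frequency, so unseen tokens still get a weight.
        return math.log((1 + n_docs) / (1 + df.get(tok, 0))) + 1.0

    def vectorize(tokens):
        counts = Counter(tokens)
        return {tok: count * idf(tok) for tok, count in counts.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(tokenize(source_term))
    scored = [(cosine(query, vectorize(doc)), name)
              for doc, name in zip(docs, concept_names)]
    # Highest score first; the top candidates would then go to manual review.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = ["Inspiratory time", "Expiratory time", "Heart failure"]
    for score, name in rank_concepts("INSPIRATORY TIME", candidates):
        print(f"{score:.2f}  {name}")
```

As in Usagi, a score like this only suggests a candidate; the mapping still needs to be reviewed by a person.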
My team is moving forward on CDM v5 implementations at 2-3 locations. I happily volunteer to beta test the documentation against some real-world ETL if you're interested.
Bill
@wstephens Sounds like a plan! I should have the documentation done next week some time.
But I do think @schuemie needs to get it updated with the latest Vocab5 from Christian / Lee. There is a version of Vocab5 released but maybe Martijn is waiting for the newest update?
Bill, I just created a [new release of Usagi](https://github.com/OHDSI/Usagi/releases/tag/v0.2.0).
And Erica has posted the manual in our Wiki.
I would have made this version 1.0.0, but I only want to do that when the Vocab V5 is officially released. You'll need the Vocab V5 CSV files to start using Usagi. Ping @Christian_Reich if you don't already have them.
Let me know if you want to give it a try, and if you have any issues.
Cheers,
Martijn
Excellent! Pulling it now.
OK, I'm running through a mapping exercise using Usagi. Some initial thoughts...
Convenience:
It would be great to be able to select a group of unapproved matches in the Overview Table and approve them all with a single "approve all" click. I had a bunch with Match Score = 1.0, but had to iterate through each one.
Issue:
When attempting to conceptually map "INSPIRATORY TIME" from Cerner to CDM v5 using SNOMED, I expected to find SNOMED code 250819002 as a mappable option. This entry is in the concept.csv file that I loaded into the Lucene index (4353947,Inspiratory time,Observation,SNOMED,Observable Entity, S, 250819002,19700101,20991231,). However, I cannot seem to find a way to get this value as a mappable option through any combination of Search or Filters.
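One way to confirm that a concept really is present in the vocabulary file before suspecting the index is to look it up by code directly. The sketch below is an assumption-laden illustration: it assumes a comma-delimited file without a header row, with fields in the same order as the row quoted above, and the column names are my labels for those positions. `find_by_code` is a hypothetical helper, not part of Usagi.

```python
import csv
import io

# Assumed column order, inferred from the row quoted in the post above.
COLUMNS = [
    "concept_id", "concept_name", "domain_id", "vocabulary_id",
    "concept_class_id", "standard_concept", "concept_code",
    "valid_start_date", "valid_end_date", "invalid_reason",
]


def find_by_code(csv_text, concept_code):
    """Return all rows whose concept_code field equals concept_code."""
    matches = []
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        # Strip stray whitespace around fields, as seen in the quoted row.
        fields = [field.strip() for field in row]
        if len(fields) < len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, fields))
        if record["concept_code"] == concept_code:
            matches.append(record)
    return matches


if __name__ == "__main__":
    sample = ("4353947,Inspiratory time,Observation,SNOMED,"
              "Observable Entity, S, 250819002,19700101,20991231,\n")
    for record in find_by_code(sample, "250819002"):
        print(record["concept_id"], record["concept_name"],
              record["standard_concept"])
```

For a real file, the text would be read from disk instead of the inline sample; the delimiter and encoding of the actual vocabulary download should be checked first.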
The first is easy: once you've selected all the matches you want to approve, you can go to Edit --> Approve selected and all selected items will be approved. I guess we need to add that to the Wiki.
The issue is harder: I'm unable to reproduce this. If I type in the manual query "inspiratory time", that SNOMED concept is the first that pops up. Have you unchecked all filters? Can you tell me which version of the CSV files you used? I'm on Vocabulary5.0-20141013.
I'm using the same Vocab version. I'm going to try to reload the index.
Bill
I agree keeping the text from the previous search is probably not what you want, but I'm not sure if we should start with the source term. Let me think about it.
I recommend you use the "Replace concept" button instead, which does exactly what you want.
We did this deliberately, since sometimes there's no avoiding mapping to multiple concepts. We see for example source codes like "Disease A and B", whereas the target vocabulary only has concepts for A and B separately. That being said, this feature should be used only as an absolute last resort. I think I will add a warning popup if somebody approves a code that maps to more than one concept, and will make sure to mention this in the Wiki as well.
Another issue:
It appears that there may be a character set issue. The Results pane is having issues displaying some characters, but only in the "Concept name" column:
Bill
The Vocab files are using ISO-8859-1, but I was expecting UTF-8. In the next release of Usagi I'll use ISO-8859-1.
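The mismatch described above is easy to reproduce. The following sketch (an illustration, not Usagi code) shows what happens when ISO-8859-1 bytes are read as UTF-8, and one hedged way to cope: try UTF-8 first and fall back to ISO-8859-1. The helper name `decode_vocab_bytes` and the sample name are hypothetical.

```python
def decode_vocab_bytes(raw):
    """Decode bytes as UTF-8 if valid, otherwise fall back to ISO-8859-1.

    Returns a (text, encoding_used) tuple. ISO-8859-1 maps every byte to a
    character, so the fallback never raises, but it can silently produce
    wrong characters if the data is in some third encoding.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    name = "Ménière"  # hypothetical concept name containing non-ASCII letters
    latin1_bytes = name.encode("iso-8859-1")

    # Reading Latin-1 bytes as UTF-8 with replacement yields broken characters,
    # which is the kind of garbling a results table would then display.
    print(latin1_bytes.decode("utf-8", errors="replace"))

    # The fallback recovers the intended text.
    print(decode_vocab_bytes(latin1_bytes))
```

Guessing the encoding this way is a workaround; stating the encoding of the vocabulary files explicitly is the more robust fix.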
Thanks!
Does OHDSI have a character set standard? I also assumed UTF-8 as the default because we always use that.
Bill
Martijn,
I'm working through a mapping from ICD9 to SNOMED conditions. I'm seeing quite a few incorrect matches between terms that differ only in "with" versus "without". I've hit this about 50 times on this run through 1600 ICD9 terms.
input: "Open wound of face, unspecified site, without mention of complication"
match (0.82 score): "Open wound of face with complication"
should be (0.75 score): "Open wound of face without complication"
thoughts?
Bill
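Until the scoring handles this, one possible stopgap during review is to flag suggested matches where the source term says "without" and the target says "with", or the reverse. The sketch below is a hypothetical helper, not a Usagi feature; it only looks at those two words, and a term containing both is treated as "without".

```python
import re


def qualifier(text):
    """Return 'without', 'with', or None depending on which word appears."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # Check 'without' first: it is the more specific qualifier.
    if "without" in tokens:
        return "without"
    if "with" in tokens:
        return "with"
    return None


def qualifier_conflict(source_term, target_term):
    """True when both terms carry a qualifier and the qualifiers disagree."""
    source_q = qualifier(source_term)
    target_q = qualifier(target_term)
    return source_q is not None and target_q is not None and source_q != target_q


if __name__ == "__main__":
    source = "Open wound of face, unspecified site, without mention of complication"
    for target in ("Open wound of face with complication",
                   "Open wound of face without complication"):
        flag = "REVIEW" if qualifier_conflict(source, target) else "ok"
        print(f"{flag:6}  {target}")
```

A flag like this does not pick the right concept; it only marks pairs that deserve a second look before approval.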
Hi Bill,
I suspect you are already aware of this, but in case not: You can get complete ICD-9 to SNOMED CT mappings from:
Brandon
Brandon,
I have the NLM mapping pulled up while running through this process with Usagi.
I'm not familiar with the IHTSDO mapping and haven't found it yet (that site loves their PDFs...)
Bill