
ETL from Unmapped Sources

From a practical point of view - financially and opportunistically - I have also been taking @mgkahn’s approach. Typically there are a number of locally funded projects that can be served effectively by a _source_concept_id first, _concept_id mapping later strategy. I think moving everyone’s thinking toward the OMOP model to support and enable multi-site studies is a long-term objective, but until there is significant financial pull, I for one will be driven primarily by local needs. Another pragmatic issue for us is that we want to be able to conduct cohort discovery using detailed medical device observations and measurements, which for some studies is much more accurate than using the summarized (data-reduced) OMOP standard vocabularies, particularly when existing predictive analytics are involved. This means creating local medical device vocabularies first (populating measurement_source_concept_id), getting the data into the OMOP database, and using it. Then we can work on proposing a new vocabulary to the OHDSI community - which I still need to find the time to do. If I had the financial backing and time, I’d go with Christian’s model.
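For illustration only, here is a minimal sketch of that “source concept first, standard concept later” pattern for a device measurement. The column names come from the OMOP CDM MEASUREMENT table and the 2-billion offset is the conventional range for local concept_ids; the signal code, values, and helper function are made up.

```python
# Illustrative sketch: load a MEASUREMENT row against a local device concept first;
# the standard measurement_concept_id stays 0 until a mapping has been curated.
LOCAL_ID_OFFSET = 2_000_000_000  # concept_ids >= 2 billion are reserved for local concepts

measurement_row = {
    "measurement_id": 1,
    "person_id": 12345,
    "measurement_concept_id": 0,                            # no standard mapping yet
    "measurement_source_value": "VENT_PEEP_CMH2O",          # hypothetical device signal code
    "measurement_source_concept_id": LOCAL_ID_OFFSET + 17,  # concept in the local device vocabulary
    "value_as_number": 8.0,
    "unit_source_value": "cmH2O",
}

def apply_standard_mapping(row, source_to_standard):
    """Fill in measurement_concept_id once a local-to-standard mapping exists."""
    row["measurement_concept_id"] = source_to_standard.get(
        row["measurement_source_concept_id"], 0
    )
    return row
```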

Friends:

This issue is now boiling over. Folks have trouble with their source data: there is no way to organize non-standard vocabularies, incorporate them as concepts, and map them to standard ones.

We said we’d come up with a proposal to fix this problem. Here is one:

We want to build a website for anybody to manage their data, and essentially do what the vocabulary team does on a large scale for the Standardized Vocabularies. Something like:

  •   Upload of spreadsheets, with or without concept_codes, including deprecations
    
  •   Manage the vocabulary_id
    
  •   If necessary, auto-create the concept_code or recognize you already have one
    
  •   Auto-create the concept_id or recognize you already have one
    
  •   Upload mappings (or later create mappings through an integrated USAGI) to standard concepts
    
  •   Maintain rules (can’t map to deprecated code, etc.) 
    

Once you have all this, you export your new concepts together with the Standardized Vocabularies from ATHENA.
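To make the feature list a bit more tangible, here is a rough sketch of two of those pieces - auto-creating concept_ids in the local range and enforcing the “no mapping to deprecated concepts” rule. The 2-billion offset follows the OMOP convention for local concept_ids; the function names and in-memory structures are just assumptions for illustration.

```python
# Sketch of two of the proposed features: auto-creating concept_ids in the
# local range and refusing mappings that point at deprecated target concepts.
import itertools

LOCAL_ID_START = 2_000_000_000
_next_local_id = itertools.count(LOCAL_ID_START)  # a real tool would persist this counter

def get_or_create_concept_id(concept_code, existing):
    """Reuse an already-assigned concept_id for this code, or mint a new local one."""
    if concept_code in existing:
        return existing[concept_code]
    new_id = next(_next_local_id)
    existing[concept_code] = new_id
    return new_id

def validate_mapping(source_code, target_concept):
    """Rule example: you can't map to a deprecated or non-standard target concept."""
    if target_concept.get("invalid_reason") is not None:
        raise ValueError(f"{source_code}: target {target_concept['concept_id']} is deprecated")
    if target_concept.get("standard_concept") != "S":
        raise ValueError(f"{source_code}: target {target_concept['concept_id']} is not standard")
    return True
```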

Thoughts? @mgkahn? @MPhilofsky? @mkwong? @esholle? @daniellameeker? @Mark_Danese?

C


And we need a Greek name for that thing. :smile:

Hi, I think this would be a good place to post and share work in progress, and it would give the vocab team easy access to that work for adoption too. Thanks

Anything that makes mapping easier has to be a good idea. Ptolemy is an obvious choice, but any Greek cartographer would do.


@Christian_Reich,

Awesome! @mgkahn and I love the idea! We have a couple of questions:

  1. Will the tool support the creation of local “standard” concept ids > 2000000000?
  2. Will the tool support the creation of local “classification” concept ids > 2000000000?
  3. Will the tool provide an export file suitable for upload to all the Standardized Vocabulary tables?

@MPhilofsky:

  1. That’s the idea. But the tool will probably put pressure on you to make them non-standard, and map them, using a built-in USAGI.
  2. Same.
  3. Yes. The idea is that when you download from Athena, your own “dogfood” will be included in the zip file (see the sketch below). But nobody else will get it.
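A toy sketch of that export step, assuming the tab-delimited CONCEPT.csv that comes in the Athena download; the column list matches the CONCEPT table, while the function and merge logic are purely illustrative:

```python
# Toy sketch: append locally managed concepts to the CONCEPT.csv from the Athena
# download, so the combined file can be loaded into the CDM vocabulary tables.
import csv

CONCEPT_COLUMNS = [
    "concept_id", "concept_name", "domain_id", "vocabulary_id",
    "concept_class_id", "standard_concept", "concept_code",
    "valid_start_date", "valid_end_date", "invalid_reason",
]

def append_local_concepts(athena_concept_csv, local_concepts):
    """local_concepts: iterable of dicts keyed by CONCEPT_COLUMNS."""
    with open(athena_concept_csv, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CONCEPT_COLUMNS, delimiter="\t")
        for row in local_concepts:
            assert int(row["concept_id"]) >= 2_000_000_000, "local IDs must stay in the local range"
            writer.writerow(row)
```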

And then we need some way to share them, because all these silly lab test names are probably repeated across institutions.

Hi, I want to learn something about the Drug domain. I always get unpredictable results when I map my concepts. Is there a way to have Usagi match the source code string to the concept code string, instead of semantically similar terms? My procedure source terms are producing high match scores to similar procedures, but not the exact CPT source procedure. Going line by line to pick out the exact match would take forever.

Hi,

In this case you don’t need Usagi; you could just find the concept with the minimal Levenshtein distance.
But even that will not work, because, let’s say, “Aspirin 50 MG Oral Tablet” and “Aspirin 10 MG Oral Tablet” are very close but different concepts, while
“Aspirin 50 MG Oral Tablet” and “Aspirin 50 MG Oral coated tablet” represent the same drug but have a bigger distance than the pair above.
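To make that concrete, here is a minimal edit-distance check (plain Python, no external libraries) showing that the wrong-strength product actually comes out as the closer string:

```python
# Minimal Levenshtein distance, showing why raw string distance misleads for drugs.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

source = "Aspirin 50 MG Oral Tablet"
print(levenshtein(source, "Aspirin 10 MG Oral Tablet"))         # 1 -- different drug, tiny distance
print(levenshtein(source, "Aspirin 50 MG Oral coated tablet"))  # 8 -- same drug, larger distance
```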

We built an algorithm that extracts the logical attributes that make up a Drug concept:
Ingredient, Dose Form, Dosage, Brand Name, Quantity, Supplier. Then we map each attribute separately to the standard attributes.
This way we get an accurate drug mapping.
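A sketch of that attribute-based idea; the regular expression and attribute names below are only illustrative - real source data needs much more robust parsing than this:

```python
# Illustrative attribute extraction for a drug name like
# "Aspirin 50 MG Oral Tablet [Halfprin]".
import re

DRUG_PATTERN = re.compile(
    r"^(?P<ingredient>.+?)\s+(?P<dosage>\d+(\.\d+)?\s*MG)\s+(?P<dose_form>.+?)"
    r"(\s+\[(?P<brand_name>.+?)\])?$"
)

def extract_attributes(drug_name):
    m = DRUG_PATTERN.match(drug_name)
    return m.groupdict() if m else {}

print(extract_attributes("Aspirin 50 MG Oral Tablet [Halfprin]"))
# {'ingredient': 'Aspirin', 'dosage': '50 MG', 'dose_form': 'Oral Tablet', 'brand_name': 'Halfprin'}
```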

Thank you very much! I am so excited about your reply! But I am sorry, I still have a problem to solve. If I map them separately to the standard attributes, do I only need one of them to be the source term each time, and do I still need to use Usagi to do it or not?

Well, the main point of the mapping process is to define those attributes,
and then you just use simple name matching against the concept_synonym names.
Sometimes, yes, you really need a fuzzy matching algorithm (e.g. Usagi) or semantic analysis, but most cases will match by name equality.
For example:
Aspirin 50 MG Oral Tablet [Halfprin]
Aspirin -> 1112807 Aspirin
Oral Tablet -> 19082573 Oral Tablet
Halfprin -> 19068001 Halfprin
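The lookup itself can then be plain equality matching against concept and synonym names. The concept_ids below are the ones quoted above; the in-memory dictionary is just a stand-in for querying the CONCEPT and CONCEPT_SYNONYM tables:

```python
# Exact-name lookup of extracted attributes against (a stand-in for) the
# CONCEPT / CONCEPT_SYNONYM names; fuzzy matching is only the fallback.
NAME_TO_STANDARD = {
    # lower-cased concept/synonym name -> standard concept_id
    "aspirin": 1112807,        # Ingredient
    "oral tablet": 19082573,   # Dose Form
    "halfprin": 19068001,      # Brand Name
}

def map_attribute(name):
    """Return the standard concept_id for an attribute name, or None to hand off to fuzzy matching."""
    if name is None:
        return None
    return NAME_TO_STANDARD.get(name.strip().lower())

attributes = {"ingredient": "Aspirin", "dose_form": "Oral Tablet", "brand_name": "Halfprin"}
print({k: map_attribute(v) for k, v in attributes.items()})
# {'ingredient': 1112807, 'dose_form': 19082573, 'brand_name': 19068001}
```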

Thank you again! Does Usagi also use the algorithm you describe? My source terms are in Chinese; do you know what the difference between them is?

Oh, I forgot to say that my concept names are also in Chinese.

No, Usagi doesn’t do that.

Of course, you need to translate your concepts into English.

Here is an explanation of how to create the tables:
http://www.ohdsi.org/web/wiki/doku.php?id=documentation:international_drugs

And here is a script that builds these tables together, matching the concepts.

Hi, I still don’t understand why drug mapping can’t be accurate enough. Can you tell me the reason in detail? I am Chinese, so my drugs are also in Chinese, and I have only been in this field for one month. Thank you very much!

You can basically map your drugs to RxNorm/RxNorm Extension in any way that you like. You may use Usagi, but it’s usually pretty biased when it comes to drugs, and you also need to translate the names of the drugs first to use it. The second option is to map them manually (which is quite time-consuming, and you’ll have to do it over and over again when the next refresh of your vocabulary comes). Or you may use the standard approach with scripts that find the standard counterparts for your drugs. This approach requires the creation of intermediate tables where everything needs to be in English, though. Bonuses: you get a reproducible and more or less automated process; the result is more reliable; and you can contribute the vocabulary to the OMOP vocabularies set, so it will be available on Athena and will capture all the details from your source vocabulary (like Chinese brand names or manufacturers, even if they don’t exist in current RxNorm).
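To give a feel for what “intermediate tables in English” means, here is a heavily simplified, hypothetical staging row for one source drug; the real staging tables are described in the international drugs documentation linked above and carry more fields:

```python
# Hypothetical, simplified example of an intermediate ("staging") row for one
# source drug, with the name already translated to English. Every value here is
# made up; the real staging tables have additional columns.
drug_concept_stage_row = {
    "concept_code": "CN-000123",                                    # code from the local source vocabulary
    "concept_name": "Aspirin 50 MG Oral Tablet [SomeLocalBrand]",   # English translation of the source name
    "vocabulary_id": "My_Chinese_Drugs",                            # hypothetical local vocabulary_id
    "concept_class_id": "Drug Product",
    "domain_id": "Drug",
}
```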
