OHDSI Home | Forums | Wiki | Github

ETL file for CDM V4 -> V5.x for Truven CCAE dataset

Has a real ETL file for CDM V4 -> V5.x been published for Truven CCAE dataset in GitHub? I was searching and couldnt find it easily.

How do community members keep up with data refreshes and schema changes on an ongoing basis? What kinds of automation would be helpful to the community?

1 Like

Hi @MuraliKaundinya!

There are two things that can answer your question:

  1. There is a script that will take a CDM 4.5 build and convert it to a CDM 5.0 build. It isn’t perfect as it is better to re-purpose your builder however we did use this at Janssen for a bit until our CDM_BUILDER was updated.
    https://github.com/OHDSI/CommonDataModel/tree/v4-to-v5-conversion

  2. At Janssen R&D we have a builder for Truven and we share our code in Git. Our builder was not originally designed to support general purpose ETLs across different platforms, and modifications would be required to apply to other systems however it is a available for your reference.
    https://github.com/OHDSI/ETL-CDMBuilder
    Additionally you can review our documentation for the logic for the CCAE/MDCR CDM_BUILDER.
    https://github.com/OHDSI/ETL-CDMBuilder/blob/master/man/TRUVEN_CCAE_MDCR/Truven_CCAE_and_MDCR_ETL_CDM_V5.0.1.doc

We’ve additionally published on our Builder where we discuss applying a standard visit logic to bring consistency among our claims databases.
Voss EA, Ma Q, Ryan PB. The impact of standardizing the definition of visits on the consistency of multi-database observational health research. BMC Med Res Methodol. 2015 Mar 8;15:13. doi: 0.1186/s12874-015-0001-6. PubMed PMID: 25887092; PubMed Central PMCID: PMC4369827.

As far as keeping up with the schema changes, we find Truven isn’t too bad as far as changes. Usually the vendor gives a heads up prior to the receipt of a change and we just modify the Builder to handle it. In the past couple years there really hasn’t been big enough changes that required a lot of work on our side.

Tagging @clairblacketer as we work together on the Truven CDM_BUILDER. Let us know if you you have other questions or how we can help!

1 Like

Hi @ericaVoss !
Thanks for the information that you had shared. Question on the V4 to V5 conversion SQL that has been posted at : https://github.com/OHDSI/CommonDataModel/blob/v4-to-v5-conversion/Version4%20To%20Version5%20Conversion/Oracle/OMOP%20CDMv4%20to%20CDMv5%20-%20Oracle.sql

Question 1 :
The query that loads the g8iauhqmconcept_map table looks for relationships that satisfy a set of criterion ( Eaxmple : cr1.relationship_id IN
(‘Maps to’,
‘RxNorm replaced by’,
‘SNOMED replaced by’,
‘UCUM replaced by’,
‘Concept replaced by’,
‘ICD9P replaced by’,
‘LOINC replaced by’,
‘Concept same_as to’,
‘Concept was_a to’,
‘Concept alt_to to’)
AND cr1.invalid_reason IS NULL)
in V4 , relationship_id is an integer , so , tried matching it with RELATIONSHIP_NAME - But still could not find the exact strings given above . Found a similar strings such as “Concept replaced by (SNOMED)” - Assume that this is the same as ’ ‘SNOMED replaced by’’ - Is this the right understanding ?

Question 2: Are there any documents that have details of the logic behind the data load from V4 to V5 - apart from the SQL script available in GITHUB ?

  1. I think you would need to download a newer version of the Vocabulary (http://athena.ohdsi.org). You need a 5.0 Vocabulary.
  2. The V4 --> v5 is kind of more a patch than anything else, so there isn’t much outside the script.
1 Like

Hi @Sankari - following up on your question/comment from GitHub here.

The code snippet you pulled out with the relationship_ids is part of the “concept_map” table construction. The concept_map is used to map records in CDMv4 to the appropriate domain in the CDMv5. As @ericaVoss mentioned, this script is more of a patch so we’ve included as much guidance as possible in the script. At the top of the script, there are 2 sections: General Assumptions & Getting Started. These are intended to provide you with some guidance on how to set up your CDMs in order to run this script. Let me know if you need any clarifications.

1 Like

@MuraliKaundinya, @Sankari, friends:

Just to put my 2 cents out here in writing, because this conversation is happening with different people in different places:

I understand that investing in ETLs is unpopular, particular if the version drift forces you to go back. But I would NOT do the upgrading, unless it’s a really temporary stopgap solution. The reason we are changing the CDM is to fix problems and to enable new important use cases. These upgrade scripts by their nature cannot go along with these.

The CDM, even though it requires an upfront investment, is the foundation for a number of things:

  • The ability to do division of labor in the community: Different people with different skillsets and assets can collaborate in solutions, making OHDSI the strong community it is now.
  • The ability to do true Remote and Federated Studies, which means we can blindly send queries and analysis code to data partner where we never had and probably never will have direct data access to, and still get correct results.
  • A true scale. We already have above 1 Billlion patients we can study. This alone will make answering of many questions feasible which now are impossible to tackle.

The community will help you. And if you don’t have the capacity, technology or skillset to do your own ETLing there are a bunch of folks who will gladly do it for you if you can find a way to compensate them.

t