Has a real ETL file for CDM V4 -> V5.x been published for Truven CCAE dataset in GitHub? I was searching and couldnt find it easily.
How do community members keep up with data refreshes and schema changes on an ongoing basis? What kinds of automation would be helpful to the community?
There are two things that can answer your question:
There is a script that will take a CDM 4.5 build and convert it to a CDM 5.0 build. It isn’t perfect as it is better to re-purpose your builder however we did use this at Janssen for a bit until our CDM_BUILDER was updated. https://github.com/OHDSI/CommonDataModel/tree/v4-to-v5-conversion
We’ve additionally published on our Builder where we discuss applying a standard visit logic to bring consistency among our claims databases.
Voss EA, Ma Q, Ryan PB. The impact of standardizing the definition of visits on the consistency of multi-database observational health research. BMC Med Res Methodol. 2015 Mar 8;15:13. doi: 0.1186/s12874-015-0001-6. PubMed PMID: 25887092; PubMed Central PMCID: PMC4369827.
As far as keeping up with the schema changes, we find Truven isn’t too bad as far as changes. Usually the vendor gives a heads up prior to the receipt of a change and we just modify the Builder to handle it. In the past couple years there really hasn’t been big enough changes that required a lot of work on our side.
Tagging @clairblacketer as we work together on the Truven CDM_BUILDER. Let us know if you you have other questions or how we can help!
Question 1 :
The query that loads the g8iauhqmconcept_map table looks for relationships that satisfy a set of criterion ( Eaxmple : cr1.relationship_id IN
(‘Maps to’,
‘RxNorm replaced by’,
‘SNOMED replaced by’,
‘UCUM replaced by’,
‘Concept replaced by’,
‘ICD9P replaced by’,
‘LOINC replaced by’,
‘Concept same_as to’,
‘Concept was_a to’,
‘Concept alt_to to’)
AND cr1.invalid_reason IS NULL)
in V4 , relationship_id is an integer , so , tried matching it with RELATIONSHIP_NAME - But still could not find the exact strings given above . Found a similar strings such as “Concept replaced by (SNOMED)” - Assume that this is the same as ’ ‘SNOMED replaced by’’ - Is this the right understanding ?
Question 2: Are there any documents that have details of the logic behind the data load from V4 to V5 - apart from the SQL script available in GITHUB ?
The code snippet you pulled out with the relationship_ids is part of the “concept_map” table construction. The concept_map is used to map records in CDMv4 to the appropriate domain in the CDMv5. As @ericaVoss mentioned, this script is more of a patch so we’ve included as much guidance as possible in the script. At the top of the script, there are 2 sections: General Assumptions & Getting Started. These are intended to provide you with some guidance on how to set up your CDMs in order to run this script. Let me know if you need any clarifications.
Just to put my 2 cents out here in writing, because this conversation is happening with different people in different places:
I understand that investing in ETLs is unpopular, particular if the version drift forces you to go back. But I would NOT do the upgrading, unless it’s a really temporary stopgap solution. The reason we are changing the CDM is to fix problems and to enable new important use cases. These upgrade scripts by their nature cannot go along with these.
The CDM, even though it requires an upfront investment, is the foundation for a number of things:
The ability to do division of labor in the community: Different people with different skillsets and assets can collaborate in solutions, making OHDSI the strong community it is now.
The ability to do true Remote and Federated Studies, which means we can blindly send queries and analysis code to data partner where we never had and probably never will have direct data access to, and still get correct results.
A true scale. We already have above 1 Billlion patients we can study. This alone will make answering of many questions feasible which now are impossible to tackle.
The community will help you. And if you don’t have the capacity, technology or skillset to do your own ETLing there are a bunch of folks who will gladly do it for you if you can find a way to compensate them.