We have done QC on ETL processes built by others. The main finding from that project was that the ETL varied a lot wherever the CDM guidelines were anything less than perfectly specified: variation in the vocabulary version used, in the definition of “visit”, in how codes were moved, and in the use of concept_id == 0. So part of the fix has to come from OHDSI itself. There is already another post on the forum about the 86-page guide that PEDSnet had to put together to ensure consistent ETL (see “Standardizing weight data”).
Our approach has been to build a “Rabbit-in-a-Hat” type of design specification and then automated software that implements it. In this way, we QC the software on the one hand, and develop and QC the ETL spec on the other. To accomplish this, we moved to an intermediate data model that we call a “generalized data model”. It focuses on the re-arrangement of data and on provenance, and avoids visits and most vocabulary mappings. From there, a second ETL targets the specific data model. That is how we are doing it, but it doesn’t have to be that way.
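To make the two-stage idea concrete, here is a minimal sketch in Python. It is not our actual software: the record fields, the function names (`to_generalized`, `to_cdm`), the input column names, and the concept ids in the usage example are all hypothetical and chosen only for illustration. Stage 1 only re-arranges data and records provenance; stage 2 applies the vocabulary mapping for one specific target model and falls back to concept id 0 when no mapping exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralizedRecord:
    """Hypothetical intermediate record: source values plus provenance, no mapping."""
    person_id: int
    source_code: str
    source_vocabulary: str
    event_date: str
    source_table: str  # provenance: which source table the row came from
    source_row: int    # provenance: position of the row within that table


def to_generalized(source_table, rows):
    """Stage 1: re-arrange raw rows and attach provenance (no visits, no mapping)."""
    return [
        GeneralizedRecord(
            person_id=row["patient"],
            source_code=row["code"],
            source_vocabulary=row["vocab"],
            event_date=row["date"],
            source_table=source_table,
            source_row=i,
        )
        for i, row in enumerate(rows)
    ]


def to_cdm(records, concept_map):
    """Stage 2: map to a CDM-like condition row; unmapped codes get concept id 0."""
    return [
        {
            "person_id": r.person_id,
            "condition_concept_id": concept_map.get(
                (r.source_vocabulary, r.source_code), 0
            ),
            "condition_source_value": r.source_code,
            "condition_start_date": r.event_date,
        }
        for r in records
    ]


# Usage with made-up data; 111 is a placeholder, not a real concept id.
raw = [
    {"patient": 1, "code": "E11.9", "vocab": "ICD10CM", "date": "2015-03-01"},
    {"patient": 2, "code": "XYZ", "vocab": "LOCAL", "date": "2015-04-12"},
]
generalized = to_generalized("claims_dx", raw)
cdm_rows = to_cdm(generalized, {("ICD10CM", "E11.9"): 111})
```

Keeping the mapping in the second stage means a change of vocabulary version only requires re-running stage 2, while the stage 1 output and its provenance stay fixed.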
The main points are that
- OHDSI (perhaps in collaboration with ETL vendors) needs to provide detailed implementation guidance before anyone can certify anything
- the process needs to be automated to generate evidence of proper testing and QC
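
As a rough illustration of what automated evidence could look like, here is a small Python sketch. The function name `qc_report`, its parameters, the report layout, and the thresholds are assumptions made for this example, not part of any OHDSI tool. It checks two of the sources of variation mentioned above (vocabulary version and the share of rows with concept id 0) and returns a report that can be stored as evidence.

```python
import json


def qc_report(rows, concept_field, vocabulary_version, expected_version,
              max_unmapped_fraction):
    """Run two simple checks and return a machine-readable evidence record."""
    total = len(rows)
    unmapped = sum(1 for r in rows if r[concept_field] == 0)
    fraction = unmapped / total if total else 0.0
    checks = [
        {
            "check": "vocabulary_version",
            "observed": vocabulary_version,
            "expected": expected_version,
            "passed": vocabulary_version == expected_version,
        },
        {
            "check": "unmapped_fraction",
            "observed": fraction,
            "threshold": max_unmapped_fraction,
            "passed": fraction <= max_unmapped_fraction,
        },
    ]
    return {
        "row_count": total,
        "checks": checks,
        "passed": all(c["passed"] for c in checks),
    }


# Usage with made-up rows, version labels, and threshold.
example_rows = [
    {"condition_concept_id": 111},
    {"condition_concept_id": 0},
    {"condition_concept_id": 222},
    {"condition_concept_id": 333},
]
report = qc_report(example_rows, "condition_concept_id",
                   vocabulary_version="v5-example",
                   expected_version="v5-example",
                   max_unmapped_fraction=0.5)
print(json.dumps(report, indent=2))
```

The thresholds and the expected version are exactly the kind of thing the implementation guidance would need to pin down; the code only makes the result reproducible and auditable.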