OHDSI ETL builder on Dutch database

Daphne · January 16, 2020, 3:59pm

Dear all,

I am new to OHDSI! I followed the ‘get started steps’, but got stuck on the ETL process. I want use the ETL-CDMBuilder, but I saw that this can only be used for very specific databases. Since I have to create the ETL-CDM on a dutch database, I was wondering how I could do this.

Does anyone have any suggestions?

Many thanks in advance,

Daphne

MaximMoinat · January 16, 2020, 4:46pm

Hello Daphne,

Welcome to the OHDSI community.

The ETL process is different for each source database. Unfortunately there is not one pipeline that can transform different data sources to the OMOP CDM. There is an OHDSI tool, WhiteRabbit, that helps you to define the ETL of your data source.

The ETL-CDMBuilder is an example of an ETL implementation, implemented for a number of US (and UK) data sources. Another one, written in R, can be found here: https://github.com/ohdsi/etl-synthea.

Please also take a look at European Health Data and Evidence Network (EHDEN), we organise several data source calls to map European health records to the OMOP CDM. This links you to a SME who has OHDSI/OMOP expertise. I could bring you into contact with the right people.

mkwong · January 16, 2020, 5:51pm

Having and actively been implementing ETLs in three context - pre-hospital emergency medicine patient care records to OMOP, veterinary patients, and devices - my suggestion is to do it in steps. As many others have said on other threads to this question - there aren’t a universal ETL solution even when dealing with the same EHR system as everyone documents and customizes their EHR systems differently.

The approach I have settled on is to first populate the OMOP.person table and introduce a master_index table which provides a link between the patient’s true medical record number to the OMOP assigned person.person_id. Do one month or one year worth of data depending on your volume so it is easy to check and validate. Then move on to visits and care_sites. Again check and validate. Doing care sites then gets you into dealing with mapping and resolving variations in care site names to standard concept codes.

Then move on to drugs, then devices, then procedures, then observations and conditions. I found it also helpful to have a use case to work against as it helps focus the validation and checking as well as bring clinicians in to help with the validation and checking.

As for ETL implementation - I design and implement in Java. My data sources come from MS SQL databases, CSV format text flles, ASCII text, XML, and in all sorts of formats. The implementation choice greatly depends on your local software expertise and experience in utilizing various tools available from OHDSI and non-OHDSI.

Good luck.