
ETL Mechanism for CDM

I have the following questions about the OHDSI ETL process:

  1. Is there any way to insert data into a CDM database without using the ETL CDM Builder?
  2. What is the ETL implementation process after using the White Rabbit and Rabbit in a Hat tools?

Please guide.

Hi Swanshi,

You can use any method you’d like for the ETL. You just need to ensure that the CDM specifications and conventions are upheld when converting your source data into the CDM database.

White Rabbit can be used to profile the source data: it reports the frequencies of values across the tables and fields. That scan report can then be fed into Rabbit in a Hat (RIAH), a tool for graphically creating mappings from the source tables into the CDM tables. RIAH can also produce an ETL document and an R testing package for testing your ETL against dummy data.
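To give a rough idea of what "profiling" means here, below is a minimal Python sketch that counts value frequencies per field. It is not White Rabbit's code or its report format; the `profile_rows` function and the sample rows are made up purely for illustration.

```python
from collections import Counter, defaultdict


def profile_rows(rows):
    """Count how often each value occurs in each field of a list of dict rows."""
    freq = defaultdict(Counter)
    for row in rows:
        for field, value in row.items():
            freq[field][value] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {field: dict(counts) for field, counts in freq.items()}


# Hypothetical source rows, invented for this example.
sample_rows = [
    {"sex": "M", "birth_year": 1980},
    {"sex": "F", "birth_year": 1975},
    {"sex": "F", "birth_year": 1980},
]
profile = profile_rows(sample_rows)
```

Frequencies like these are what help you decide how each source value should be mapped into the CDM.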

Thanks,
Ajit

Hello Swanshi

I am an OHDSI newbie, in the process of learning OMOP. I too found the ETL documentation and existing tools confusing. I have started experimenting with ETL here ( https://github.com/dermatologist/hephaestus ) using packages such as SQLAlchemy and bonobo. I have only just started, but it may be worth a look if you are developing your own ETL script in Python.

Cheers,
Bell

Probably the most straightforward solution is to use only SQL to perform the ETL, as implemented, for example, in this ETL of Korean insurance claims data. Another example, which uses Python as mentioned above, is this ETL of CMS data. I’ve also seen people use point-and-click ETL tools like Pentaho.

It all depends on what database platform you’re using, what technical expertise you have, the size of your database (kilobytes, megabytes, or terabytes?), etc.
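As a rough illustration of the SQL-only approach, here is a minimal sketch that runs a single `INSERT ... SELECT` against an in-memory SQLite database from Python. The source table `src_patient` and its columns are hypothetical, and the `person` table is a simplified subset of the real CDM table (the full specification has more required fields). The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts; mapping everything else to 0 is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source table; yours will look different.
CREATE TABLE src_patient (patient_id INTEGER, sex TEXT, birth_year INTEGER);
INSERT INTO src_patient VALUES (1, 'M', 1980), (2, 'F', 1975), (3, 'U', 1990);

-- Simplified subset of the CDM person table (assumption: only four columns).
CREATE TABLE person (
    person_id           INTEGER PRIMARY KEY,
    gender_concept_id   INTEGER NOT NULL,
    year_of_birth       INTEGER NOT NULL,
    gender_source_value TEXT
);

-- The ETL step: map source codes to concept IDs while loading.
INSERT INTO person (person_id, gender_concept_id, year_of_birth, gender_source_value)
SELECT patient_id,
       CASE sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END,
       birth_year,
       sex
FROM src_patient;
""")

persons = conn.execute(
    "SELECT person_id, gender_concept_id, year_of_birth, gender_source_value "
    "FROM person ORDER BY person_id"
).fetchall()
```

A real ETL would look codes up in the vocabulary tables rather than hard-coding them in a `CASE` expression, but the overall shape (one `INSERT ... SELECT` per target table) stays the same.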
