
Data loading using ETL

I am planning to load claims data into a MySQL CDM database. Should we load the data with a plain INSERT INTO … SELECT * FROM … statement? Or is there an open-source ETL tool that can assign the concept_id for a column before loading it into the CDM database, based on a text search of the concept name? Or do we need to write a custom script to look up the concept_id by text search? Thanks.

If you have not already watched the OHDSI tutorials on ETL, you should go to YouTube and search for ‘OHDSI 2019 ETL’. This will give you an idea of the options.

While MySQL isn’t one of the officially supported databases (e.g., for ATLAS), I have implemented OMOP on MySQL 8.x for our veterinary school and for an emergency medical service (EMS) dataset. In both cases I first defined a source-term-to-OMOP-concept map in the form of a Java Properties file (name=value format). The ETL itself is a relatively simple Java program that reads the input text and uses the map to get the concept_id. If a source term is not mapped, it gets saved and tagged in the NOTE table so I can review and map it later.
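The Properties-file approach described above can be sketched roughly as follows. The file name, example terms, and concept_ids are illustrative assumptions, not real mappings; returning 0 for an unmapped term follows the OMOP "no matching concept" convention.

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

// Sketch of a source-term-to-OMOP-concept map backed by a Java Properties
// file, e.g. concept_map.properties with lines like:
//   fever=437663
// (term names and concept_ids here are hypothetical examples)
public class ConceptMap {
    private final Properties map = new Properties();

    public ConceptMap(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            map.load(reader);
        }
    }

    // Returns the mapped concept_id, or 0 (the OMOP convention for
    // "no matching concept") when the source term is unmapped.
    public int lookup(String sourceTerm) {
        String value = map.getProperty(sourceTerm.trim().toLowerCase());
        return value == null ? 0 : Integer.parseInt(value);
    }

    // Lets the ETL decide whether to route an unmapped term to the
    // NOTE table for later review, as described above.
    public boolean isMapped(String sourceTerm) {
        return map.getProperty(sourceTerm.trim().toLowerCase()) != null;
    }
}
```

A lookup that returns 0 (or isMapped() returning false) is the signal to save and tag the raw text for later review rather than drop it.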

Thanks Manlik, so you mean we need to write our own customized code (SQL scripts or Java code) to implement the ETL for the CDM database.
Also, has Achilles ever worked with MySQL? It is never mentioned in the OHDSI book.

@abhiiitr02:

You can use MySQL to create an OMOP database, no problem. And yes, you have to write your own ETL from whatever source system. Everybody has to do that, because everybody has different source data. However, MySQL doesn’t support partitions and all sorts of analytical functions necessary for the OHDSI tools and packages. So, unless all you want to do is use the data for some manual querying, you won’t get very far with MySQL. If you need a free database system, use PostgreSQL. It is very well supported.

Per Christian’s comment - yes, everyone has to write or buy their own ETL software, since everyone has different Electronic Medical Record (EMR) systems and content (local documentation practices). Meaning even if two sites have the same EMR system, one might use pick-lists while another lets the clinician enter data as free text.

I have a process that seems to work for me: break down the available data sources and prioritize them by ease of writing the ETL and by the value of the data. Starting with the easy ones lets you get the base software designed and implemented if you decide to write your own. Sort of getting your feet wet and getting a feel for how to work with the OMOP database and source data. Next, write a survey program to get a feel for the content itself, which leads to creating the source-to-OMOP concept maps. Then work through each data source, checking the integrity after each step and data source.
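The "survey program" step above can be sketched as a simple frequency count over one column of a source extract; the distinct terms it finds become candidate keys for the concept map. The tab-delimited layout and column index are assumptions for illustration, not a description of any particular source system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Counts how often each distinct value appears in one column of a
// tab-delimited source extract. The resulting term list is what you
// review when building the source-to-OMOP concept map.
public class SourceSurvey {
    public static Map<String, Integer> survey(Path file, int column) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] fields = line.split("\t", -1);
            if (column < fields.length) {
                // Normalize the same way the ETL will, so survey output
                // matches the map's keys.
                String term = fields[column].trim().toLowerCase();
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sorting the result by descending count shows which mappings give the most coverage first, which pairs well with the prioritize-by-value idea above.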

Or find an experienced OMOP contractor to help with the implementation.

Christian and the OHDSI community have been a tremendous resource regardless of the direction and approach.

Best of luck.
