Next year, 10 years of claims data covering the entire Korean population (51M), from all clinics, hospitals, and pharmacies, is expected to be converted into CDM v5.0.
Can the ETL and all the analytic systems handle this amount of data in a timely manner? Do we need a special strategy or special hardware for it?
Hi @rwpark, there are many data sources that have been successfully
converted to the OMOP CDM v5 that are as big or even larger:
http://www.ohdsi.org/web/wiki/doku.php?id=resources:data_network. Within
our group, our largest database is Truven MarketScan CCAE, which has ~120m
patients over ~15 years. We store the raw data in a relational database
(in our case, MSSQL) and we found the most efficient way to develop our ETL
involved writing a C# app that allowed us to do multi-threading across
operations that could be executed on a per-patient basis, rather than
trying to do everything as set-based operations across the whole population in
SQL. Our hardware for executing the ETL isn’t anything particularly
special, but we do dedicate the box to ETL execution while it is
running, for efficiency. Our ETL documentation and source code
are here, in case they’re useful: https://github.com/OHDSI/ETL-CDMBuilder.
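To make the per-patient idea concrete, here is a minimal Python sketch. It is not the actual C# CDMBuilder code; the sample rows, the function names (`group_by_patient`, `build_patient_records`, `run_etl`), and the simplified output fields are all assumptions made for illustration. It partitions raw claim rows by patient and hands each patient's chunk to a worker pool, instead of running one set-based query over the whole population.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw claim rows: (patient_id, service_date, diagnosis_code).
RAW_CLAIMS = [
    (1, "2010-01-05", "E11"),
    (2, "2011-03-12", "I10"),
    (1, "2010-02-01", "E11"),
    (3, "2012-07-30", "J45"),
    (2, "2011-03-12", "I10"),  # exact duplicate claim line
]


def group_by_patient(rows):
    """Partition raw rows so each patient can be processed independently."""
    chunks = defaultdict(list)
    for patient_id, service_date, code in rows:
        chunks[patient_id].append((service_date, code))
    return chunks


def build_patient_records(item):
    """Per-patient step: drop duplicates, sort by date, emit simplified rows."""
    patient_id, claims = item
    return [
        {
            "person_id": patient_id,
            "condition_start_date": service_date,
            "condition_source_value": code,
        }
        for service_date, code in sorted(set(claims))
    ]


def run_etl(rows, max_workers=4):
    """Run the per-patient step across a pool of worker threads."""
    chunks = group_by_patient(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so output is ordered by patient id.
        per_patient = pool.map(build_patient_records, sorted(chunks.items()))
        return [row for patient_rows in per_patient for row in patient_rows]


if __name__ == "__main__":
    for row in run_etl(RAW_CLAIMS):
        print(row)
```

Because no patient's records depend on another's, the work splits cleanly across workers. In CPython, threads mainly help when the per-patient step waits on database I/O; for CPU-heavy transformations, `ProcessPoolExecutor` from the same module may be a better fit.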
Thank you @Patrick_Ryan!
In the link you posted: http://www.ohdsi.org/web/wiki/doku.php?id=resources:data_network
I see that those databases have already been converted into CDM v5.
However, the ‘ETL-CDMBuilder’ is for CDM v4.
Did you develop a new ‘ETL-CDMBuilder’ for CDM v5, or did you take another strategy?
We want to convert the Korean claims data into CDM v5.
One novel thing about our Korean claims data is that it includes yearly health exam data for all adults over 45 years of age, including various laboratory test results.
Yes, we’ve also built our ETLs for CDMv5…really, the difference between
CDMv4 and CDMv5 is fairly small, so the logic in the ETLbuilder didn’t
change very much. I’ll ask the team to make sure to get that out
there on GitHub as well, and ping you when they post it.
Yay!!! Thank you very much @Patrick_Ryan!
I found that the ETLbuilder supports only the limited set of raw DBs listed. We will therefore convert our DB’s structure into that of one of the listed DBs, and then apply the ETLbuilder to it.