OHDSI Home | Forums | Wiki | Github

ETL value conversion best practices


(Ben Imlay) #1

Forewarning: this does not pertain to mapping data to the CDM. Rather I am mapping source data about medical devices to a new target format.

I know generally which source fields match with which target fields, but I think I need a set of rules to convert source values to target values. I envision either writing rules as columns in a spreadsheet (input/output) or as if/then statements in plain text. Once I have those rules, I will have SQL queries process the actual conversion.

Would either of those be the best practice for this stage of ETL? The OHDSI manual mentions that the ETL is split into a planning/design phase and then a technical execution phase. Where does explicitly matching source values to target values fit into the ETL process?


(Frank D) #2

Hi Ben,

Generally speaking, any of those rules will work, it just varies on the level of effort and maintenance that works best for you. Having said that, I personally prefer doing this all of this processing (let’s call it pre-processing) before doing anything CDM related. I also prefer to have everything in a reference table to join to vs case statements, as I have found it is much easier to maintain (1 spot to change code vs potentially multiple spots) and is much cleaner and easier to interpret. I suspect you may have additional pre-processing steps for formatting/data typing, denormalizing, etc. before you are ready to start putting your data into the CDM as well.

As far as where this fits in the ETL process, it depends on your ETL architecture. We tend to do all of the mappings to source/standard target before writing the code to insert into a CDM table. So, in a nutshell, you will have multiple technical steps before you get into populating your CDM.


(Ben Imlay) #3

Thanks for your response! Where possible I will use reference tables. I agree that they are easier to maintain as changes are made in the future.


t