ETL Mapping specs for CDM v5.0?

ericaVoss · January 15, 2016, 6:26pm

Your process is very similar to ours. We load all data for all years into a database, then our CDM_BUILDER splits the people into chunks and processes one person at a time. That lets us distribute the work across machines.
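To make the chunking idea concrete, here is a minimal sketch of splitting person IDs into batches that could be handed to separate machines. This is not the actual CDM_BUILDER code; the function name `chunk_person_ids` and the chunk size are assumptions made up for illustration.

```python
from typing import Iterable, Iterator, List


def chunk_person_ids(person_ids: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most chunk_size person IDs.

    Hypothetical helper: each batch could be assigned to a different worker,
    which then processes its people one at a time.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    batch: List[int] = []
    for person_id in person_ids:
        batch.append(person_id)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Seven people split into chunks of three.
chunks = list(chunk_person_ids(range(1, 8), 3))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Because every person's records stay inside a single chunk, the workers do not need to coordinate with each other while they run.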

We do not rebuild the SOURCE each time; we just append our new data to the end of it and refresh the indexes, statistics, and such. The CDM is completely rebuilt each time.

Actually, giving it a little thought, you might be able to do it by year. It is more work and will increase your run time (you’ll have a lot of post-processing that wouldn’t be needed when running off the full DB).

You would have the problem that OBSERVATION_PERIODS would be year bound, but maybe you just create a job at the end that stitches them all together.
ERAS would be fine; you would just run them after all the years are done running.
For the PERSON data we take the person’s most recent data - so you could have a situation where a person’s gender or birth year changes between years - but you could write out each year’s information and then collapse it in post-processing.
For collapsing adjustment claims, you could just do the adjustment within each year and accept that you might miss opportunities to collapse some records. Truven does a fairly good job of cleaning up the data, so collapsing isn’t a major component anyway.
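As a rough sketch of what the post-processing could look like, the snippet below stitches year-bound observation periods back together and keeps the latest year's PERSON row. It is only an illustration under assumptions: the tuple and dictionary layouts, the function names, and the rule that periods are merged when they overlap or are at most one day apart (Dec 31 followed by Jan 1) are all made up here, not our actual build logic.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical layout: (person_id, start_date, end_date)
Period = Tuple[int, date, date]


def stitch_periods(periods: List[Period], max_gap_days: int = 1) -> List[Period]:
    """Merge a person's periods that overlap or start within max_gap_days
    of the previous period's end (assumed stitching rule)."""
    stitched: List[Period] = []
    for person_id, start, end in sorted(periods):
        if stitched:
            last_pid, last_start, last_end = stitched[-1]
            if last_pid == person_id and start <= last_end + timedelta(days=max_gap_days):
                # Extend the previous period instead of starting a new one.
                stitched[-1] = (last_pid, last_start, max(last_end, end))
                continue
        stitched.append((person_id, start, end))
    return stitched


def collapse_person(yearly_rows: List[dict]) -> Dict[int, dict]:
    """Keep, for each person_id, the row from the latest year."""
    latest: Dict[int, dict] = {}
    for row in yearly_rows:
        pid = row["person_id"]
        if pid not in latest or row["year"] >= latest[pid]["year"]:
            latest[pid] = row
    return latest


periods = [
    (1, date(2014, 3, 1), date(2014, 12, 31)),
    (1, date(2015, 1, 1), date(2015, 6, 30)),   # continues across the year boundary
    (2, date(2014, 1, 1), date(2014, 5, 31)),
    (2, date(2015, 2, 1), date(2015, 12, 31)),  # real gap, stays separate
]
stitched = stitch_periods(periods)

rows = [
    {"person_id": 1, "year": 2014, "year_of_birth": 1970},
    {"person_id": 1, "year": 2015, "year_of_birth": 1971},
]
persons = collapse_person(rows)
```

In this example person 1 ends up with a single period from 2014-03-01 to 2015-06-30 and the 2015 PERSON row, while person 2 keeps two separate periods because of the gap between them.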

I’ll noodle on it a bit more and see if I can come up with a reason why it would be a bad idea. Also, I am assuming you’ll have to do a full CDM build each time, since you’ll need to get all data on the same VOCAB. I would not recommend appending to a CDM, because you would need to keep up with the VOCAB.