Your process is very similar to ours. We load all data over all years into a database, then our CDM_BUILDER chunks up the people and processes one person at a time. We can then distribute the work across machines.
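To make the chunking idea concrete, here is a minimal sketch, assuming the person IDs have already been pulled from the source database; `build_person_cdm`, the chunk size, and the worker count are hypothetical placeholders, not the actual CDM_BUILDER interface:

```python
# Minimal sketch of "chunk up the people and process one person at a time".
# build_person_cdm() is a placeholder for whatever does the per-person ETL.
from multiprocessing import Pool

def build_person_cdm(person_id):
    # Placeholder: read all source rows for this person across all years
    # and write the corresponding CDM rows.
    ...

def chunk(ids, size):
    # Split the full person list into fixed-size chunks of work.
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def run_chunk(person_ids):
    for pid in person_ids:
        build_person_cdm(pid)

if __name__ == "__main__":
    person_ids = list(range(1, 100001))      # stand-in for IDs from the source DB
    with Pool(processes=8) as pool:          # or fan chunks out to separate machines
        pool.map(run_chunk, list(chunk(person_ids, 1000)))
```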
We do not rebuild the SOURCE each time; we just append our new data to the end of it and reprocess indexes, statistics, and such. The CDM is completely rebuilt each time.
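For what it's worth, the append-and-refresh step can be as simple as the sketch below, assuming a PostgreSQL SOURCE; the table and file names are made up:

```python
# Sketch of appending a new data year to SOURCE and refreshing statistics,
# assuming a PostgreSQL backend; table/file names are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=source_db")
with conn, conn.cursor() as cur:
    with open("raw_claims_2016.csv") as f:
        # Append the new year's rows to the existing SOURCE table.
        cur.copy_expert("COPY source.claims FROM STDIN WITH CSV HEADER", f)
    # Refresh planner statistics so queries over the enlarged table stay fast.
    cur.execute("ANALYZE source.claims")
conn.close()
```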
Actually, giving it a little thought, you might be able to do it by year. It is more work and will increase your run time (you’ll have a lot of post-processing that wouldn’t be needed if you were running off the full DB).
-
You would have the problem that OBSERVATION_PERIODS would be year-bound, but maybe you just create a job that stitches them all together at the end.
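A minimal sketch of that stitching job, assuming the year-bound rows come out as (person_id, start_date, end_date) tuples; it merges periods that touch or overlap across a year boundary and keeps real coverage gaps separate:

```python
# Post-processing job that stitches year-bound OBSERVATION_PERIOD rows back
# together, e.g. 2014-12-31 followed by 2015-01-01 becomes one period.
from datetime import date, timedelta
from itertools import groupby
from operator import itemgetter

def stitch_observation_periods(rows):
    """rows: iterable of (person_id, start_date, end_date) tuples."""
    stitched = []
    rows = sorted(rows, key=itemgetter(0, 1))
    for person_id, person_rows in groupby(rows, key=itemgetter(0)):
        cur_start, cur_end = None, None
        for _, start, end in person_rows:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end + timedelta(days=1):   # contiguous across the year boundary
                cur_end = max(cur_end, end)
            else:                                        # real gap in coverage, keep separate
                stitched.append((person_id, cur_start, cur_end))
                cur_start, cur_end = start, end
        stitched.append((person_id, cur_start, cur_end))
    return stitched

periods = [(1, date(2014, 1, 1), date(2014, 12, 31)),
           (1, date(2015, 1, 1), date(2015, 6, 30))]
print(stitch_observation_periods(periods))   # one stitched period: 2014-01-01 through 2015-06-30
```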
-
ERAS would be fine; you would just run them after all the years have finished running.
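If it helps, a rough sketch of the usual drug-era collapsing with a 30-day persistence window, run over the combined data; the input layout and the 30-day gap are assumptions on my part, not the builder's actual logic:

```python
# Rough sketch of building DRUG_ERA rows after all years are loaded, using a
# 30-day persistence window; the input tuple layout is an assumption.
from datetime import timedelta

GAP = timedelta(days=30)

def build_drug_eras(exposures):
    """exposures: list of (person_id, ingredient_concept_id, start_date, end_date)."""
    eras = []
    for key in sorted({(p, i) for p, i, _, _ in exposures}):
        rows = sorted((s, e) for p, i, s, e in exposures if (p, i) == key)
        era_start, era_end = rows[0]
        for start, end in rows[1:]:
            if start <= era_end + GAP:          # within the persistence window: extend era
                era_end = max(era_end, end)
            else:                               # gap too large: close era, start a new one
                eras.append((*key, era_start, era_end))
                era_start, era_end = start, end
        eras.append((*key, era_start, era_end))
    return eras
```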
-
For the PERSON data we take the person’s most recent record, so you could have a situation where a person’s gender or birth year changes between years, but you could write each year’s information out and then collapse the data in post-processing.
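Something like the sketch below, which keeps the most recent year’s values per person (mirroring "take the last person’s data"); the dict layout is just an assumption:

```python
# Collapse per-year PERSON rows down to one row per person, keeping the values
# from the most recent year.
def collapse_person_years(person_rows):
    """person_rows: list of dicts with at least person_id and source_year keys."""
    latest = {}
    for row in sorted(person_rows, key=lambda r: r["source_year"]):
        latest[row["person_id"]] = row          # later years overwrite earlier ones
    return list(latest.values())

rows = [
    {"person_id": 1, "source_year": 2014, "gender_concept_id": 8532, "year_of_birth": 1980},
    {"person_id": 1, "source_year": 2015, "gender_concept_id": 8507, "year_of_birth": 1981},
]
print(collapse_person_years(rows))   # keeps the 2015 record
```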
-
For collapsing adjustment claims, you could just do the adjustment within the year and accept that you might miss opportunities to collapse some records. Truven does a fairly good job of cleaning up the data; collapsing isn’t a major component anyway.
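A very generic sketch of that within-year collapsing, keeping only the latest version of each claim; the claim_key/version field names are hypothetical, not Truven’s actual layout:

```python
# Within-year adjustment collapsing: group records that share a claim key and
# keep the latest version. Field names are hypothetical placeholders.
def collapse_adjustments(year_claims):
    """year_claims: list of dicts with claim_key and version fields."""
    latest = {}
    for claim in year_claims:
        key = claim["claim_key"]
        if key not in latest or claim["version"] > latest[key]["version"]:
            latest[key] = claim
    return list(latest.values())
```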
I’ll noodle on it a bit more and see if I can come up with a reason why it would be a bad idea. Also, I am assuming you’ll have to do a full CDM build each time, since you’ll need to get all the data onto the same VOCAB. I would not recommend appending to a CDM because of the need to keep up with the VOCAB.