ETL strategies - implementing OMOP across entire population vs. smaller projects

(prasanth n.) #1

Hello, we have implemented OMOP at Northwestern for a few projects (All of Us, eMERGE, Brain Tumor Institute, etc.), but we are now looking to expand this to our entire patient population as well.

I have some very general questions for folks who have implemented the CDM for their entire patient population, as opposed to a cohort of patients:

  • what is your patient population?
  • how often do you refresh your OMOP data?
  • when a refresh happens, is it a full reload or do you have an incremental strategy in place?
  • how do you handle updates to the vocabulary? does a full refresh happen?
  • are there any papers out there from past symposiums that I can read up on?



(Frank D) #2

Hi Prasanth,

  1. varies - can be thousands, millions, hundreds of millions, depends on the dataset
  2. varies by dataset - generally quarterly but some monthly
  3. We do a full refresh, meaning we re-run the entire repo end-to-end, starting with the DDLs and continuing through the entire conversion
  4. We use the latest vocabulary each time we refresh
  5. I did have a poster about our ETL process, which uses an intermediate staging table pre-CDM. I believe we discussed this in person in Chicago in July during our meeting.
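
For illustration only, a minimal Python sketch of the full-refresh pattern with a pre-CDM staging table might look like the following. This is not the ETL described above: the single-table DDL, the stg_patient staging table, the source row layout, and the full_refresh helper are all hypothetical, SQLite stands in for the real database, and a small dictionary stands in for a lookup against the vocabulary tables.

```python
import sqlite3

# Hypothetical, heavily simplified DDL. The real OMOP CDM DDL defines many
# more tables and columns; only a cut-down person table is shown here.
DDL = [
    "DROP TABLE IF EXISTS person",
    "DROP TABLE IF EXISTS stg_patient",
    "CREATE TABLE stg_patient (source_id TEXT, birth_year INTEGER, sex TEXT)",
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " year_of_birth INTEGER,"
    " gender_concept_id INTEGER,"
    " person_source_value TEXT)",
]

# Stand-in for a vocabulary lookup (assumption: a real ETL would join to the
# concept tables of whichever vocabulary release is loaded for this refresh).
# 8507 and 8532 are the standard OMOP gender concepts for male and female.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def full_refresh(conn, source_rows):
    """Rebuild everything from scratch: DDL, staging load, then conversion."""
    cur = conn.cursor()

    # Step 1: re-run the DDL so no rows survive from the previous refresh.
    for stmt in DDL:
        cur.execute(stmt)

    # Step 2: land the raw source rows in the pre-CDM staging table.
    cur.executemany("INSERT INTO stg_patient VALUES (?, ?, ?)", source_rows)

    # Step 3: convert staging rows into the CDM table, mapping source codes
    # to concept ids (0 means no matching concept).
    staged = cur.execute(
        "SELECT source_id, birth_year, sex FROM stg_patient ORDER BY source_id"
    ).fetchall()
    cur.executemany(
        "INSERT INTO person (year_of_birth, gender_concept_id, person_source_value)"
        " VALUES (?, ?, ?)",
        [(year, GENDER_CONCEPTS.get(sex, 0), sid) for sid, year, sex in staged],
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM person").fetchone()[0]


conn = sqlite3.connect(":memory:")
first = full_refresh(conn, [("A1", 1980, "F"), ("B2", 1975, "M")])
second = full_refresh(conn, [("A1", 1980, "F"), ("B2", 1975, "M"), ("C3", 1990, "U")])
print(first, second)  # 2 3: the second run replaces the first, it does not append
```

Because every run starts from the DDL, the trade-off is simplicity over speed: there is no change tracking to maintain, and a new vocabulary release is picked up automatically, but the whole population is reprocessed each time.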