
Standard data source & test cases for Atlas and OHDSI stack


(Thomas White, MD, MS, MA, CHIE) #1

We are standing up the OHDSI tools within Azure and would like to test all of the core Atlas capabilities to ensure that everything works properly. We also want to stress test the system (e.g. monitor response times) so that we can properly size our environment to support our needs.

What synthetic data source(s) + JSON files does the community recommend to test these features (e.g. ensure they run properly and generate results):
• Characterization
• Cohort Pathways
• Incidence Rates
• Estimation
• Prediction

These feel like fairly standard test cases that could be made available in GitHub. Has anyone done that?

/Tom


(Gregory Klebanov) #2

Hi Tom,

At Odysseus, we do a lot of ATLAS testing internally to make sure it works across various databases. For that, we use a combination of OHDSI ATLAS tutorial examples (for simpler cases) and OHDSI studies posted on GitHub (https://data.ohdsi.org - just pick 2-3 relevant studies).

Synthetic test data - well, here we are in the process of cleaning things up. There are two synthetic datasets that can generally be used for your needs: SynPUF and Synthea. One is simulated claims data, the other is simulated EHR data. The current issues:

  • SynPUF - the OMOP version of the dataset was produced a few years ago, and the vocabularies have moved on since then. The vocabularies used in the data during ETL should match those installed, which is often not the case here, so from that perspective it is a bit of a mess. We are in the process of figuring out how to re-run the SynPUF OMOP ETL with the latest vocabularies. Another challenge: it is simulated claims data, so you need to ensure that the test cases you are using would actually find data (this is why we use tutorial examples for testing).
  • Synthea - we already have some data generated with Synthea and just need to install it now. Again, this is simulated EHR data, which is good for the studies that need it. Same story: you need to ensure that the test cases you are using would actually find data.
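
One way to spot the vocabulary mismatch described above is to look for concept IDs that appear in the data but not in the installed vocabulary. The following is only a minimal sketch, not an official OHDSI check: it uses an in-memory SQLite database with toy rows, and the helper name `unmapped_concept_ids` is hypothetical. The table and column names (`concept`, `condition_occurrence`, `condition_concept_id`) follow the OMOP CDM; a real check would run against your own CDM database and cover the other domain tables as well.

```python
import sqlite3


def unmapped_concept_ids(conn, table, column):
    """Return concept IDs used in table.column that are missing from `concept`.

    Hypothetical helper for illustration. Concept ID 0 is skipped because the
    OMOP CDM reserves it for "no matching concept".
    """
    query = f"""
        SELECT DISTINCT t.{column}
        FROM {table} AS t
        LEFT JOIN concept AS c ON c.concept_id = t.{column}
        WHERE c.concept_id IS NULL AND t.{column} <> 0
        ORDER BY t.{column}
    """
    return [row[0] for row in conn.execute(query)]


# Toy stand-in for a CDM database; all rows below are made up for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    INSERT INTO concept VALUES (201826, 'Type 2 diabetes mellitus');
    INSERT INTO condition_occurrence VALUES (1, 201826), (2, 999999), (3, 0);
""")

missing = unmapped_concept_ids(conn, "condition_occurrence", "condition_concept_id")
print(missing)
```

A non-empty result suggests the data was produced against a different vocabulary release than the one installed, in which case cohort definitions built from the current vocabulary may not find the records you expect.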

Feel free to ping me if you want to discuss the details.

