Frank – I’ve taken a bit of a sideways approach to getting realistic simulated data into OMOP CDM V5. The Synthea patient data simulator generates patient-level data from disease-specific state-transition probability models (https://github.com/synthetichealth/synthea; Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, et al. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc). The team is constantly adding new disease states, mostly outpatient, and is now adding billing information. The program writes output in several formats: CSV, C-CDA, and FHIR STU 3.0.
A CS student is finishing a summer project that takes the FHIR STU 3.0 output generated by Synthea and creates an OMOP CDM V5 Postgres database mapped to the OMOP terminology. Version 1.0 is being readied for posting on the Synthea GitHub site as a separate project (https://github.com/synthetichealth/synthea_omop). We intend to find a place in OHDSI to cross-list this tool. I have a “to-do” list of additional work for future students, but we should have an initial version available to the community in the next few weeks, once Shahab has his code cleaned up and documented.
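To give a flavor of the kind of mapping involved, here is a minimal sketch (not Shahab's actual code) that turns one FHIR Patient resource into a row for the OMOP `person` table. The function name `fhir_patient_to_person` is hypothetical, the sketch assumes `birthDate` is a full `YYYY-MM-DD` date, and it leaves race and ethnicity unmapped (concept 0) rather than guessing at how they would be extracted.

```python
from datetime import date

# Standard OMOP gender concepts; anything else falls back to 0 ("no matching concept").
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def fhir_patient_to_person(patient: dict, person_id: int) -> dict:
    """Map a FHIR Patient resource (parsed JSON) to an OMOP person row.

    Hypothetical helper for illustration only. Assumes birthDate is a
    complete YYYY-MM-DD date; partial FHIR dates would need extra handling.
    """
    birth = date.fromisoformat(patient["birthDate"])
    gender = patient.get("gender", "")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(gender, 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "race_concept_id": 0,        # left unmapped in this sketch
        "ethnicity_concept_id": 0,   # left unmapped in this sketch
        "person_source_value": patient.get("id"),
        "gender_source_value": gender,
    }


example = {"resourceType": "Patient", "id": "abc-123",
           "gender": "female", "birthDate": "1975-04-09"}
row = fhir_patient_to_person(example, person_id=1)
```

The real work is in the clinical tables, where source codes have to be mapped to OMOP standard concepts through the vocabulary; the demographic mapping above is the easy part.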
I also have a very new student just spinning up who will look at replacing the existing Synthea disease-specific transition probabilities with observed probabilities extracted from our health system’s data warehouse. The hope is that the resulting simulated database will look more like our patients than the current simulated population, which Synthea bases on Massachusetts and national statistics. I don’t know yet whether this idea will pan out.
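One simple way to get observed probabilities, sketched here under stated assumptions, is to count state-to-state transitions in per-patient sequences and normalize each row. The state names and the `estimate_transitions` function are made up for illustration; how states would actually be defined from warehouse data, and how the result would be fed back into Synthea's modules, is exactly the open question.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from observed sequences.

    Each sequence is one patient's ordered list of state labels
    (hypothetical labels here). Returns {state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with the one that follows it.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


observed = [
    ["Healthy", "Prediabetes", "Diabetes"],
    ["Healthy", "Healthy", "Prediabetes"],
]
probs = estimate_transitions(observed)
```

This ignores time between observations, censoring, and dependence on age or sex, all of which would matter for a realistic replacement of the built-in probabilities.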