
Synthetic data with simulated covid outbreak


(Michael Shamberger) #1

Here is a synthetic OMOP CSV data set that contains a COVID pandemic. I started with the Synthea covid branch and generated a dataset for MA, USA. This was converted to OMOP using the ETL-Synthea project. The pandemic starts on 1 January 2020 and affects all 10K patients on a standard distribution over a period of 3 months.

Let me know if you have any particular need for synthetic data. I am building Azure pipelines to generate it on a schedule.

https://dev.azure.com/shambergerm/Covid19/_git/covid19Storage?path=%2Fomop%2FMassachusetts_covid19_omop_531.zip

You should download the vocabulary separately from Athena. The vocabulary date for this dataset was 29 March 2020, and it includes the latest COVID codes.
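Once the archive is downloaded, a per-table row count is a reasonable first sanity check before loading anything into a database. The following is a minimal sketch using only the Python standard library. It assumes the zip holds CSV files with a single header row each, which has not been verified against this particular archive, and `summarize_omop_zip` is a hypothetical helper name, not part of ETL-Synthea or any OHDSI tool.

```python
import csv
import io
import zipfile


def summarize_omop_zip(zip_source):
    """Return {member_name: data_row_count} for every .csv member of a zip.

    zip_source may be a file path or a binary file-like object.
    Assumption: each CSV starts with exactly one header row.
    """
    counts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                next(reader, None)  # skip the header row if present
                counts[name] = sum(1 for _ in reader)
    return counts
```

For example, `summarize_omop_zip("Massachusetts_covid19_omop_531.zip")` would return a dictionary keyed by whatever CSV names the archive actually contains.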

Synthea covid19 branch and modules:

Here is an example visual simulation of a similar dataset generated for Finland using CDM 6.0, which has geo capability because the location table includes lat/lon:


(Andrew S. Kanter, MD MPH FACMI FAMIA) #2

Does this include more specific COVID-19 diagnoses and tests coded with appropriate SNOMED, ICD-10 and LOINC codes? What about SARS-CoV-2 positive patients with other manifestations (pneumonia, ARDS, etc.) that require multiple ICD-10 and/or SNOMED codes per diagnosis? Thanks!


(Michael Shamberger) #3

It contains specific COVID-19 diagnoses and tests coded with SNOMED and LOINC. Synthea does not use ICD-10 as it requires payment.

Survivor and non-survivor lab values are based on Figure 2 of https://doi.org/10.1016/S0140-6736(20)30566-3
“Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study”

Here are some examples:
SNOMED:
49727002, Cough (finding)
386661006, Fever (finding)
267036007, Dyspnea (finding)
233604007, Pneumonia (disorder)
840544004, Suspected COVID-19
840539006, COVID-19

LOINC:
89577-1
89579-7
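To check how often the SNOMED codes listed above appear in the converted data, something like the following could be used. This is only a sketch: it assumes the source SNOMED code ends up in the `condition_source_value` column of the condition_occurrence table (a standard OMOP CDM column, but whether ETL-Synthea fills it with the raw code is an assumption here), and `count_conditions` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# SNOMED codes and names exactly as listed in the post above.
SNOMED_NAMES = {
    "49727002": "Cough (finding)",
    "386661006": "Fever (finding)",
    "267036007": "Dyspnea (finding)",
    "233604007": "Pneumonia (disorder)",
    "840544004": "Suspected COVID-19",
    "840539006": "COVID-19",
}


def count_conditions(rows, code_column="condition_source_value"):
    """Count rows per known SNOMED code.

    rows: iterable of dicts, e.g. from csv.DictReader.
    code_column: column assumed to hold the source SNOMED code.
    Rows with unknown or missing codes are ignored.
    """
    counts = Counter()
    for row in rows:
        code = (row.get(code_column) or "").strip()
        if code in SNOMED_NAMES:
            counts[SNOMED_NAMES[code]] += 1
    return counts
```

With a real file this would be called as `count_conditions(csv.DictReader(open(path, newline="")))`, where `path` points at the condition_occurrence CSV extracted from the archive.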

Synthea has these modules for covid19 simulation:

Risk determination

Infection sequence

Non survivor lab values

Survivor lab values


(Thomas White, MD, MS, MA, CHIE) #4

Michael, are you still creating COVID synthetic data? We’re standing up our OHDSI infrastructure on Azure and want to start testing the environment and sizing our needs, based on a population of about 2 million patients.

