Synthetic and de-identified data

Group: Synthetic data has helped us to further the mission of OHDSI. Most of our work is related to CMSsynpuf - but are there others? Could we use this thread to create such a list?

Fake or synthetic data

  1. CMSSynpuf
  2. Data Entrepreneur Clinical Observables Yardstick
  3. PseudoVet
  4. Original OMOP OSIM 1 and OSIM 2
  5. Harvest from CHOP
  6. OpenMRS has limited data
  7. iPDG

There are also de-identified, individual-level datasets such as:

  1. MIMIC

Anything else?


There is also a paper about producing synthetic patients and synthetic EHR based on HL7-FHIR

@SCYou We just started using SyntheticMass as well! Great addition to this list.

@krfeeney Cool! It would be a great topic for OHDSI infrastructure!

@kausarm will be presenting on our release of a CDM v5 compatible version of OSIM. Join the community call April 17th for details.

@jon_duke @kausarm Great! Thank you!!

I spoke with Jason Walonoski from the Synthea team 2 weeks ago and we’ve agreed to collaborate on ways to integrate Synthea into the OHDSI architecture. I have been able to use Synthea to simulate CSV output and will be looking to create a converter to CDM v5.
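To make the converter idea concrete, here is a minimal sketch of what the first step might look like: mapping rows of a Synthea-style patients CSV to CDM v5 PERSON-like records. This is not the planned converter. The input column names (`Id`, `BIRTHDATE`, `GENDER`) and the helper names `patient_to_person` and `convert_patients` are assumptions for illustration, and race and ethnicity are left as the "no matching concept" value 0 rather than mapped.

```python
import csv
import io
from datetime import date

# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE.
# Anything else falls back to 0 ("no matching concept").
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def patient_to_person(row, person_id):
    """Map one patient row to a dict shaped like a CDM v5 PERSON record.

    Assumes (hypothetically) that the row has Id, BIRTHDATE (YYYY-MM-DD)
    and GENDER ("M"/"F") columns.
    """
    birth = date.fromisoformat(row["BIRTHDATE"])
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(row["GENDER"], 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "race_concept_id": 0,       # placeholder: not mapped in this sketch
        "ethnicity_concept_id": 0,  # placeholder: not mapped in this sketch
        "person_source_value": row["Id"],
        "gender_source_value": row["GENDER"],
    }


def convert_patients(csv_text):
    """Convert CSV text to a list of PERSON-like dicts with sequential ids."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [patient_to_person(row, i) for i, row in enumerate(reader, start=1)]


if __name__ == "__main__":
    sample = "Id,BIRTHDATE,GENDER\nabc-123,1980-05-17,F\nxyz-9,2001-12-01,M\n"
    for person in convert_patients(sample):
        print(person)
```

A real converter would also need to cover visits, conditions, drugs and so on, and to map source codes to standard concepts through the OMOP vocabulary tables. The sketch only shows the general row-to-record shape.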

Love Synthea (thank you devs!), but in our experience trying to port SyntheticMass to another topic, we found it pretty challenging to define new populations. @Frank Did you create new rules or mostly work with it out of the box?

I was working with it out of the box but Jason did spend some time showing me their visual designer for new population modules. I hope to get more involved with that aspect after we finish the converter.

I would really like to add more realistic synthetic data to Eunomia. Does anyone have synthetic datasets they would be willing to share with the community?

It’s not easy to develop analytic code for OMOP CDM without realistic data.