Looking for synthetic datasets that have been used for tool tutorials

Hello community!

I’d like to make an open call for anyone who has built synthetic datasets in CDM format. I took a look at SynPUF, and it appears to be modeled on a Medicare dataset, so the population skews older and the data covers only a few years (I believe 2012-2014).

I’m interested in collaborating with anyone who has done more recent work with synthetic datasets (in CDM format) and in discussing the use of these datasets for tutorial purposes. I’d like to be able to replicate one of the network studies published on our network study registry. Another option is to use the Eunomia dataset to replicate some of the analyses from the Book of OHDSI.

Please feel free to reach out directly if you have information on synthetic datasets that are useful for this sort of activity (tutorials, data characterization, etc.).

-Chris

Hi Chris,

I have a Synthea-based 1M-patient dataset that clears all Achilles checks and is 88% complete on DQD checks. It is based on a diverse population in Pennsylvania.

Let me know how I can get this to you.
You can see it at parthenon.acumenus.net

-Sanjay

That’s amazing. Let me check it out on the website. Are there any restrictions on use?

No restrictions on use. It’s the Acumenus OHDSI CDM. I built it myself, and it’s open source, but it’s BIG. About 380GB.

Ooof, that might be a little heavy.

Might be manageable if it’s zipped up… is that the compressed size?

Let me pg_dump and compress it for you. I’ll let you know soon.