
Synthetic dataset

Recently, I have read several papers on generating synthetic data in health care and found that one of the most widely used methods is the Bayesian network (BN). To generate a dataset of Type 2 diabetes patients that faithfully represents the real population, do you have any suggestions other than the BN technique?

We developed a Bayesian Network version, and it can’t represent everything about a population; instead, it can generate a specific subset of interesting data and maintain some relationships. So purely synthetic versions of data are limited. Another mechanism people are trying is synthetic derivatives from real data, using tools like MDClone or the specifications from Acorn.ai … which still has limitations, of course.
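To make the "maintain some relationships" point concrete, here is a rough sketch of the general idea, not the version we built: a tiny discrete Bayesian network whose conditional tables are estimated by counting over source records, then sampled parent-first. The variable names (`obese`, `t2d`, `metformin`), the chain structure, and the toy records are all made up for illustration, and the structure is assumed to be given rather than learned.

```python
import random
from collections import Counter, defaultdict

def fit_network(records, parents):
    """Estimate conditional probability tables by counting.

    `parents` maps each variable to a tuple of its parent variables.
    The structure is assumed to be supplied by the user, not learned.
    """
    counts = {var: defaultdict(Counter) for var in parents}
    for row in records:
        for var, pa in parents.items():
            key = tuple(row[p] for p in pa)
            counts[var][key][row[var]] += 1
    return counts

def sample_row(counts, parents, order, rng):
    """Draw one synthetic record by ancestral sampling (parents first)."""
    row = {}
    for var in order:
        key = tuple(row[p] for p in parents[var])
        table = counts[var][key]
        # Note: a parent combination never seen in the source data would
        # leave this table empty; the chain structure below avoids that.
        values = list(table.keys())
        weights = list(table.values())
        row[var] = rng.choices(values, weights=weights, k=1)[0]
    return row

# Toy, invented records standing in for "your own data".
real = [
    {"obese": "yes", "t2d": "yes", "metformin": "yes"},
    {"obese": "yes", "t2d": "yes", "metformin": "no"},
    {"obese": "yes", "t2d": "no", "metformin": "no"},
    {"obese": "no", "t2d": "no", "metformin": "no"},
    {"obese": "no", "t2d": "no", "metformin": "no"},
    {"obese": "no", "t2d": "yes", "metformin": "yes"},
]

# Hypothetical structure: obese -> t2d -> metformin.
parents = {"obese": (), "t2d": ("obese",), "metformin": ("t2d",)}
order = ["obese", "t2d", "metformin"]

rng = random.Random(0)  # fixed seed so runs are repeatable
counts = fit_network(real, parents)
synthetic = [sample_row(counts, parents, order, rng) for _ in range(1000)]
```

In this sketch only the relationships encoded in `parents` carry over to the synthetic rows. Anything not modelled as an edge (for example a direct link between `obese` and `metformin`) is lost, which is the limitation described above.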

Hi @shohreh ,

Have you looked into this?

There are even scripts to transform it to the CDM.

To the best of my understanding, Synthea is useful for generating synthetic data based on its own built-in knowledge. What I’m looking for is a way to generate synthetic data based on my own data. How can I do that?
