Synthetic dataset

shohreh · July 13, 2021, 2:00pm

Recently, I have read several papers on generating synthetic data in health care and found that one of the most commonly used methods is the Bayesian Network (BN). To generate a dataset of Type 2 diabetes patients that would fully represent the real population, do you have any suggestions other than the BN technique?

David_Dorr · July 13, 2021, 3:29pm

We developed a Bayesian Network version, and it can’t represent everything about a population; instead, it can generate a specific subset of interesting data and maintain some relationships. So purely synthetic data are limited. Another mechanism people are trying is synthetic derivatives of real data, using tools like MDClone or the specifications from Acorn.ai … which still have limitations, of course.
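To make the "maintain some relationships" point concrete, here is a minimal sketch of how a Bayesian network generates synthetic records by sampling each variable conditional on its parents (ancestral sampling). It is not the network described above: the three variables, the graph structure, and every probability are hypothetical, made-up values for illustration only, not clinical estimates.

```python
import random

# Hypothetical toy network (an assumption for illustration):
#   age_over_50 -> obese -> t2d, plus age_over_50 -> t2d
# All probabilities below are invented, not clinical estimates.
P_AGE_OVER_50 = 0.4
P_OBESE_GIVEN_AGE = {True: 0.35, False: 0.25}
P_T2D_GIVEN = {
    # (age_over_50, obese): P(t2d)
    (True, True): 0.30,
    (True, False): 0.12,
    (False, True): 0.10,
    (False, False): 0.03,
}


def sample_patient(rng):
    """Ancestral sampling: draw parents first, then children given parents."""
    age = rng.random() < P_AGE_OVER_50
    obese = rng.random() < P_OBESE_GIVEN_AGE[age]
    t2d = rng.random() < P_T2D_GIVEN[(age, obese)]
    return {"age_over_50": age, "obese": obese, "t2d": t2d}


def generate_cohort(n, seed=0):
    """Generate n synthetic patients with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_patient(rng) for _ in range(n)]


def t2d_rate(cohort, **conditions):
    """Share of T2D cases among patients matching the given conditions."""
    matching = [p for p in cohort
                if all(p[k] == v for k, v in conditions.items())]
    if not matching:
        return float("nan")
    return sum(p["t2d"] for p in matching) / len(matching)


if __name__ == "__main__":
    cohort = generate_cohort(20000)
    print("T2D rate, over 50 and obese:",
          t2d_rate(cohort, age_over_50=True, obese=True))
    print("T2D rate, under 50 and not obese:",
          t2d_rate(cohort, age_over_50=False, obese=False))
```

The generated cohort reproduces only the dependencies encoded in the graph; anything not modeled (other comorbidities, medications, time course) is simply absent, which is the limitation mentioned above.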

jposada · July 13, 2021, 6:22pm

Hi @shohreh ,

Have you looked into this?

There are even scripts to transform it to the CDM.

nadavrap · July 15, 2021, 7:35am

To the best of my understanding, Synthea is useful for generating synthetic data based on its own built-in knowledge. What I’m looking for is a way to generate synthetic data based on my own data. How can I do that?