
Generate custom synthetic data from existing dataset

Is there a tool that takes an existing dataset as input and produces a statistically similar synthetic dataset?

Synthea and its Module Builder look promising, but from what I can tell, the only ways to customize the generated data are to adjust the configuration properties or pass command-line flags.

On the other hand, OSIM-v5 seems to be built around this approach. The function analyze_source_db() takes a database as input and produces Transition Probability Tables, which can be used to generate custom synthetic data. However, it uses features that were deprecated in PostgreSQL 12, and I’m not yet sure if I can feasibly update the code base.
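To make the idea concrete, here is a minimal sketch of what "transition probability tables" could look like in the simplest case. This is not OSIM's actual algorithm or schema. It assumes a first-order model over toy event sequences, and every name in it (`build_transition_table`, `sample_sequence`, the example events) is hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_transition_table(sequences):
    """Count event-to-next-event transitions and normalize to probabilities."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    table = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        table[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return table


def sample_sequence(table, start, max_len, rng):
    """Walk the table from `start` until a state with no successors or max_len."""
    seq = [start]
    while len(seq) < max_len and seq[-1] in table:
        options = table[seq[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        seq.append(nxt)
    return seq


# Hypothetical toy "source data": one event sequence per patient.
source = [
    ["START", "hypertension", "statin", "END"],
    ["START", "hypertension", "mi", "END"],
    ["START", "diabetes", "metformin", "END"],
    ["START", "hypertension", "statin", "END"],
]

table = build_transition_table(source)
rng = random.Random(0)  # fixed seed so runs are repeatable
synthetic = [sample_sequence(table, "START", 10, rng) for _ in range(5)]

for patient in synthetic:
    print(" -> ".join(patient))
```

The synthetic sequences only ever contain transitions seen in the source, and in roughly the same proportions. A real tool would also need to model timing, demographics, and longer-range dependencies.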

Hi @Ashlin_Harris,

This is not an OHDSI tool, but Replica Analytics provides a software package called Replica Synthesis that generates synthetic data closely comparable to a real dataset. I work for Replica and can put you in touch with the right people if you're interested.

We updated OSIM-v5 to work with modern PostgreSQL and were able to use it successfully. Our fork can be found here.
