Is there a tool that takes an existing dataset as input and produces a statistically similar synthetic dataset?
Synthea and its Module Builder look promising, but from what I can tell, the only ways to customize the data are by adjusting the properties or providing command-line flags.
On the other hand, OSIM-v5 seems to be built around this approach. The function analyze_source_db() takes a database as input and produces Transition Probability Tables, which can be used to generate custom synthetic data.
However, it uses features that were deprecated in PostgreSQL 12, and I’m not yet sure if I can feasibly update the code base.
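To make concrete what I mean by generating data from transition probability tables, here is a minimal Python sketch of the general idea. It is not OSIM's actual implementation: the function names build_transition_table and generate_sequence are hypothetical, and it assumes each record can be reduced to an ordered sequence of event codes with only first-order (previous event to next event) dependencies.

```python
import random
from collections import Counter, defaultdict


def build_transition_table(sequences):
    """Estimate P(next event | current event) from observed event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    table = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        table[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return table


def generate_sequence(table, start, max_len, rng):
    """Sample a synthetic sequence by walking the transition table."""
    seq = [start]
    # Stop at max_len or when the last event has no observed successor.
    while len(seq) < max_len and seq[-1] in table:
        options = table[seq[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        seq.append(nxt)
    return seq


if __name__ == "__main__":
    # Toy source data: each inner list stands in for one record's event history.
    source = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
    table = build_transition_table(source)
    print(generate_sequence(table, "A", 5, random.Random(0)))
```

A tool of the kind I am looking for would presumably do something much richer than this (conditioning on age, gender, timing between events, and so on), but the input/output shape is the same: source data in, probability tables out, synthetic records sampled from those tables.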