
Loading synpuf data

Hello, my name is Frank DiMartini, and I would like to contribute to the OHDSI community. A little background first: I have been managing an ETL development team at IQVIA for almost a year now, with a focus on OMOP conversions. I have an extensive ETL background with some programming experience, and I have been looking for ways my skill set can contribute to the OHDSI community.

I am curious whether there would be a benefit to having scripts that load synpuf data for each environment. So far, I see that there is a GitHub page with the DDL for a few environments (https://github.com/OHDSI/CommonDataModel), and I also see a post about loading the data into Postgres (https://github.com/mustafaascha/ubuntu-synpuf-cdm), but what I don't see is a way for someone to load synpuf data into other environments without having to write their own scripts.

So my question is: would scripts that load the synpuf data (OMOP or source) into a variety of environments provide value to the community? If so, are Redshift, Hadoop and SQL Server a good start? I'll admit those are the environments I can start with for now, but I can expand later. Any feedback you can provide is greatly appreciated.
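
As a rough illustration of what per-environment load scripts might look like, here is a minimal Python sketch that generates one bulk-load statement per CDM table for PostgreSQL, SQL Server, Redshift and Hive (Hadoop). It assumes the tables already exist (for example, created from the DDL in the CommonDataModel repository) and that each table's data sits in a plain comma-delimited file with a header row, named `<table>.csv`, without quoted fields containing the delimiter. The function names, file layout, S3 bucket and IAM role are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical sketch: assumes each CDM table already exists and that its data
# is a plain delimited file with a header row, named <table>.csv.


def load_statement(env: str, schema: str, table: str, location: str,
                   delimiter: str = ",", iam_role: Optional[str] = None) -> str:
    """Return a bulk-load SQL statement for one table in the given environment."""
    target = f"{schema}.{table}"
    if env == "postgresql":
        # Server-side COPY; HEADER true skips the first line of the file.
        return (f"COPY {target} FROM '{location}' "
                f"WITH (FORMAT csv, HEADER true, DELIMITER '{delimiter}');")
    if env == "sqlserver":
        # FIRSTROW = 2 skips the header; 0x0a assumes LF (Unix) line endings.
        # Plain delimited parsing: quoted fields are not handled here.
        return (f"BULK INSERT {target} FROM '{location}' "
                f"WITH (FIELDTERMINATOR = '{delimiter}', "
                f"ROWTERMINATOR = '0x0a', FIRSTROW = 2);")
    if env == "redshift":
        # Redshift COPY reads from S3 and needs credentials, here an IAM role.
        if iam_role is None:
            raise ValueError("Redshift COPY needs an IAM role")
        return (f"COPY {target} FROM '{location}' IAM_ROLE '{iam_role}' "
                f"CSV DELIMITER '{delimiter}' IGNOREHEADER 1;")
    if env == "hive":
        # Hive moves the file into the table; the delimiter and header skipping
        # belong to the table definition, not to LOAD DATA.
        return f"LOAD DATA INPATH '{location}' INTO TABLE {target};"
    raise ValueError(f"unsupported environment: {env}")


def load_script(env: str, schema: str, tables: Iterable[str], base: str,
                **kwargs) -> str:
    """Build one statement per table, assuming files live at <base>/<table>.csv."""
    base = base.rstrip("/")
    return "\n".join(load_statement(env, schema, t, f"{base}/{t}.csv", **kwargs)
                     for t in tables)


if __name__ == "__main__":
    # Hypothetical bucket and role, for illustration only.
    print(load_script("redshift", "cdm", ["person", "observation_period"],
                      "s3://my-bucket/synpuf/",
                      iam_role="arn:aws:iam::123456789012:role/MyRedshiftRole"))
```

For PostgreSQL, `COPY` reads the file on the database server; psql's `\copy` is the client-side equivalent. Keeping the per-environment differences in one function means supporting another platform is mostly a matter of adding one more branch.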


I think this is a great idea, Frank 🙂

Hi Frank, when we developed the scripts, we only had a PostgreSQL environment. However, we borrowed from other repositories that maintain the current OMOP CDM. These scripts are not specific to SynPUF, and they gradually become outdated unless they are kept in sync with new OMOP releases.

A better solution would be to link to the database loading code in the OMOP repository. We posted an end-to-end PostgreSQL solution in the ETL-CMS repository so that everything would be in one place, but a more modular, sustainable approach is worth considering.

Christophe

Synthea is another option to check out; it is a new synthetic data set we have been playing with.

I successfully loaded the synpuf 2.33 million dataset into our dev environment, and some reports work, but Condition Occurrence, Condition Era, Procedure, Drug Exposure, Drug Era, Measurement, and Observation do not.

Is there anything specific we need to do to enable these reports?

We are using this as a demonstrator and training environment, and it would be useful to have these reports.
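
One quick first check for reports like these is whether the CDM tables behind them actually contain rows. Condition Era and Drug Era in particular are derived from condition_occurrence and drug_exposure, so they can be empty if that derivation step was not run. A minimal Python sketch, assuming a DB-API 2.0 connection to the CDM database (for example psycopg2 or pyodbc; sqlite3 is used here only so the example runs on its own) and CDM v5 table names; the helper name `table_row_counts` is hypothetical:

```python
import sqlite3
from typing import Dict, Iterable

# CDM v5 tables behind the reports that came back empty.
REPORT_TABLES = ("condition_occurrence", "condition_era", "procedure_occurrence",
                 "drug_exposure", "drug_era", "measurement", "observation")


def table_row_counts(conn, schema: str = "",
                     tables: Iterable[str] = REPORT_TABLES) -> Dict[str, int]:
    """Return {table: row count} using any DB-API 2.0 connection."""
    tables = list(tables)  # materialize once so a generator is not consumed twice
    # Identifiers cannot be bound as query parameters, so validate them first.
    for name in ([schema] if schema else []) + tables:
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    prefix = f"{schema}." if schema else ""
    cur = conn.cursor()
    counts = {}
    for table in tables:
        cur.execute(f"SELECT COUNT(*) FROM {prefix}{table}")
        counts[table] = cur.fetchone()[0]
    return counts


if __name__ == "__main__":
    # In-memory stand-in for the real CDM database.
    conn = sqlite3.connect(":memory:")
    for t in REPORT_TABLES:
        conn.execute(f"CREATE TABLE {t} (id INTEGER)")
    conn.execute("INSERT INTO drug_exposure VALUES (1)")
    counts = table_row_counts(conn)
    print("empty:", [t for t, n in counts.items() if n == 0])
```

An empty table points at the load or ETL step rather than at the report configuration; if every table has rows, the problem more likely lies on the reporting side.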
