
1K sample of simulated CMS SynPUF data in CDMV5 format available for download

(Lee Evans) #17

@bglicksb I’m currently updating the 1k sample to use a newer v5.2 CMS SynPUF data set provided by @ericaVoss

It should be available in the next few days. I’ll post the link here when it’s ready.

@a_cse Would you be able to run your scripts on the updated 1k sample when it’s available? I can then upload the v5.3 version too.


(Ben Glicksberg) #18

Thank you very much @lee_evans! I actually was able to set it up using the files from the OHDSI ftp site, but will keep an eye out for this as well. Thanks again!


@lee_evans Sure! Hit me up when the new sample data is available

(Lee Evans) #20

@bglicksb @a_cse @Sigfried_Gold The synpuf 1000 person sample dataset in CDM V5.2.2 format is now available for download here: http://www.ltscomputingllc.com/downloads/

Thanks @Christophe_Lambert & @ericaVoss (and others in the OHDSI community) for performing the data conversion and providing a copy of the synpuf converted data.

There is a README.md file within the zip file with additional information.

(Karen Li) #21

Thank you for your effort in making this sample dataset. It is really helpful for someone like me who is new to OHDSI. I still have a few questions about setting up the cdm schema and wonder if you or someone else could help me out.

I followed the link in the readme (https://github.com/OHDSI/CommonDataModel/tree/v5.2.2) to create an empty cdm schema:

  • execute the script OMOP CDM ddl - PostgreSQL.sql to create the tables and fields, which generates 37 tables under cdm schema

  • load the data into the schema using the 18 csv files in the sample dataset

  • then I ran the script OMOP CDM constraints - PostgreSQL.sql to add the constraints. However, I encountered a lot of errors when running this script. For example, here is one of the errors:

    ERROR: insert or update on table “person” violates foreign key constraint “fpk_person_gender_concept”
    DETAIL: Key (gender_concept_id)=(8507) is not present in table “concept”.

  • I think this is caused by the concept table, which is currently empty. The link mentioned above has an extra step to load the CDM vocabulary, which seems to populate the concept table.

So, my question is: where can I find the CDM vocabulary csv files?

Thank you for the help!

(Chris Knoll) #22

I think that’s a script bug: gender_concept_id is a column in the ‘person’ table not the ‘concept’ table. Could you post here the actual statement that creates the foreign key constraint?

(Karen Li) #23

Thank you for your reply @Chris_Knoll . The actual statement is:

ALTER TABLE person ADD CONSTRAINT fpk_person_gender_concept FOREIGN KEY (gender_concept_id) REFERENCES concept (concept_id);

It comes from this file: https://github.com/OHDSI/CommonDataModel/blob/v5.2.2/PostgreSQL/OMOP%20CDM%20constraints%20-%20PostgreSQL.sql
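
Before adding a constraint like this one, it can help to list which values in the referencing column have no match in concept. The following is a minimal sketch, not part of the CDM scripts: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the tables are cut down to the two columns involved, and the person rows are made up for illustration. The SELECT itself is plain SQL and should run unchanged against a real cdm schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption made only so
# the example is self-contained); table and column names follow the CDM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER);
    -- Illustrative rows only; concept is left empty, as in the post above.
    INSERT INTO person VALUES (1, 8507), (2, 8532), (3, 8507);
""")

# Anti-join: values used by person that have no matching row in concept.
ORPHAN_SQL = """
    SELECT p.gender_concept_id, COUNT(*) AS n_rows
    FROM person p
    LEFT JOIN concept c ON c.concept_id = p.gender_concept_id
    WHERE c.concept_id IS NULL
    GROUP BY p.gender_concept_id
    ORDER BY p.gender_concept_id
"""

orphans = conn.execute(ORPHAN_SQL).fetchall()
for concept_id, n_rows in orphans:
    print(f"gender_concept_id={concept_id} is missing from concept ({n_rows} rows)")
```

If the query returns no rows, the foreign key on that column should apply cleanly; if it returns every value in use, the concept table is most likely empty.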

(Chris Knoll) #24

Ok, I’m sorry, I misread the error: it’s explaining that the value ‘8507’ in the gender_concept_id column of Person is not found in the table ‘concept’. That’s actually a really good message :slight_smile: even though I misread it.

It means that your vocabulary tables aren’t populated. You need to load them (via download from Athena) before applying those key constraints. Then you’ll have all the necessary concepts in the ‘concept’ table, and the foreign keys should work.

Were there instructions in the synpuf guide on how to load the vocabulary tables?
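
The ordering issue can be reproduced in miniature. This is only a sketch under assumptions: SQLite (via Python's sqlite3) replaces PostgreSQL, the foreign key is declared up front and checked afterwards with SQLite's PRAGMA foreign_key_check rather than added with ALTER TABLE, and the helper name violations is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Keep enforcement off so rows can be loaded before the vocabulary,
# mirroring the "load data first, add constraints later" steps in the thread.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER REFERENCES concept (concept_id)
    );
    INSERT INTO person VALUES (1, 8507);
""")

def violations(connection):
    """Return (table, rowid, referenced_table, fk_index) rows for broken references."""
    return connection.execute("PRAGMA foreign_key_check").fetchall()

# concept is still empty, so the person row is reported as a violation.
before = violations(conn)

# Load the vocabulary row (8507 is the concept id from the error message).
conn.execute("INSERT INTO concept VALUES (8507, 'MALE')")
after = violations(conn)

print(f"violations before vocabulary load: {len(before)}, after: {len(after)}")
```

The same data passes or fails the check depending only on whether the concept row exists, which is why the vocabulary has to be loaded before the constraints script is run.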

(Karen Li) #25

@Chris_Knoll I couldn’t find direct instructions for loading the vocabulary tables. But I found a script to load vocabulary tables in this link: https://github.com/OHDSI/CommonDataModel/tree/v5.2.2 -> PostgreSQL -> VocabImport.

It looks like we need the following csv files:

I looked at Athena, and was overwhelmed by the large amount of data on it :sweat_smile:. Could you please tell me which vocabulary data I should download? I only need some sample data to get ATLAS up and running.

Thank you very much for your help! I really appreciate it.
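
For anyone who wants to see what loading a vocabulary file involves, here is a rough sketch in Python. It assumes the vocabulary files are tab-delimited text with a header row (check your own download), uses SQLite instead of PostgreSQL, and loads only the concept table. The function name load_concepts and the two sample rows are made up for illustration; the column names are those of the CDM v5 concept table.

```python
import csv
import io
import sqlite3

COLUMNS = [
    "concept_id", "concept_name", "domain_id", "vocabulary_id", "concept_class_id",
    "standard_concept", "concept_code", "valid_start_date", "valid_end_date",
    "invalid_reason",
]

# Stand-in for a downloaded CONCEPT.csv; a real file would be opened with
# open(path, newline="", encoding="utf-8"). The two rows are illustrative.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    "8507\tMALE\tGender\tGender\tGender\tS\tM\t19700101\t20991231\t\n"
    "8532\tFEMALE\tGender\tGender\tGender\tS\tF\t19700101\t20991231\t\n"
)

def load_concepts(connection, handle):
    """Insert rows from a tab-delimited file with a header row; return the row count."""
    # QUOTE_NONE treats quote characters inside concept names as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    reader.fieldnames = [name.lower() for name in reader.fieldnames]
    rows = []
    for record in reader:
        values = [record[col] if record[col] != "" else None for col in COLUMNS]
        values[0] = int(values[0])  # concept_id is the integer key
        rows.append(tuple(values))
    placeholders = ", ".join("?" for _ in COLUMNS)
    connection.executemany(
        f"INSERT INTO concept ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows
    )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, "
    + ", ".join(f"{col} TEXT" for col in COLUMNS[1:]) + ")"
)
loaded = load_concepts(conn, io.StringIO(SAMPLE))
print(f"loaded {loaded} concept rows")
```

On PostgreSQL, the VocabImport script linked above is the more direct route; this sketch only illustrates the file shape and the per-table load.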

(Chris Knoll) #26

I need to tag @lee_evans, @ericaVoss or @gregk as they have more experience than I do about setting up the vocabulary.

Sorry to pass the buck!

(Erica Voss) #27

@lee_evans shouldn’t the OMOP Vocabulary needed be included in the export of Synpuf?

(Chris Knoll) #28

I think you’re right @ericaVoss, I don’t know why I didn’t think of that: if you build a CDM, you should have the vocabulary attached to it. If you load some other version of the vocabulary into the CDM, you could get some strange results. So the SynPUF should probably have those vocabs pre-loaded.

(Erica Voss) #29

Yup! The only difference is that the Vocab we share includes only the free vocabularies, i.e. the ones you don’t need a license for. But that is plenty fine for Synpuf.


Hi @lee_evans,
I recently decided to switch over to CDM version 5.3.1. Unfortunately, I have not managed to find a SynPUF sample for this version. Is there a data set available, or should I ask @a_cse for the script mentioned above?
Thanks in advance

(Lee Evans) #31

@Mirko I’m not aware of a publicly available later version

(Ajit Londhe) #32

The Synthea ETL package provides a good sample in V5.3.1 and V6:

For V5.3.1: https://github.com/OHDSI/ETL-Synthea/tree/v5.3.1
For V6: https://github.com/OHDSI/ETL-Synthea/tree/master


@Ajit_Londhe Thanks a lot. I will have a look at it.


Hi there,

here is a SynPUF data sample in version 5.3.1 without guarantee of correctness…

(Ty) #35

Thanks @Mirko! It’s really appreciated that you provided these files :slight_smile: I can confirm that they load successfully in a postgres database and that the indexes & constraints for OMOP v5.3.1 run without error. I will comment back here if I find anything wrong. Equally, if anyone wants the SQL code (a slightly modified version of https://github.com/OHDSI/CommonDataModel/tree/v5.3.1/PostgreSQL) to import @Mirko’s Synpuf files and the standard vocabulary csvs into postgres, feel free to @ me here.

(Mark Abajian) #36

Hello, @tystan, following up on your post from April 8th… I would be very interested in your SQL Server scripts to load the SynPUF files and vocabularies into the CDM 5.3.1. Please, could you share those?

Also, many thanks to @Mirko and @tystan for pursuing the SynPUF 1K files for CDM 5.3.1. Looking forward to using this.