
1K sample of simulated CMS SynPUF data in CDMV5 format available for download

Friends:

Yeah. It’s not a good place for this. Nobody will find it there. The page is about the CDM in general, not about a specific database. We should have a little page with links to commercial help. Let me think about it.

Hi guys!

I had trouble importing the 1k sample of CMS SynPUF data into CDM Version 5.3, so I had to alter some tables so that the .csv files match the database. I created some SQL scripts to easily import the 1k sample data into the database.

@lee_evans: I can provide the files, so that people can easily import the sample data into the new CDM Version 5.3.


Wow, this is so great @lee_evans! I was unable to download the files from your website. Are they still available? Or, @a_cse, would you mind kindly sharing the files with me?

Thank you very much in advance.
Ben

@bglicksb I’m currently updating the 1k sample to use a newer v5.2 CMS SynPUF data set provided by @ericaVoss

It should be available in the next few days. I’ll post the link here when it’s ready.

@a_cse Would you be able to run your scripts on the updated 1k sample when it’s available? I can then upload the v5.3 version too.

Thanks!


Thank you very much @lee_evans! I actually was able to set it up using the files from the OHDSI ftp site, but will keep an eye out for this as well. Thanks again!

@lee_evans Sure! Hit me up when the new sample data is available

@bglicksb @a_cse @Sigfried_Gold The synpuf 1000 person sample dataset in CDM V5.2.2 format is now available for download here: http://www.ltscomputingllc.com/downloads/

Thanks @Christophe_Lambert & @ericaVoss (and others in the OHDSI community) for performing the data conversion and providing a copy of the synpuf converted data.

There is a README.md file within the zip file with additional information.


Thank you for your effort in making this sample dataset. It is really helpful for someone like me who is new to OHDSI. I still have a few questions about setting up the cdm schema and wonder if you or someone else could help me out.

I followed the link in the readme (https://github.com/OHDSI/CommonDataModel/tree/v5.2.2) to create an empty cdm schema:

  • execute the script OMOP CDM ddl - PostgreSQL.sql to create the tables and fields, which generates 37 tables under cdm schema

  • load the data into the schema using the 18 csv files in the sample dataset

  • then I ran the script OMOP CDM constraints - PostgreSQL.sql to add the constraints. However, I encountered a lot of errors when running this script. For example, here is one of the errors:

    ERROR: insert or update on table "person" violates foreign key constraint "fpk_person_gender_concept"
    DETAIL: Key (gender_concept_id)=(8507) is not present in table "concept".

  • I think this is caused by the concept table, which is currently empty. I found that in the link mentioned above there is an extra step to load the CDM vocabulary, which seems to populate the concept table.
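For readers following along, the failure can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not the OHDSI scripts: it uses Python's built-in sqlite3 instead of PostgreSQL, two heavily simplified tables, and a hypothetical helper `try_insert_person`. SQLite cannot add a foreign key with ALTER TABLE, so the constraint is declared when the table is created; the point is only that a `gender_concept_id` of 8507 is rejected while `concept` is empty and accepted once the concept row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " gender_concept_id INTEGER REFERENCES concept (concept_id))"
)


def try_insert_person(person_id, gender_concept_id):
    """Hypothetical helper: insert one person row and report what happened."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, gender_concept_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# With an empty concept table the foreign key cannot be satisfied.
before = try_insert_person(1, 8507)

# After the referenced concept exists, the same row is accepted.
conn.execute("INSERT INTO concept VALUES (8507, 'MALE')")
after = try_insert_person(1, 8507)

print(before)
print(after)
```

In PostgreSQL the same check runs against the rows already loaded when the constraint is added, which is why the error appears at the constraints step rather than at load time.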

So, my question is: where can I find the CDM vocabulary csv files?

Thank you for the help!

I think that’s a script bug: gender_concept_id is a column in the ‘person’ table not the ‘concept’ table. Could you post here the actual statement that creates the foreign key constraint?

Thank you for your reply @Chris_Knoll . The actual statement is:

ALTER TABLE person ADD CONSTRAINT fpk_person_gender_concept FOREIGN KEY (gender_concept_id) REFERENCES concept (concept_id);

It comes from this file: https://github.com/OHDSI/CommonDataModel/blob/v5.2.2/PostgreSQL/OMOP%20CDM%20constraints%20-%20PostgreSQL.sql

Ok, I’m sorry, I misread the error: it’s explaining that the value ‘8507’ in the gender_concept_id column of Person is not found in the table ‘concept’. That’s actually a really good message :slight_smile: even though I misread it.

It means that your vocabulary tables aren’t populated. You need to load them (via download from Athena) before applying those key constraints. Then you’ll have all the necessary concepts in the ‘concept’ table, and the foreign keys should work.
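One way to confirm this before applying the constraints is to list the concept IDs that the data references but the ‘concept’ table lacks. The sketch below is a minimal illustration under stated assumptions: it runs against an in-memory SQLite database with simplified tables rather than a real PostgreSQL CDM, and `missing_concept_ids` is a hypothetical helper, not part of any OHDSI tool. The LEFT JOIN query itself is plain SQL and should carry over to PostgreSQL with little change.

```python
import sqlite3


def missing_concept_ids(conn, table, column):
    """Return distinct values of table.column with no matching concept.concept_id."""
    # Identifiers are interpolated into the SQL, so pass only trusted names.
    query = (
        f"SELECT DISTINCT t.{column} FROM {table} AS t "
        f"LEFT JOIN concept AS c ON c.concept_id = t.{column} "
        f"WHERE t.{column} IS NOT NULL AND c.concept_id IS NULL "
        f"ORDER BY t.{column}"
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 8507), (2, 8532), (3, 8507)])

# Vocabulary not loaded yet: every referenced concept is reported as missing.
empty_vocab = missing_concept_ids(conn, "person", "gender_concept_id")

# Simulate loading the vocabulary, then check again.
conn.executemany("INSERT INTO concept VALUES (?)", [(8507,), (8532,)])
loaded_vocab = missing_concept_ids(conn, "person", "gender_concept_id")

print(empty_vocab)
print(loaded_vocab)
```

An empty result for each concept column suggests the corresponding foreign key should apply cleanly.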

Were there instructions in the synpuf guide on how to load the vocabulary tables?

@Chris_Knoll I couldn’t find direct instructions for loading the vocabulary tables. But I found a script to load vocabulary tables in this link: GitHub - OHDSI/CommonDataModel at v5.2.2 → PostgreSQL → VocabImport.

It looks like we need the following csv files:
DRUG_STRENGTH.csv
CONCEPT.csv
CONCEPT_RELATIONSHIP.csv
CONCEPT_ANCESTOR.csv
CONCEPT_SYNONYM.csv
VOCABULARY.csv
RELATIONSHIP.csv
CONCEPT_CLASS.csv
DOMAIN.csv
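A quick way to check a download against this list is sketched below. It is only an illustration: the file names are the nine listed above, while `missing_vocab_files` and the temporary directory standing in for an unzipped vocabulary download are assumptions made for the example. Names are compared case-insensitively, since the exact casing of a given download is not something this thread confirms.

```python
import tempfile
from pathlib import Path

# The nine files named in the list above.
REQUIRED_VOCAB_FILES = [
    "DRUG_STRENGTH.csv",
    "CONCEPT.csv",
    "CONCEPT_RELATIONSHIP.csv",
    "CONCEPT_ANCESTOR.csv",
    "CONCEPT_SYNONYM.csv",
    "VOCABULARY.csv",
    "RELATIONSHIP.csv",
    "CONCEPT_CLASS.csv",
    "DOMAIN.csv",
]


def missing_vocab_files(directory):
    """Return the required file names not found in directory (case-insensitive)."""
    present = {p.name.upper() for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in REQUIRED_VOCAB_FILES if name.upper() not in present]


# Demo: a temporary directory holding only two of the nine files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CONCEPT.csv", "VOCABULARY.csv"):
        (Path(tmp) / name).touch()
    still_missing = missing_vocab_files(tmp)

print(still_missing)
```

Pointing `missing_vocab_files` at the real download folder before running the import script would flag an incomplete vocabulary bundle early.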

I looked at Athena, and was overwhelmed by the large amount of data on it :sweat_smile:. Could you please tell me which vocabulary data I should download? I only need some sample data to get ATLAS up and running.

Thank you very much for your help! I really appreciate it.

I need to tag @lee_evans, @ericaVoss or @gregk as they have more experience than I do about setting up the vocabulary.

Sorry to pass the buck!

@lee_evans shouldn’t the OMOP Vocabulary needed be included in the export of Synpuf?


I think you’re right @ericaVoss, I don’t know why I didn’t think of that: if you build a CDM, you should have the vocabulary attached to it. If you load some other version of the vocabulary into the CDM, you could get some strange results. So the SynPUF should probably have those vocabs pre-loaded.

Yup! The only difference is that the Vocab we share contains only the free vocabularies, the ones you don’t need a license for. But that is plenty fine for Synpuf.

Hi @lee_evans,
I recently decided to switch over to CDM version 5.3.1. Unfortunately, I have not managed to find a SynPUF sample for this version. Is there a data set available, or should I ask @a_cse for the mentioned script?
Thanks in advance
Mirko

@Mirko I’m not aware of a publicly available later version

The Synthea ETL package provides a good sample in V5.3.1 and V6:

For V5.3.1: https://github.com/OHDSI/ETL-Synthea/tree/v5.3.1
For V6: https://github.com/OHDSI/ETL-Synthea/tree/master

@Ajit_Londhe Thanks a lot. I will have a look at it.
