1K sample of simulated CMS SynPUF data in CDMV5 format available for download

Thank you for your effort in making this sample dataset. It is really helpful for someone like me who is new to OHDSI. I still have a few questions about setting up the cdm schema and wonder if you or someone else could help me out.

I followed the link in the readme (https://github.com/OHDSI/CommonDataModel/tree/v5.2.2) to create an empty cdm schema:

  • execute the script OMOP CDM ddl - PostgreSQL.sql to create the tables and fields, which generates 37 tables under cdm schema

  • load the data into the schema using the 18 csv files in the sample dataset

  • then I ran the script OMOP CDM constraints - PostgreSQL.sql to add the constraints. However, I encountered a lot of errors when running this script. Here is one of them:

    ERROR: insert or update on table "person" violates foreign key constraint "fpk_person_gender_concept"
    DETAIL: Key (gender_concept_id)=(8507) is not present in table "concept".

  • I think this is caused by the concept table, which is currently empty. I also found that the link mentioned above has an extra step to load the CDM vocabulary, which seems to populate the concept table.

So my question is: where can I find the CDM vocabulary csv files?

Thank you for the help!

I think that’s a script bug: gender_concept_id is a column in the ‘person’ table not the ‘concept’ table. Could you post here the actual statement that creates the foreign key constraint?

Thank you for your reply @Chris_Knoll . The actual statement is:

    ALTER TABLE person ADD CONSTRAINT fpk_person_gender_concept FOREIGN KEY (gender_concept_id) REFERENCES concept (concept_id);

It comes from this file: https://github.com/OHDSI/CommonDataModel/blob/v5.2.2/PostgreSQL/OMOP%20CDM%20constraints%20-%20PostgreSQL.sql

Ok, I’m sorry, I misread the error: it’s explaining that the value ‘8507’ in the gender_concept_id column of Person is not found in the table ‘concept’. That’s actually a really good message :slight_smile: even though I misread it.

It means that your vocabulary tables aren’t populated; you need to load them (via download from Athena) before applying those key constraints. Then you’ll have all the necessary concepts in the ‘concept’ table, and the foreign keys should work.
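
To see in advance which values would trip a foreign key like this one, an anti-join against the concept table helps. Below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The two trimmed-down tables, the three person rows, and the helper name orphaned_concept_ids are illustrative assumptions, not part of the CDM scripts.

```python
import sqlite3


def orphaned_concept_ids(conn, table, column):
    """Return the distinct values of table.column that have no row in concept."""
    # Table and column names are interpolated directly, so they must come
    # from trusted code, never from user input.
    query = (
        f"SELECT DISTINCT t.{column} FROM {table} AS t "
        f"LEFT JOIN concept AS c ON c.concept_id = t.{column} "
        f"WHERE t.{column} IS NOT NULL AND c.concept_id IS NULL "
        f"ORDER BY t.{column}"
    )
    return [row[0] for row in conn.execute(query)]


# In-memory stand-in database with heavily trimmed versions of two CDM tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER)")

# Illustrative person rows; the concept table is still empty at this point.
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 8507), (2, 8532), (3, 8507)])
before = orphaned_concept_ids(conn, "person", "gender_concept_id")

# Simulate loading the vocabulary, then check again.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [(8507, "MALE"), (8532, "FEMALE")])
after = orphaned_concept_ids(conn, "person", "gender_concept_id")

print("missing before vocabulary load:", before)
print("missing after vocabulary load:", after)
```

The same LEFT JOIN ... IS NULL query should carry over to PostgreSQL as written. Once it returns nothing for a given concept column, the matching foreign key no longer has orphaned values to complain about.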

Were there instructions in the SynPUF guide on how to load the vocabulary tables?

@Chris_Knoll I couldn’t find direct instructions for loading the vocabulary tables. But I found a script to load vocabulary tables in this link: GitHub - OHDSI/CommonDataModel at v5.2.2 → PostgreSQL → VocabImport.

It looks like we need the following csv files:
DRUG_STRENGTH.csv
CONCEPT.csv
CONCEPT_RELATIONSHIP.csv
CONCEPT_ANCESTOR.csv
CONCEPT_SYNONYM.csv
VOCABULARY.csv
RELATIONSHIP.csv
CONCEPT_CLASS.csv
DOMAIN.csv
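
A quick way to confirm that an unzipped vocabulary download has all nine files before running the import is sketched below. The helper name missing_vocab_files and the temporary demo folder are assumptions for illustration; the file names are the ones listed above.

```python
from pathlib import Path
import tempfile

# The nine files listed above.
REQUIRED_VOCAB_FILES = [
    "DRUG_STRENGTH.csv",
    "CONCEPT.csv",
    "CONCEPT_RELATIONSHIP.csv",
    "CONCEPT_ANCESTOR.csv",
    "CONCEPT_SYNONYM.csv",
    "VOCABULARY.csv",
    "RELATIONSHIP.csv",
    "CONCEPT_CLASS.csv",
    "DOMAIN.csv",
]


def missing_vocab_files(vocab_dir):
    """Return the required file names that are not present in vocab_dir."""
    # Compare case-insensitively, since file name casing can vary by system.
    present = {p.name.upper() for p in Path(vocab_dir).iterdir() if p.is_file()}
    return [name for name in REQUIRED_VOCAB_FILES if name.upper() not in present]


# Demo against a throwaway folder that holds only two of the nine files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CONCEPT.csv", "VOCABULARY.csv"):
        (Path(tmp) / name).touch()
    demo_missing = missing_vocab_files(tmp)

print("still missing:", demo_missing)
```

In real use, the path of the unzipped download folder would be passed in place of the temporary folder.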

I looked at Athena, and was overwhelmed by the large amount of data on it :sweat_smile:. Could you please tell me which vocabulary data I should download? I only need some sample data to get ATLAS up and running.

Thank you very much for your help! I really appreciate it.

I need to tag @lee_evans, @ericaVoss or @gregk as they have more experience than I do about setting up the vocabulary.

Sorry to pass the buck!

@lee_evans shouldn’t the OMOP Vocabulary needed be included in the export of Synpuf?

I think you’re right @ericaVoss, I don’t know why I didn’t think of that: if you build a CDM, you should have the vocabulary attached to it. If you load some other version of the vocabulary into the CDM, you could get some strange results. So the SynPUF should probably have those vocabs pre-loaded.

Yup! The only difference in what is in the Vocab is we only share the free vocabularies, or the ones you don’t need a license for. But that is plenty fine for Synpuf.

Hi @lee_evans,
I recently decided to switch over to CDM version 5.3.1. Unfortunately, I have not managed to find a SynPUF sample for this version. Is there a data set available or should I ask @a_cse for the mentioned script?
Thanks in advance
Mirko

@Mirko I’m not aware of a publicly available later version

The Synthea ETL package provides a good sample in V5.3.1 and V6:

For V5.3.1: https://github.com/OHDSI/ETL-Synthea/tree/v5.3.1
For V6: https://github.com/OHDSI/ETL-Synthea/tree/master

@Ajit_Londhe Thanks a lot. I will have a look at it.

Hi there,

here is a SynPUF data sample in version 5.3.1 without guarantee of correctness…
https://caruscloud.uniklinikum-dresden.de/index.php/s/teddxwwa2JipbXH/download

Thanks @Mirko! It’s really appreciated that you provided these files :slight_smile: I can confirm that they load successfully in a postgres database and the indexes & constraints for OMOP v5.3.1 run without error. I will comment back here if I find anything wrong but equally if anyone wants the SQL code (slightly modified version of CommonDataModel/PostgreSQL at v5.3.1 · OHDSI/CommonDataModel · GitHub) to import @Mirko’s Synpuf files and the standard vocabulary csvs into postgres, feel free to @ me here.

Hello, @tystan, following up on your post from April 8th… I would be very interested in your SQL Server scripts to load the SynPUF files and vocabularies into the CDM 5.3.1. Please, could you share those?

Also, many thanks to @Mirko and @tystan for pursuing the SynPUF 1K files for CDM 5.3.1. Looking forward to using this.

@tystan , I am also interested in the scripts to load the files and vocabularies.

Thanks,
Yvonne

Hi @yradsmikham, I have messaged you my email so I can send the files to you (can’t upload the zipped files even after trying to change the suffix to .jpg). Thanks, Ty

I assume it’s no longer relevant to you since this is a post from three years ago, but in case someone stumbles on this thread after encountering this error, like I did:
what solved it for me was downloading ALL relevant vocabularies from https://athena.ohdsi.org/.
Originally, I followed the instructions here:
https://github.com/OHDSI/ETL-CMS/tree/master/python_etl

which state:
“Download vocabulary files from http://www.ohdsi.org/web/athena/, ensuring that you select at minimum, the following vocabularies: SNOMED, ICD9CM, ICD9Proc, CPT4, HCPCS, LOINC, RxNorm, and NDC.”

My mistake was downloading only the vocabularies that they list as the MINIMUM.
That caused these errors, because not all required concepts were included.

I suggest instead downloading ALL vocabularies that are selected by default.
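
If you want to check what a download actually contains, here is a rough sketch that lists the vocabulary_id values found in a CONCEPT.csv and compares them with the minimum list quoted above. It assumes the Athena files are tab-delimited with a header row. The function name is made up, and the three sample rows stand in for a real file (the two placeholder rows are not real concepts).

```python
import csv
import io

# The minimum list quoted from the ETL-CMS instructions.
MINIMUM_VOCABULARIES = {
    "SNOMED", "ICD9CM", "ICD9Proc", "CPT4", "HCPCS", "LOINC", "RxNorm", "NDC",
}


def vocabularies_in_concept_file(handle, delimiter="\t"):
    """Return the set of vocabulary_id values in an open CONCEPT.csv handle."""
    # QUOTE_NONE keeps stray quote characters in concept names from
    # confusing the parser; the tab delimiter is an assumption about the export.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return {row["vocabulary_id"] for row in reader}


# Stand-in for a real file, trimmed to three columns.
sample = io.StringIO(
    "concept_id\tconcept_name\tvocabulary_id\n"
    "8507\tMALE\tGender\n"
    "1\tplaceholder SNOMED concept\tSNOMED\n"
    "2\tplaceholder RxNorm concept\tRxNorm\n"
)

present = vocabularies_in_concept_file(sample)
minimum_missing = sorted(MINIMUM_VOCABULARIES - present)
beyond_minimum = sorted(present - MINIMUM_VOCABULARIES)

print("minimum vocabularies not in file:", minimum_missing)
print("vocabularies beyond the minimum:", beyond_minimum)
```

For a real download, open the file with open("CONCEPT.csv", newline="", encoding="utf-8") and pass the handle in place of the sample.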

Hi folks,

if somebody needs the good old 1k SynPUF data in Version 5.4 - here you go: https://caruscloud.uniklinikum-dresden.de/index.php/s/Qog8B5WCTHFHmjW/download
