1K sample of simulated CMS SynPUF data in CDMV5 format available for download

Ok, I’m sorry, I misread the error: it’s explaining that the value ‘8507’ in the gender_concept_id column of Person is not found in the table ‘concept’. That’s actually a really good message :slight_smile: even though I misread it.

It means that your vocabulary tables aren’t populated. You need to load them (via download from Athena) before applying those key constraints. Then you’ll have all the necessary concepts in the ‘concept’ table, and the foreign keys should work.
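
A quick way to see which concept IDs are tripping the foreign key is an anti-join against the concept table. The sketch below is only an illustration: it uses an in-memory SQLite database with toy rows rather than a real CDM, and the helper name find_missing_concepts is made up for this example. The same LEFT JOIN ... IS NULL query should carry over to PostgreSQL.

```python
import sqlite3


def find_missing_concepts(conn, table, column):
    """Return distinct values of table.column that have no row in concept.

    Hypothetical helper: table and column are interpolated into the SQL,
    so only pass trusted identifiers.
    """
    query = f"""
        SELECT DISTINCT t.{column}
        FROM {table} AS t
        LEFT JOIN concept AS c ON c.concept_id = t.{column}
        WHERE t.{column} IS NOT NULL
          AND c.concept_id IS NULL
        ORDER BY t.{column}
    """
    return [row[0] for row in conn.execute(query)]


# Toy stand-in for a CDM: a person table loaded before the vocabulary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 8507), (2, 8532)])

# With an empty concept table, every gender_concept_id is an orphan.
missing_before = find_missing_concepts(conn, "person", "gender_concept_id")

# After the vocabulary rows are loaded, nothing is missing any more.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [(8507, "MALE"), (8532, "FEMALE")])
missing_after = find_missing_concepts(conn, "person", "gender_concept_id")

print("missing before load:", missing_before)
print("missing after load:", missing_after)
```

An empty result for a given concept column suggests the foreign key on that column should apply without complaint.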

Were there instructions in the SynPUF guide that explained how to load the vocabulary tables?

@Chris_Knoll I couldn’t find direct instructions for loading the vocabulary tables. But I found a script to load vocabulary tables in this link: GitHub - OHDSI/CommonDataModel at v5.2.2 → PostgreSQL → VocabImport.

It looks like we need the following csv files:
DRUG_STRENGTH.csv
CONCEPT.csv
CONCEPT_RELATIONSHIP.csv
CONCEPT_ANCESTOR.csv
CONCEPT_SYNONYM.csv
VOCABULARY.csv
RELATIONSHIP.csv
CONCEPT_CLASS.csv
DOMAIN.csv
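
Before running the import script, it can help to confirm that the download actually contains all nine files. A minimal sketch, assuming the unzipped vocabulary files sit together in one directory; the function name missing_vocab_files and the temporary demo directory are hypothetical, not part of the VocabImport script:

```python
import tempfile
from pathlib import Path

# The nine files listed above.
REQUIRED_VOCAB_FILES = [
    "DRUG_STRENGTH.csv",
    "CONCEPT.csv",
    "CONCEPT_RELATIONSHIP.csv",
    "CONCEPT_ANCESTOR.csv",
    "CONCEPT_SYNONYM.csv",
    "VOCABULARY.csv",
    "RELATIONSHIP.csv",
    "CONCEPT_CLASS.csv",
    "DOMAIN.csv",
]


def missing_vocab_files(vocab_dir):
    """Return the required file names not found in vocab_dir (case-insensitive)."""
    present = {p.name.upper() for p in Path(vocab_dir).iterdir() if p.is_file()}
    return [name for name in REQUIRED_VOCAB_FILES if name.upper() not in present]


# Demo on a throwaway directory that holds only three of the files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("CONCEPT.csv", "VOCABULARY.csv", "DOMAIN.csv"):
        Path(demo_dir, name).touch()
    demo_missing = missing_vocab_files(demo_dir)

print("still missing:", demo_missing)
```

For a real download, point missing_vocab_files at the folder where the Athena zip was extracted; an empty list means every file the script expects is there.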

I looked at Athena, and was overwhelmed by the large amount of data on it :sweat_smile:. Could you please tell me which vocabulary data I should download? I only need some sample data to get ATLAS up and running.

Thank you very much for your help! I really appreciate it.

I need to tag @lee_evans, @ericaVoss or @gregk as they have more experience than I do about setting up the vocabulary.

Sorry to pass the buck!

@lee_evans shouldn’t the OMOP Vocabulary needed be included in the export of Synpuf?

I think you’re right @ericaVoss, I don’t know why I didn’t think of that: if you build a CDM, you should have the vocabulary attached to it. If you load some other version of the vocabulary into the CDM, you could get some strange results. So the SynPUF should probably have those vocabs pre-loaded.

Yup! The only difference in the Vocab is that we share only the free vocabularies, i.e. the ones you don’t need a license for. But that is plenty fine for Synpuf.

Hi @lee_evans,
I recently decided to switch over to CDM version 5.3.1. Unfortunately, I have not managed to find a SynPUF sample for this version. Is there a data set available or should I ask @a_cse for the mentioned script?
Thanks in advance
Mirko

@Mirko I’m not aware of a publicly available later version

The Synthea ETL package provides a good sample in V5.3.1 and V6:

For V5.3.1: https://github.com/OHDSI/ETL-Synthea/tree/v5.3.1
For V6: https://github.com/OHDSI/ETL-Synthea/tree/master

@Ajit_Londhe Thanks a lot. I will have a look at it.

Hi there,

here is a SynPUF data sample in version 5.3.1 without guarantee of correctness…
https://caruscloud.uniklinikum-dresden.de/index.php/s/teddxwwa2JipbXH/download

Thanks @Mirko! It’s really appreciated that you provided these files :slight_smile: I can confirm that they load successfully in a postgres database and the indexes & constraints for OMOP v5.3.1 run without error. I will comment back here if I find anything wrong but equally if anyone wants the SQL code (slightly modified version of CommonDataModel/PostgreSQL at v5.3.1 · OHDSI/CommonDataModel · GitHub) to import @Mirko’s Synpuf files and the standard vocabulary csvs into postgres, feel free to @ me here.

Hello, @tystan, following up on your post from April 8th… I would be very interested in your SQL Server scripts to load the SynPUF files and vocabularies into the CDM 5.3.1. Please, could you share those?

Also, many thanks to @Mirko and @tystan for pursuing the SynPUF 1K files for CDM 5.3.1. Looking forward to using this.

@tystan , I am also interested in the scripts to load the files and vocabularies.

Thanks,
Yvonne

Hi @yradsmikham, I have messaged you my email so I can send the files to you (can’t upload the zipped files even after trying to change the suffix to .jpg). Thanks, Ty

I assume it’s no longer relevant to you since this is a post from three years ago, but in case someone stumbles on this thread after encountering this error, like I did:
what solved it for me was downloading ALL relevant vocabularies from https://athena.ohdsi.org/.
Originally, I followed the instructions here:
https://github.com/OHDSI/ETL-CMS/tree/master/python_etl

which state:
“Download vocabulary files from http://www.ohdsi.org/web/athena/, ensuring that you select at minimum, the following vocabularies: SNOMED, ICD9CM, ICD9Proc, CPT4, HCPCS, LOINC, RxNorm, and NDC.”

My mistake was downloading only the vocabularies that they list as a MINIMUM.
This caused these errors, since not all required concepts were downloaded.

I suggest instead to download ALL vocabularies that are selected by default.
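
To check what a given Athena download actually covers, one option is to count the rows in CONCEPT.csv per vocabulary_id. This is a sketch under a few assumptions: that the file is tab-delimited despite the .csv extension, that it has a lowercase header row with a vocabulary_id column, and that quotes inside concept names are not escaped (hence QUOTE_NONE). The sample rows and the wanted set below are illustrative only.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def count_concepts_by_vocabulary(concept_csv):
    """Count rows per vocabulary_id in a tab-delimited CONCEPT.csv."""
    counts = Counter()
    with open(concept_csv, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters in concept names as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["vocabulary_id"]] += 1
    return counts


# Toy file standing in for a real download (only four columns shown).
sample_rows = [
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id",
    "8507\tMALE\tGender\tGender",
    "8532\tFEMALE\tGender\tGender",
    "320128\tEssential hypertension\tCondition\tSNOMED",
]

with tempfile.TemporaryDirectory() as demo_dir:
    sample_path = Path(demo_dir, "CONCEPT.csv")
    sample_path.write_text("\n".join(sample_rows) + "\n", encoding="utf-8")
    vocabulary_counts = count_concepts_by_vocabulary(sample_path)

# Illustrative list of vocabularies we want to be sure were downloaded.
wanted = {"Gender", "SNOMED", "RxNorm"}
not_downloaded = sorted(wanted - set(vocabulary_counts))

print(dict(vocabulary_counts))
print("not in this download:", not_downloaded)
```

If a vocabulary such as Gender is absent from the counts, concept IDs like 8507 will not be in the concept table, which is the situation that triggers the foreign key error discussed at the top of this thread.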

Hi folks,

if somebody needs the good old 1k SynPUF data in Version 5.4 - here you go: https://caruscloud.uniklinikum-dresden.de/index.php/s/Qog8B5WCTHFHmjW/download

I just happened to be doing the same update!

Thanks @Mirko

Lee
Is the synpuf 1000 person sample dataset in CDM V5.2.2 format the SAME as the one used by OHDSI Atlas to create cohorts?
Thanks
D

@daveolaleye The OHDSI demo Atlas instance uses a larger sample of the same Synpuf simulated data set.

Note: the Synpuf 1000 person CDM V5.2.2 format file is now quite old and should no longer be used.

Instead, I recommend using the CDM V5.3.1 or CDM V5.4 versions of the Synpuf 1000 person sample data files that @Mirko has kindly created. Scroll up to see the download links earlier in this forum post.
