
1K sample of simulated CMS SynPUF data in CDMV5 format available for download

I’ve created a 1k sample of CMS SynPUF data in CDM Version 5 format for download at my company web site

It’s a first release and not all of the CDM version 5 tables are populated (in particular the COST tables are not populated). If you find any issues with this data set feel free to post them on this thread.

I thought it might be a useful small data set for folks who want to run OHDSI application demos with data that can be shared publicly (this is just simulated data) and to use as test data for development.

I’d like to thank the members of the CMS ETL working group for the great specification document they created. I used the specification as a guide to generate this data set.



This is great!

One thing: I think you left the COPY DRUG_EXPOSURE line out of the load_files_into_postgresql.sql file.

All else is looking great!


I’ve now corrected that sql file in the zip file on the website. Thanks Jon.

@MauraBeaton, this is a pretty valuable resource, but hard to find for people new to OHDSI (who might benefit from it most).

Would it be an idea to reference Lee’s site from the OHDSI.org site? (Maybe under data standardization?)

I second this: one of the biggest hurdles for people is getting a properly ETL’d CDM datasource that people can play around with, and this is the thing that people would need to get started.

Maybe we can add links to our ‘getting started’ guide on our tools to provide people with a sample database if they want to check if their installation worked.

One more thought: @lee_evans, do you think you could run an Achilles build off that data and provide the exported JSON files as a zip? This way people could see the Achilles results separately from doing a full CDM installation, which might be helpful.

Yes +1 from me for this.

I was looking for a simulated dataset since I do not have a full CDM on my laptop and just want to play around with some of the R packages on the road.

Thanks @lee_evans

One small addition - the dataset has no observation_period table.

I wish there was a very clear convention on how to create the observation_period table from “non-health plan” data.

Is there a better script (e.g., merging 30 day eras) than this ultra simple one?

-- create table observation_period as
select person_id,
       min(visit_start_date) as observation_period_start_date,
       max(visit_end_date) as observation_period_end_date
from visit_occurrence
group by person_id;
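
For comparison, here is a minimal sketch of the "merge 30 day eras" idea, written in Python rather than SQL so it can be tried without a database. It is only an illustration, not an OHDSI convention: the function name `build_observation_periods`, the `max_gap_days` parameter, and the tuple layout of the input are all assumptions made for this example. Visits for the same person are merged into one period whenever the next visit starts within `max_gap_days` of the end of the current period; a longer gap starts a new period.

```python
from datetime import date, timedelta


def build_observation_periods(visits, max_gap_days=30):
    """Merge visits into observation periods (hypothetical helper).

    visits: iterable of (person_id, visit_start_date, visit_end_date) tuples.
    Returns a list of (person_id, period_start, period_end) tuples.
    """
    gap = timedelta(days=max_gap_days)
    periods = []
    # Sorting by person, then start date, lets us merge in a single pass.
    for person_id, start, end in sorted(visits):
        last = periods[-1] if periods else None
        if last and last[0] == person_id and start <= last[2] + gap:
            # Visit begins close enough to the current period: extend it.
            periods[-1] = (person_id, last[1], max(last[2], end))
        else:
            # New person, or the gap is too long: open a new period.
            periods.append((person_id, start, end))
    return periods


if __name__ == "__main__":
    sample = [
        (1, date(2008, 1, 1), date(2008, 1, 5)),
        (1, date(2008, 1, 20), date(2008, 1, 22)),
        (1, date(2008, 6, 1), date(2008, 6, 3)),
        (2, date(2009, 3, 1), date(2009, 3, 1)),
    ]
    for row in build_observation_periods(sample):
        print(row)
```

With the sample rows, person 1 ends up with two periods (the January visits are merged, the June visit stands alone) and person 2 with one. Whether 30 days is the right gap for a given data source is a judgment call that this sketch does not settle.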

@ericaVoss, how do you tell this forum that the code block is SQL? (to get syntax highlighting?)

@Vojtech_Huser I just added the missing observation_period csv file to the synpuf1k zip file available here:

I also loaded the observation_period data into the OHDSI cloud sql server testing databases

@MauraBeaton I’m happy for you to upload the synpuf1k zip file directly to the OHDSI.org site if that makes it easier for folks to find. I would appreciate a link back to my company home page:
on the OHDSI.org web page referencing the synpuf1k zip file.

@Chris_Knoll I added a zip file containing the Achilles exported JSON files for the synpuf1k data here:

@lee_evans That would be great! I’ve already included a link to your download site on the CDM page, but if we could make that sample dataset more prominent on the website, that would be ideal.

@Vojtech_Huser - The forum, like GitHub, uses Markdown to style text. Basically, type three backquotes (grave accents) followed by the language name (sql), put your SQL statement on the following lines, then close it with three more backquotes.

Search for GitHub Markdown for other examples.



Yeah. It’s not a good place for this. Nobody will find it there. The page is about the CDM in general, not about a specific database. We should have a little page with links to commercial help. Let me think about it.

Hi guys!

I had trouble importing the 1k sample of CMS SynPUF data into CDM Version 5.3, so I had to alter some tables so that the .csv files match the database. I created some SQL scripts to easily import the 1k sample data into the database.

@lee_evans: I can provide the files, so that people can easily import the sample data into the new CDM Version 5.3.


Wow, this is so great @lee_evans! I was unable to download the files from your website. Are they still available? Or, @a_cse, would you mind kindly sharing the files with me?

Thank you very much in advance.

@bglicksb I’m currently updating the 1k sample to use a newer v5.2 CMS SynPUF data set provided by @ericaVoss

It should be available in the next few days. I’ll post the link here when it’s ready.

@a_cse Would you be able to run your scripts on the updated 1k sample when it’s available? I can then upload the v5.3 version too.



Thank you very much @lee_evans! I actually was able to set it up using the files from the OHDSI ftp site, but will keep an eye out for this as well. Thanks again!

@lee_evans Sure! Hit me up when the new sample data is available

@bglicksb @a_cse @Sigfried_Gold The synpuf 1000 person sample dataset in CDM V5.2.2 format is now available for download here: http://www.ltscomputingllc.com/downloads/

Thanks @Christophe_Lambert & @ericaVoss (and others in the OHDSI community) for performing the data conversion and providing a copy of the synpuf converted data.

There is a README.md file within the zip file with additional information.