It’s a first release and not all of the CDM version 5 tables are populated (in particular the COST tables are not populated). If you find any issues with this data set feel free to post them on this thread.
I thought it might be a useful small data set for folks who want to run OHDSI application demos with data that can be shared publicly (this is just simulated data) and to use as test data for development.
I’d like to thank the members of the CMS ETL working group for the great specification document they created. I used the specification as a guide to generate this data set.
I second this: one of the biggest hurdles for newcomers is getting a properly ETL’d CDM data source to play around with, and this is exactly what they need to get started.
Maybe we can add links to our ‘getting started’ guide on our tools to provide people with a sample database if they want to check if their installation worked.
One more thought: @lee_evans, do you think you could run an Achilles build off that data and provide the exported JSON files as a zip? That way people could look at the Achilles results separately, without doing a full CDM installation, which might be helpful.
I was looking for a simulated dataset since I do not have a full CDM on my laptop and just want to play around with some of the R packages on the road.
One small addition - the dataset has no observation_period table.
I wish there was a very clear convention on how to create an observation_period table from “non-health plan” data.
Is there a better script (e.g., merging 30 day eras) than this ultra simple one?
```sql
-- create table observation_period as
select person_id,
       min(visit_start_date) as observation_period_start_date,
       max(visit_end_date) as observation_period_end_date
from visit_occurrence
group by person_id
;
```
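For what it's worth, here is a rough sketch of what the "merging 30 day eras" variant could look like. It is not an agreed OHDSI convention, just one way to do it: visits for the same person are chained into one period as long as the gap to the previous visits is at most 30 days. It assumes PostgreSQL (where subtracting two dates gives a number of days) and that visit_end_date is never null; the 30-day threshold and the helper names (prev_max_end, new_period, period_nr) are my own choices. A real observation_period table also needs observation_period_id and period_type_concept_id, which are left out here.

```sql
-- Sketch only: assumes PostgreSQL date arithmetic (date - date = days)
-- and non-null visit_end_date. The 30-day gap is an arbitrary choice.
-- create table observation_period as
with ordered as (
    select person_id,
           visit_start_date,
           visit_end_date,
           -- latest end date among this person's earlier visits
           max(visit_end_date) over (
               partition by person_id
               order by visit_start_date, visit_end_date
               rows between unbounded preceding and 1 preceding
           ) as prev_max_end
    from visit_occurrence
),
flagged as (
    select person_id,
           visit_start_date,
           visit_end_date,
           -- 1 when this visit starts more than 30 days after everything before it
           case when prev_max_end is null
                  or visit_start_date - prev_max_end > 30
                then 1 else 0 end as new_period
    from ordered
),
grouped as (
    select person_id,
           visit_start_date,
           visit_end_date,
           -- running count of period starts = period number within the person;
           -- the default window frame keeps identical visits in the same period
           sum(new_period) over (
               partition by person_id
               order by visit_start_date, visit_end_date
           ) as period_nr
    from flagged
)
select person_id,
       min(visit_start_date) as observation_period_start_date,
       max(visit_end_date)   as observation_period_end_date
from grouped
group by person_id, period_nr
order by person_id, observation_period_start_date
;
```

Unlike the simple min/max query, this can return several rows per person, one for each stretch of activity separated by more than 30 days.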
@ericaVoss, how do you tell this forum that the code block is SQL? (to get syntax highlighting?)
@MauraBeaton I’m happy for you to upload the synpuf1k zip file directly to the OHDSI.org site if that makes it easier for folks to find. I would appreciate a link back to my company home page, http://www.ltscomputingllc.com/, on the OHDSI.org web page referencing the synpuf1k zip file.
Thanks.
@lee_evans That would be great! I’ve already included a link to your download site on the CDM page, but if we could make that sample dataset more prominent on the website, that would be ideal.
@Vojtech_Huser - The forum uses Markdown to style text, as GitHub does. Basically, type three backquotes (grave accents) followed by “sql”, put your SQL statement on the following lines, then close the block with three more grave accents on a line of their own.
Yeah. It’s not a good place for this. Nobody will find it there. The page is about the CDM in general, not about a specific database. We should have a little page with links to commercial help. Let me think about it.
I had trouble importing the 1k sample of CMS SynPUF data into CDM Version 5.3, so I had to alter some tables so that the .csv files match the database. I created some SQL scripts to easily import the 1k sample data into the database.
@lee_evans: I can provide the files, so that people can easily import the sample data into the new CDM Version 5.3.
Wow- this is so great @lee_evans! I was unable to download the files from your website- are they still available? Or, @a_cse, would you mind kindly sharing the files with me?
Thank you very much @lee_evans! I actually was able to set it up using the files from the OHDSI ftp site, but will keep an eye out for this as well. Thanks again!
Thanks @Christophe_Lambert & @ericaVoss (and others in the OHDSI community) for performing the data conversion and providing a copy of the synpuf converted data.
There is a README.md file within the zip file with additional information.