Test CDM v5 dataset

You are most welcome. Everyone should do an OMOP ETL at least once – there is no better way to learn the ins and outs of the vocabulary. Also, in the main README.md file we tried to acknowledge those who made earlier contributions to the project – I’m sure we are missing people – can someone check and let me know of anyone to add? I’m not sure who @claire-oi is.

I think you have everyone. Claire is Claire Cangialose at Outcomes Insights.

At the risk of having to ask for forgiveness, I merged our branch into the master branch this evening. It is ready to go.

I talked to Ryan. It is completely fine.

That’s awesome @christophe_lambert, thanks! @lee_evans can help with storing a compressed version and posting it on ohdsi.org so that folks can download and play with it.

@Christophe_Lambert
@Christian_Reich has now set up his FTP server at ftp.ohdsi.org, so you can work directly with Christian to upload the files.

For convenience, we have uploaded the pre-processed ETL files to the OHDSI FTP site in the synpuf folder. Instructions describing the files and how to load them into an OMOP CDMv5 database are at the beginning of the ETL-CMS/python_etl/README.md file.

For some reason the FTP links are not rendered in GitHub's markdown view of README.md, so I’ve included them below:

The data can be retrieved from this ftp folder. The file synpuf_1.zip (md5sum 0d11562053cec36999779cd5ae283c44) contains tables for the first 20th of the data (116,362 patients), and might be suitable for smaller-scale testing. The remaining 19 .csv.gz files represent the table data for all 20 parts (2,326,856 patients). Here are the direct links and md5sums for the files:

We hope this will serve as a useful resource for the community.

Christophe
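For anyone downloading the files, the md5sums quoted above can be checked before loading. The following is a minimal Python sketch, not part of the ETL-CMS code; the function names `md5_of_file` and `matches_md5` are made up for illustration, and the only value taken from this thread is the checksum for synpuf_1.zip.

```python
import hashlib

# Checksum quoted in this thread for synpuf_1.zip.
SYNPUF_1_MD5 = "0d11562053cec36999779cd5ae283c44"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with an expected hex string (hypothetical helper)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Assuming the archive was saved in the current directory, `matches_md5("synpuf_1.zip", SYNPUF_1_MD5)` should return `True` for an intact download.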

A bug was found in the visit_occurrence table of the ETL by @sirpoovey and corrected, and I have uploaded new versions of visit_occurrence.csv.gz and synpuf_1.zip to the FTP site. If you have downloaded and loaded the data prior to this, you should only need to reload the visit_occurrence table.

Formerly the visit_concept_id for all visits was set to the concept for an inpatient visit (9201). Now visits from the inpatient source data have visit_concept_id set to 9201, visits from outpatient source data are set to 9202, and visits from carrier claims source data are set to 0, as we cannot distinguish between inpatient and outpatient visits for carrier claims data. We now retain versions of the ETL’d data within subdirectories at ftp://ftp.ohdsi.org/synpuf.
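The corrected rule can be summarised as a small lookup. This is only an illustrative sketch of the mapping described above, not the actual code in CMS_SynPuf_ETL_CDM_v5.py; the dictionary, the function name, and the source labels are assumptions chosen for the example.

```python
# Mapping described in this post: source data type -> visit_concept_id.
# The keys are hypothetical labels for the three claim sources.
VISIT_CONCEPT_BY_SOURCE = {
    "inpatient": 9201,   # inpatient visit
    "outpatient": 9202,  # outpatient visit
    "carrier": 0,        # cannot tell inpatient from outpatient
}


def visit_concept_id(source_type):
    """Return the visit_concept_id for a claim source label (hypothetical helper)."""
    key = source_type.strip().lower()
    try:
        return VISIT_CONCEPT_BY_SOURCE[key]
    except KeyError:
        raise ValueError(f"unknown source type: {source_type!r}") from None
```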

Someone was asking me today about more detail on the ETL-CMS code for the synpuf data, and I thought I’d provide a pointer to the OHDSI webcast I gave about it on July 5, 2016. Here is a link to the Webex recording: https://drive.google.com/file/d/0B3MHvw659x1kUUFyVVlLM0hRYzQ/view

To view this recording, you need to install the WebEx player for .ARF files, downloadable from here: https://www.webex.com/play-webex-recording.html

Not sure if this is the proper thread to reply to, but I’m having trouble running the CMS_SynPuf_ETL_CDM_v5.py script. I am trying to run it on the test data (DE_0). It does not print any errors to the console, but it does not process any records, either. It appears to read through all of the OMOP concept files and create the dictionary; however, at the end of the run, it prints the following:
CMS_ETL done
Input Records------
File: beneficiary , records_read= 0
File: carrier , records_read= 0
File: inpatient , records_read= 0
File: outpatient , records_read= 0
File: prescription , records_read= 0
Output Records------
** done **

If I open up the directory where the output .csv files are supposed to reside, all of the files are present, but only the header line has been inserted. There are no records.

I am running the script (version 1.0.1 from github/master) using cygwin on Windows 7.
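The symptom described here (output files that contain a header line and nothing else) can be confirmed quickly across the whole output directory. A minimal sketch, assuming the outputs are plain .csv files whose first line is a header; the helper names are hypothetical and not part of the ETL.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Count rows after the header line in a CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header, if there is one
        return sum(1 for _ in reader)


def header_only_files(output_dir):
    """Return names of .csv files in output_dir that have no data rows."""
    return sorted(
        path.name
        for path in Path(output_dir).glob("*.csv")
        if count_data_rows(path) == 0
    )
```

If every output file shows up in the result of `header_only_files(...)`, the script most likely never found its input files, which points at the path configuration rather than at the ETL logic.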

Have you followed step 4 of the instructions to set up the .env file to specify the paths appropriately?

If you have truly found a bug, or think the instructions need to be clarified, the issues tracker of the ETL-CMS github page would be a good place to post.

Christophe
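One way to check the .env paths is to parse the file and test each value against the filesystem. The sketch below assumes the .env file holds simple KEY=VALUE lines and treats every value as a directory path; both are assumptions, so consult the README for the actual variable names and adjust accordingly. The function names are hypothetical.

```python
import os


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    with open(env_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop surrounding whitespace and optional quotes around the value.
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_directories(settings):
    """Return the keys whose values are not existing directories."""
    return sorted(key for key, value in settings.items() if not os.path.isdir(value))
```

An empty result from `missing_directories(read_env_file(".env"))` means every configured path exists; any key it returns is worth checking, especially under Cygwin, where Windows-style and POSIX-style paths are easy to mix up.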
