
Medicare ETL development

Hi Don, I can help out. I’ll touch base with Ryan later today also.

Thanks,
Cy

The AMIA abstract for the SynPUF ETL was accepted! Congratulations to all, and thanks for everyone’s help. The only comment was below, so I don’t think we need to do any editing.

“In this project the authors developed a tool for data analysis for de-identified Medicare data and reported their findings. Tools like this are always helpful to make use of the vast Medicare data.”

Funny. Must have taken the guy a lot of introspection to come to that conclusion.

But good.

Congrats @Mark_Danese et al, and thanks Mark for your continued leadership
on this. We should take this acceptance as added encouragement of the need
and value for getting the SynPUF data available in OMOP CDM format for all
the community to benefit from. Great stuff.

This might be a little off topic, but I am about to map the Medicaid MAX RIF files to the OMOP format. To provide the mapping as a resource to the community, is it best to use the Rabbit-In-a-Hat ETL document as the artifact?
My initial approach is loading the files into a set of database tables in the basic format of the files. The ETL to OMOP would take place in SQL. I should be able to include the SQL code also.

Richard,

Loading files directly to a database is our typical first step, too. We want to get the data into a real data store where we can write ETL SQL to transform the syntactic data into OMOP v5 as quickly as possible.

  • csvkit provides the csvsql command, which inspects the CSV (or tab-delimited) data and generates a SQL table structure to hold the data.
  • We skip using csvsql to load the table due to latency. Stick with your database’s bulk load features to populate the tables.
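To make the first bullet concrete, here is a minimal standard-library sketch of the same idea: look at a sample of the delimited data, guess a type per column, and print a CREATE TABLE statement. It is not csvsql itself and makes no claim to match its output. The table name, column names, and sample rows are hypothetical, and the type rules are deliberately crude.

```python
import csv
import io


def _all_parse(values, cast):
    """Return True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Pick a SQL type that fits every non-empty value in one column."""
    present = [v for v in values if v != ""]
    if not present:
        return "VARCHAR(1)"
    # Codes such as "0420" must stay text, so a leading zero rules out numbers.
    leading_zero = any(
        len(v) > 1 and v[0] == "0" and v[1] != "." for v in present
    )
    if not leading_zero:
        if _all_parse(present, int):
            return "INTEGER"
        if _all_parse(present, float):
            return "FLOAT"
    return "VARCHAR(%d)" % max(len(v) for v in present)


def create_table_sql(table, text, delimiter=","):
    """Build a CREATE TABLE statement from delimited text with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_type = infer_type([row[i] for row in data])
        columns.append("  %s %s" % (name, col_type))
    return "CREATE TABLE %s (\n%s\n);" % (table, ",\n".join(columns))


# Hypothetical sample; real files would be sampled from disk instead.
sample = "person_id,dx_code,paid_amount\n1,0420,10.50\n2,25000,7\n"
print(create_table_sql("claims_sample", sample))
```

The generated statement is only a starting point. Widths and types inferred from a sample usually need a manual pass before the bulk load.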

Bill

@Richard_Starr:

MAX RIF files? Like this one: http://www.resdac.org/cms-data/files/max-ps?

@wstephens Thanks. The main issue with these files, other than the size, is the lack of a header record. I tried WhiteRabbit also, but I end up having to paste the column names into a spreadsheet and use Excel to turn them into the CREATE TABLE statement. Their documentation is pretty good, but they define the structure in SAS. Once again, not a big issue, but I still need to manually modify the CREATE TABLE statement. But I am just complaining.
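One way to skip the spreadsheet step is to keep the column names in a plain list and generate both the staging DDL and a reader from it. The sketch below assumes the file is delimited (a fixed-width layout would need a different reader). The column names, table name, and VARCHAR(255) default are all hypothetical stand-ins, not the actual MAX layout.

```python
import csv
import io

# Hypothetical column names, standing in for a list pasted from the
# file documentation. They are not the real MAX file columns.
COLUMN_NAMES = ["member_id", "state_code", "service_year"]


def staging_ddl(table, column_names):
    """Build a CREATE TABLE statement with every column typed as text."""
    cols = ",\n".join(
        "  %s VARCHAR(255)" % name.strip().lower() for name in column_names
    )
    return "CREATE TABLE %s (\n%s\n);" % (table, cols)


def read_headerless(handle, column_names, delimiter=","):
    """Yield one dict per row, attaching names to a file with no header."""
    reader = csv.DictReader(handle, fieldnames=column_names, delimiter=delimiter)
    for row in reader:
        yield dict(row)


ddl = staging_ddl("max_ps_staging", COLUMN_NAMES)
print(ddl)

# Two made-up rows in place of a real file handle.
fake_file = io.StringIO("A1,NY,2010\nB2,CA,2011\n")
rows = list(read_headerless(fake_file, COLUMN_NAMES))
print(rows[0])
```

Typing everything as text in the staging table keeps the load simple. Casting to dates and numbers can then happen in the ETL SQL.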

@Christian_Reich Yes, those are the ones. Since they are PHI and highly controlled by CMS, the actual data will not be available. But I can make the mapping and logic available. They are just basic claims and Rx files.

I’m having trouble running this code. Is this the right place to ask for help?

python CMS_SynPuf_ETL_CDM_v5.py

Traceback (most recent call last):
  File "CMS_SynPuf_ETL_CDM_v5.py", line 83, in <module>
    BASE_ETL_CONTROL_DIRECTORY = os.environ['BASE_ETL_CONTROL_DIRECTORY']
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'BASE_ETL_CONTROL_DIRECTORY'
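
For anyone reading this traceback later: a KeyError from os.environ means the process that ran the script had no environment variable with that name. The sketch below shows how such a lookup can fail with a clearer message. The helper name require_env and the example path are hypothetical and not part of the ETL script. How the script is meant to receive this setting is not shown in this thread.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error.

    `environ` defaults to os.environ; passing a dict makes the helper testable.
    """
    environ = os.environ if environ is None else environ
    try:
        return environ[name]
    except KeyError:
        raise RuntimeError(
            "%s is not set; define it in the environment the script runs in "
            "before starting the ETL." % name
        )


# Example with a stand-in environment; the path is a placeholder.
fake_env = {"BASE_ETL_CONTROL_DIRECTORY": "/tmp/etl_control"}
print(require_env("BASE_ETL_CONTROL_DIRECTORY", fake_env))
```

Checking the variable in the same shell with `echo $BASE_ETL_CONTROL_DIRECTORY` before running the script shows whether it is set.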

This is the right place! Can you send the output of ‘pip list’ ? Let’s check your local configuration first.
(If you’re running in a virtualenv, activate that first).

Thanks,
Don

Malcolms-MacBook-Pro:python_etl malcolmmcroberts$ pip list
altgraph (0.10.2)
awscli (1.7.36)
bdist-mpkg (0.5.0)
bonjour-py (0.3)
botocore (1.0.1)
click (5.1)
colorama (0.3.3)
docutils (0.12)
jmespath (0.7.1)
macholib (1.5.1)
matplotlib (1.3.1)
modulegraph (0.10.4)
numpy (1.8.0rc1)
pip (7.1.2)
py2app (0.7.3)
pyasn1 (0.1.8)
pyobjc-core (2.5.1)
pyobjc-framework-Accounts (2.5.1)
pyobjc-framework-AddressBook (2.5.1)
pyobjc-framework-AppleScriptKit (2.5.1)
pyobjc-framework-AppleScriptObjC (2.5.1)
pyobjc-framework-Automator (2.5.1)
pyobjc-framework-CFNetwork (2.5.1)
pyobjc-framework-Cocoa (2.5.1)
pyobjc-framework-Collaboration (2.5.1)
pyobjc-framework-CoreData (2.5.1)
pyobjc-framework-CoreLocation (2.5.1)
pyobjc-framework-CoreText (2.5.1)
pyobjc-framework-DictionaryServices (2.5.1)
pyobjc-framework-EventKit (2.5.1)
pyobjc-framework-ExceptionHandling (2.5.1)
pyobjc-framework-FSEvents (2.5.1)
pyobjc-framework-InputMethodKit (2.5.1)
pyobjc-framework-InstallerPlugins (2.5.1)
pyobjc-framework-InstantMessage (2.5.1)
pyobjc-framework-LatentSemanticMapping (2.5.1)
pyobjc-framework-LaunchServices (2.5.1)
pyobjc-framework-Message (2.5.1)
pyobjc-framework-OpenDirectory (2.5.1)
pyobjc-framework-PreferencePanes (2.5.1)
pyobjc-framework-PubSub (2.5.1)
pyobjc-framework-QTKit (2.5.1)
pyobjc-framework-Quartz (2.5.1)
pyobjc-framework-ScreenSaver (2.5.1)
pyobjc-framework-ScriptingBridge (2.5.1)
pyobjc-framework-SearchKit (2.5.1)
pyobjc-framework-ServiceManagement (2.5.1)
pyobjc-framework-Social (2.5.1)
pyobjc-framework-SyncServices (2.5.1)
pyobjc-framework-SystemConfiguration (2.5.1)
pyobjc-framework-WebKit (2.5.1)
pyOpenSSL (0.13.1)
pyparsing (2.0.1)
python-dateutil (2.4.2)
python-dotenv (0.1.2)
pytz (2013.7)
request (0.0.2)
rsa (3.1.4)
scipy (0.13.0b1)
setuptools (1.1.6)
six (1.9.0)
Twisted (13.2.0)
vboxapi (1.0)
wheel (0.24.0)
xattr (0.6.4)
zope.interface (4.1.1)
The directory ‘/Users/malcolmmcroberts/Library/Caches/pip/http’ or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo’s -H flag.
Malcolms-MacBook-Pro:python_etl malcolmmcroberts$

Hi Malcolm - this week got away from me. Are you still having issues? I should have time this weekend to look at it again.
