CMS VRDC Medicaid data ETLed into OMOP

This is a formal call for feedback on an ETL of Medicaid data into the OMOP Common Data Model. We would like to hear feedback (including what is sub-optimal) from interested OHDSI-ers. Our ultimate goal is for the project to be sustained within the community in the long run (after funding ends). You can post feedback here or file an issue/question in the GitHub repo (link below).

We hope that this ETL can also be used by folks who face a similar task of converting data into OMOP. We use SQL as the primary ETL language, and we aim to follow OMOP best practices for ETL.
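For readers new to SQL-based ETL into OMOP, here is a minimal sketch of the general pattern. It is not taken from our repository: the source table `src_beneficiary` and its columns (`bene_id`, `sex_cd`, `birth_dt`) are hypothetical stand-ins, the `person` table is trimmed to a few OMOP columns, and Python's `sqlite3` is used only so the SQL can run without a database server. The gender concept IDs (8507 male, 8532 female, 0 for unmapped) are the standard OMOP ones.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical source table; real Medicaid files use different names.
CREATE TABLE src_beneficiary (bene_id TEXT, sex_cd TEXT, birth_dt TEXT);
INSERT INTO src_beneficiary VALUES
  ('A1', 'M',  '1980-05-17'),
  ('B2', 'F',  '1975-11-02'),
  ('C3', NULL, '1990-01-30');

-- Trimmed subset of the OMOP person table (the full table has more columns).
CREATE TABLE person (
  person_id           INTEGER PRIMARY KEY,
  gender_concept_id   INTEGER NOT NULL,
  year_of_birth       INTEGER NOT NULL,
  person_source_value TEXT,
  gender_source_value TEXT
);

-- The ETL step: map source codes to standard concepts and keep source values.
INSERT INTO person (gender_concept_id, year_of_birth,
                    person_source_value, gender_source_value)
SELECT
  CASE sex_cd WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END,
  CAST(substr(birth_dt, 1, 4) AS INTEGER),
  bene_id,
  sex_cd
FROM src_beneficiary
ORDER BY bene_id;
""")

rows = conn.execute(
    "SELECT person_id, gender_concept_id, year_of_birth, person_source_value "
    "FROM person ORDER BY person_id"
).fetchall()
for row in rows:
    print(row)
```

The point of the sketch is only the shape of the transformation: a source code is mapped to a standard concept ID, unmapped values fall back to concept 0, and the original value is preserved in the `*_source_value` columns.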

To use Medicaid data, a researcher must license the data from CMS and follow its data use policies. See the second post for details.

This ETL is being developed with funding from Health and Human Services (US government); see the second post.

The project will take 18 months for the ETL work, plus additional time for Data Quality work (until month 36). The project started on August 1, 2021.
We (at NLM) are working on transforming the data into OMOP. In parallel, the FDA team (which also includes folks from the Sentinel Operating Center and colleagues at Duke University) is working on converting the same data into the Sentinel model.

To see the current version of the ETL, use this GitHub repository:

Given the call to unite all OMOP ETLs under one framework, the location of this ETL may change in the future: it may move to that central framework (similar to ohdsi-studies).

Here is the second part of the info. I wanted the first post to only have a single link to the actual ETL code.

The project page is: Making Medicaid Data More Accessible Through Common Data Models and FHIR APIs | ASPE
(The info in the Agency section is incorrect; it should say FDA and NLM, as the Background section correctly says.)

The link to the CMS data warehouse is here: Data Dictionaries - Chronic Conditions Data Warehouse. This page links directly to the data dictionaries of the native data (scroll down to the Medicaid section). The larger CCWdata.org website offers a lot of information for researchers on how to use the data for research.

@Vojtech_Huser Apologies if this is mentioned somewhere, but are you releasing a MAX ETL in addition to TAF?

Good question.

The project formally targets only the TAF files (years 2014-now; TMSIS system).

The legacy format (MAX; also known as the MSIS system, 1999-2014) is not included.

For more info on the formats, see Table 1 of our paper: Data Characterization of Medicaid: Legacy and New Data Formats in the CMS Virtual Research Data Center - PMC