Using OMOP CDM as data extraction/data delivery format

mgkahn · July 22, 2019, 2:21pm

First the questions, then the background:

QUESTIONS: Is anybody using OMOP CDM as their data delivery format to give data extracts to investigators? If so, how do you do this, what data elements/fields do you provide, and how is it working for your investigators? What training do you need to give investigators when you hand over a data set in OMOP CDM format?

BACKGROUND: The University of Colorado uses OMOP for its integrated research data warehouse. We want to begin delivering data sets to our investigators in OMOP format for a couple of reasons: (1) it ensures direct provenance between what is delivered and what is in the warehouse, and (2) it introduces the OMOP CDM to our investigator community.

That said, I am considering violating what I just wrote by creating a version of the CDM that pre-joins all of the concept_ids and pulls in some of the key fields in the CONCEPT table such as concept_code and concept_name (and maybe other fields). The reasons include: (1) it is easier for recipients who are not comfortable with multi-table joins to work with the data, or it at least reduces the number of joins in a query, and (2) it eliminates the need to either ship or always require access to some version of the CONCEPT table. For sophisticated users, we could provide access to CONCEPT_ANCESTOR and/or CONCEPT_SYNONYM.
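To make the pre-joined idea concrete, here is a minimal sketch, assuming a heavily trimmed CONDITION_OCCURRENCE and CONCEPT table in an in-memory SQLite database. The view name `condition_occurrence_flat`, the `*_name`/`*_code` output columns, and the sample rows are all hypothetical placeholders, not part of the CDM or of any real vocabulary. The OMOP tables themselves are left untouched; the flattened version is only a view on top of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down versions of two CDM tables (only a few columns kept).
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    vocabulary_id TEXT,
    concept_code  TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id   INTEGER PRIMARY KEY,
    person_id                 INTEGER,
    condition_concept_id      INTEGER,
    condition_start_date      TEXT,
    condition_type_concept_id INTEGER
);

-- Placeholder rows for illustration only; not real vocabulary content.
INSERT INTO concept VALUES (1001, 'Example condition', 'EXAMPLE_VOCAB', 'X1');
INSERT INTO concept VALUES (2001, 'Example record type', 'EXAMPLE_TYPE', 'T1');
INSERT INTO condition_occurrence VALUES (1, 42, 1001, '2019-07-22', 2001);
INSERT INTO condition_occurrence VALUES (2, 42, 9999, '2019-07-23', 2001);

-- Hypothetical delivery view: every *_concept_id keeps its id and gains
-- the name/code pulled from CONCEPT. CONCEPT is joined once per
-- concept_id column. LEFT JOIN keeps rows whose concept_id is not found.
CREATE VIEW condition_occurrence_flat AS
SELECT co.*,
       c.concept_name  AS condition_concept_name,
       c.concept_code  AS condition_concept_code,
       c.vocabulary_id AS condition_vocabulary_id,
       t.concept_name  AS condition_type_concept_name
FROM condition_occurrence co
LEFT JOIN concept c ON c.concept_id = co.condition_concept_id
LEFT JOIN concept t ON t.concept_id = co.condition_type_concept_id;
""")

rows = conn.execute("""
    SELECT condition_occurrence_id, condition_concept_id,
           condition_concept_name, condition_concept_code,
           condition_type_concept_name
    FROM condition_occurrence_flat
    ORDER BY condition_occurrence_id
""").fetchall()

for row in rows:
    print(row)
```

Because the original concept_id columns are still present in the view, a recipient who later wants CONCEPT_ANCESTOR can still join on them. The second sample row uses a concept_id that is absent from CONCEPT, so its name and code come back as NULL rather than the row being dropped.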

At some point I hope to introduce ATLAS as an alternative/integrated data delivery + analytics environment but we are probably a long way off from being able to do this. Even so, I can’t envision ATLAS becoming the dominant method used by most of our investigators for their work (just the really smart ones……! ← that’s for Patrick…… ).

Any insights on using OMOP as a data extraction format would be appreciated.

Thanks,
Michael

roger.carlson · July 22, 2019, 3:13pm

I can see quite a lot of utility in putting the OMOP data into a data warehouse format.

I’ve created an application in MS Access which joins the Standard Data Tables to the Concept table (sometimes multiple times) so that both the concept_id and the concept_name or code are shown. Each data table is on a separate tab and all are grouped by person. In this way, I can select a person, one of their visits, and see all of the conditions, procedures, measurements, etc. in one go. I use this to compare my OMOP data back to the EHR.

While this works pretty well, all those joins tend to slow performance. Having the tables already pre-joined and the fields pre-calculated would help a lot.

However, I think I’d do this only as a separate ETL, leaving the OMOP format pristine. In SQL, it’s always easier to put things together than to take them apart.

Mark_Danese · July 22, 2019, 8:08pm

It is an interesting question. We are working with the National Cancer Institute on a “light” version of SEER Medicare data. In discussions, it is pretty clear that a simple data structure is critical to them. When we do work like this, we focus on taking data from data models like OMOP and outputting the data in analysis-ready (or almost ready) formats. We output a cohort table (one record per person), an events table (multiple records per person), and an attrition table, plus several lookup tables for vocabulary data as well as data dictionaries. We also generate a whole protocol for the extract with all of the documentation (i.e., the cohort and variable definitions). We do this so that someone has everything they need to write a manuscript using the data.
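As a rough illustration of the three-table shape described above (not the actual software, which is not shown in this thread), here is a small Python sketch. The function name `build_extract`, the record layouts, the inclusion criteria, and the sample data are all hypothetical.

```python
# Hypothetical sample data: one dict per person, one dict per event.
persons = [
    {"person_id": 1, "year_of_birth": 1950},
    {"person_id": 2, "year_of_birth": 1990},
    {"person_id": 3, "year_of_birth": 1945},
]
events = [
    {"person_id": 1, "event_date": "2018-01-05", "concept_id": 1001},
    {"person_id": 1, "event_date": "2018-03-10", "concept_id": 1002},
    {"person_id": 2, "event_date": "2018-02-01", "concept_id": 1001},
]


def build_extract(persons, events, criteria):
    """Apply inclusion criteria in order and return three tables:
    cohort (one record per person), events (many per person),
    and attrition (persons remaining after each step)."""
    attrition = [{"step": "all persons", "remaining": len(persons)}]
    cohort = list(persons)
    for label, keep in criteria:
        cohort = [p for p in cohort if keep(p)]
        attrition.append({"step": label, "remaining": len(cohort)})
    cohort_ids = {p["person_id"] for p in cohort}
    cohort_events = [e for e in events if e["person_id"] in cohort_ids]
    return cohort, cohort_events, attrition


# Illustrative criteria only; a real protocol would define its own.
ids_with_events = {e["person_id"] for e in events}
criteria = [
    ("born before 1960", lambda p: p["year_of_birth"] < 1960),
    ("has at least one event", lambda p: p["person_id"] in ids_with_events),
]

cohort, cohort_events, attrition = build_extract(persons, events, criteria)

for step in attrition:
    print(step["step"], step["remaining"])
```

The attrition table is just the running count after each criterion, which is the piece a manuscript's cohort flow diagram would typically be drawn from.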

Having said the above, it is a ton of work to generate these datasets, which is why we built software to do it. But my message is that anything you can do to make the data easier, the better it will be. The people using the data will probably not want to take the time to learn a complicated structure, even if, in the long-term, it might benefit them to do so.

Others may have different experiences, of course. We have a very specific use case which may not align with yours.

Christian_Reich · July 23, 2019, 3:57am

@mgkahn:

I have the eerie feeling we already discussed this before. But I can’t find it.

As the de-facto holder of the Congregation of the Holy Office of the Inquisition I have to tell you that this is off Doctrine. But since concept_ids are unique and clean I cannot see a problem, other than you bloating the size of the database.

We deliver all IQVIA data assets in OMOP format, ready to go. Nobody has complained. Of course, we also tell them to learn about the OMOP CDM, take a Tutorial or book one with us. So maybe don’t be too protective, and let the kids figure it out.

What’s the problem? Do you need help? Or infrastructure? Or permission? We host ATLAS on the public infrastructure, and after a ton of pressure testing IT Security allowed it. Of course these are unidentified data of patients, and treating providers are not among our customers.

mgkahn · July 23, 2019, 9:47pm

@Christian_Reich: “I have the eerie feeling we already discussed this before. But can’t find it.”

Not in public… (which further confirms Patrick’s admonishment against email conversations about OHDSI topics).