
THEMIS: New Initiative to create definitive conventions for data in the OMOP CDM


(If you don’t want to read the entire text: this is about a new initiative to create conventions that build trust in data converted to the OMOP CDM and enable quality certifications.)

The idea of the OMOP CDM is to create a representation of the source data in such a way that queries and tools can be run more or less blindly (without access to the data) and still return the correct result. This works if the format (table structure) and the vocabularies (coding schemas) are all standardized. Achieving this allows us to develop standardized tools and methods, drives quality, reproducibility and efficiency, and thus gets us closer to fulfilling the OHDSI objectives.

And it really does work. Except not quite 100%. There are still a good number of cases where, despite the CDM and the vocabularies, the exact choice of how to represent a clinical event or circumstance is ambiguous, for example:

  • Patients with multiple values for sex, gender, race
  • Patients with a birth year after the database ends
  • Providers with multiple specialties and Providers with multiple care sites
  • Potentially contradictory relationship between days supply, quantity, drug exposure end date and sig in Drug Exposure
  • Duplicate procedures or visits on the same day
  • Medical events after death date
  • Multiple death dates per person, multiple causes of death
  • Medical events before database begins or after database ends
  • Outpatient/ER/inpatient transition
  • Observation Period definitions for Claims and EHR records
  • Formulas for calculating total_paid
  • Negative values in tables, uninterpretable values in tables (lovely entries like NA, Unknown, Phone Call Failed Attempt, No Consultation, Did Not Attend)
  • Invalid numerical values
  • Representation of lifestyle observations like smoking

That’s just a quick list; there are many more. These problems are not just annoying: everybody who does an ETL job will inevitably end up making different decisions, which will create problems with the reproducibility of results.
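To illustrate how two reasonable ETL decisions can diverge, here is a minimal Python sketch for the days supply and drug exposure end date item in the list above. Both functions are hypothetical conventions made up for this example; neither is an official OMOP or THEMIS rule.

```python
from datetime import date, timedelta


def end_date_inclusive(start: date, days_supply: int) -> date:
    """Hypothetical convention A: the start day counts as day 1 of the supply."""
    return start + timedelta(days=days_supply - 1)


def end_date_exclusive(start: date, days_supply: int) -> date:
    """Hypothetical convention B: the end date is simply start + days supply."""
    return start + timedelta(days=days_supply)


# Same source record, two ETL teams, two different end dates.
start = date(2017, 1, 1)
end_a = end_date_inclusive(start, 30)
end_b = end_date_exclusive(start, 30)
print(end_a, end_b, (end_b - end_a).days)
```

The two results differ by one day for every exposure record, which is enough to shift any analysis that depends on exposure length or on gaps between exposures.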

We want to form a Working Group to tackle this problem and create all the necessary business rules and conventions. These can then be used to build a system of quality checks, which in turn could be used for an OMOP data certification. We are thinking of calling all this THEMIS, after the Greek goddess of divine order, fairness, law, natural law, and custom. Not sure what it will take, but I like the goal of “divine order”; that’s something to shoot for, and we don’t have it now.
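As a rough idea of what a system of quality checks could look like, here is a minimal Python sketch that tests two of the issues listed above: a birth year after the database ends, and medical events after the death date. The record classes, function names and the rules themselves are assumptions for illustration only. They are simplified in-memory stand-ins, not the actual CDM tables and not agreed conventions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    # Simplified stand-in for a person record (hypothetical structure)
    person_id: int
    year_of_birth: int
    death_date: Optional[date] = None


@dataclass
class Event:
    # Simplified stand-in for any dated clinical event (hypothetical structure)
    person_id: int
    event_date: date


def check_birth_year(person: Person, db_end: date) -> List[str]:
    """Flag a birth year that lies after the end of the database."""
    if person.year_of_birth > db_end.year:
        return [f"person {person.person_id}: year_of_birth "
                f"{person.year_of_birth} after database end {db_end.year}"]
    return []


def check_events_after_death(person: Person, events: List[Event]) -> List[str]:
    """Flag events dated after the person's death date, if one is recorded."""
    if person.death_date is None:
        return []
    return [f"person {person.person_id}: event on {e.event_date} "
            f"after death date {person.death_date}"
            for e in events
            if e.person_id == person.person_id and e.event_date > person.death_date]


def run_checks(persons: List[Person], events: List[Event], db_end: date) -> List[str]:
    """Run every check for every person and collect the violation messages."""
    violations: List[str] = []
    for p in persons:
        violations += check_birth_year(p, db_end)
        violations += check_events_after_death(p, events)
    return violations


persons = [Person(1, 1950, date(2015, 3, 1)), Person(2, 2030)]
events = [Event(1, date(2015, 2, 1)), Event(1, date(2015, 4, 1))]
violations = run_checks(persons, events, db_end=date(2016, 12, 31))
for v in violations:
    print(v)
```

Whether a flagged record should be rejected, corrected or merely reported is exactly the kind of decision the conventions would have to settle; the sketch only reports.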

It’s summer now, so this post is meant to create awareness in the community and invite folks. We expect to hold the constitutional session in September, with online meetings and occasional face-to-face meetings.

Any interest in helping us in this endeavor? Any other thoughts?

Best regards,
Minnie Chou, @Asha_Mahesh, @gregk and Christian


@Christian_Reich This is great! I’d like to join.
This is a very important step to bring everyone onto the same page about how the data is being transformed and how the conventions really work. As you mentioned, we need to have source-specific definitions depending on the type of data source (claims, EHR, etc.). I did a study on accommodating HL7 CCD data to OMOP in my dissertation, and found some issues that were not covered in the CDM documentation. I can share them with the group.

Hi @Christian_Reich, I’d like to join too!

I would like to join the working group.

I’ll be there.

I would like to join.

Count me in.

Thanks everybody. That’s wonderful. We already have a good number of folks (some of them not visible on this thread). And we are not going to close off some kind of “membership”; in the OHDSI spirit, this is open.

We haven’t finalized any details yet, but just be aware that we are planning workshops to work through the issues. If you are interested, I just want to make sure nobody forgets that it means work and travel.

I am very much interested in this, and am looking forward to helping out.

This is very important work. I/we will be there.

I would like to join too.


Need your email address.

Hi @Christian_Reich, I’m interested in participating as well.

I’d like to join as well. m1mv2.5@gmail.com

I would like to see the HL7 CCD correlation/non-correlation points you mention, Hamed. At California Department of Health Care Services, we have a data model built somewhat on HL7 and are looking at OMOP for clinical data system redesigns.


Sure, I will present the work at the OHDSI symposium in October. I also have a paper to be published in a couple of months.

Hi @Christian_Reich, I’m interested in participating. I am very interested in ETL and data quality (and have done some prior and ongoing work).

Yes, please include me as well. Thank you!

This is a great idea! Please include me as well! Thanks so much!

Hi Christian, can you please include me as well? Thank you.