(If you don't want to read the entire text: this is about a new initiative to create conventions that build trust in data converted to the OMOP CDM and enable quality certification.)
The idea of the OMOP CDM is to create a representation of the source data in such a way that queries and tools can be run more or less blindly (without access to the data) and still return the correct result. This works if the format (table structure) and the vocabularies (coding schemas) are all standardized. Achieving this lets us develop standardized tools and methods, drives quality, reproducibility and efficiency, and thus gets us closer to fulfilling the OHDSI objectives.
And it really does work. Except not quite 100%. There are still a good number of cases where, despite the CDM and vocabularies, the exact choice of how to represent a clinical event or circumstance is ambiguous, for example:
- Patients with multiple values for sex, gender, race
- Patients with a birth year after the database ends
- Providers with multiple specialties and Providers with multiple care sites
- Potentially contradictory relationship between days supply, quantity, drug exposure end date and sig in Drug Exposure
- Duplicate procedures or visits on the same day
- Medical events after death date
- Multiple death dates per person, multiple causes of death
- Medical events before database begins or after database ends
- Outpatient/ER/inpatient transition
- Observation Period definitions for Claims and EHR records
- Formulas for calculating total_paid
- Negative values in tables, uninterpretable values in tables (lovely entries like NA, Unknown, Phone Call Failed Attempt, No Consultation, Did Not Attend)
- Invalid numerical values
- Representation of lifestyle observations like smoking
That’s just a quick list; there are many more. These problems are not just annoying: everybody who does an ETL job will inevitably end up making different decisions, which will create problems with the reproducibility of results.
We want to form a Working Group to tackle this problem and create all the necessary business rules and conventions. These can then be used to build a system of quality checks, which in turn could be used for an OMOP data certification. We are thinking of calling all this THEMIS, after the goddess of divine order, fairness, law, natural law, and custom. Not sure what it will take, but I like the goal of "divine order": that's something to shoot for, and we don't have it now.
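To give a feel for what such quality checks might look like, here is a minimal Python sketch covering three of the issues listed above: birth year after the database ends, multiple death dates per person, and medical events after death. Everything in it is an assumption for illustration only. The records are plain dictionaries rather than real CDM tables, `event_id` and `event_date` are hypothetical stand-ins for the domain-specific columns, and the function names are made up. The checks only flag records; deciding what to do with them is exactly the kind of convention the Working Group would have to agree on.

```python
from collections import defaultdict
from datetime import date


def birth_year_after_db_end(persons, db_end):
    """Return person_ids whose year_of_birth is later than the database end year."""
    return [p["person_id"] for p in persons if p["year_of_birth"] > db_end.year]


def multiple_death_dates(deaths):
    """Return person_ids that have more than one distinct death date."""
    dates_by_person = defaultdict(set)
    for d in deaths:
        dates_by_person[d["person_id"]].add(d["death_date"])
    return sorted(pid for pid, dates in dates_by_person.items() if len(dates) > 1)


def events_after_death(events, deaths):
    """Return event_ids dated after the person's earliest recorded death date.

    Using the earliest death date is itself an arbitrary choice made for
    this sketch, not an agreed convention.
    """
    first_death = {}
    for d in deaths:
        pid = d["person_id"]
        if pid not in first_death or d["death_date"] < first_death[pid]:
            first_death[pid] = d["death_date"]
    return [
        e["event_id"]
        for e in events
        if e["person_id"] in first_death
        and e["event_date"] > first_death[e["person_id"]]
    ]


# Made-up sample data, purely illustrative.
db_end = date(2017, 6, 30)
persons = [
    {"person_id": 1, "year_of_birth": 1950},
    {"person_id": 2, "year_of_birth": 2031},
]
deaths = [
    {"person_id": 1, "death_date": date(2015, 3, 1)},
    {"person_id": 1, "death_date": date(2015, 3, 5)},
    {"person_id": 3, "death_date": date(2016, 1, 1)},
]
events = [
    {"event_id": 10, "person_id": 1, "event_date": date(2015, 3, 3)},
    {"event_id": 11, "person_id": 3, "event_date": date(2015, 12, 31)},
    {"event_id": 12, "person_id": 3, "event_date": date(2016, 2, 1)},
]

print("Birth year after database end:", birth_year_after_db_end(persons, db_end))
print("Multiple death dates:", multiple_death_dates(deaths))
print("Events after death:", events_after_death(events, deaths))
```

Note that even this toy version has to make a decision (which death date counts when there are several), which is the reproducibility problem in miniature.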
It’s summer now, so this post is meant to create awareness in the community and invite folks. We expect to hold the founding session in September, followed by online meetings and occasional face-to-face meetings.
Any interest in helping us in this endeavor? Any other thoughts?
Minnie Chou, @Asha_Mahesh, @gregk and Christian