OMOP VI: The Undiscovered Caboodle

OMOP. The final frontier. This is the mission of CHRI: Corewell Health. Its continuing mission: to explore this strange new CDM called OMOP, to seek out ever more Epic data, and build them into standard tables. To boldly post to the only forum legally available to us…Epic Userweb.

Stardate: -296646.58…

In this episode, OMOP VI: The Undiscovered Caboodle, we abandon our creaky old Clarity drive engine and update to the shiny new Caboodle drive.

Why We Chose Caboodle Over Clarity for Our OMOP Pipeline
After validating parallel pipelines across all 16 OMOP CDM v5.4 domains, Caboodle is our recommended production pipeline.

  • More data. 19% more clinical rows (61.3M vs 51.5M), with Observation up 430%, Visit Detail up 1,102%, and 3.6M Fact Relationships that Clarity can’t produce.
  • Visit detail linkage. ~99% across all seven hospital-based domains. Clarity has 0%. Critical for tying clinical events to ICU stays, transfers, and surgical episodes.
  • Better quality. 99.9% note class mapping (vs 97.3%), 98.2% device unique IDs (vs 0%), and elimination of duplicate anesthesia crossover rows.
  • Simpler architecture. Fact/Dim joins replace Clarity’s heavily normalized tables, insulating you from Epic schema changes. Builds run 1.8x faster (7m vs 13m).
  • Where Clarity still matters: near real-time freshness, validation baseline, and a few edge-case cross-references we’re actively bridging with hybrid models.
  • Bottom line: Caboodle isn’t just a port — it’s a materially better research dataset.

But it’s not just Caboodle. We’ve also significantly enriched our OMOP data.

New Data Flows (~2.6M+ rows added)

  • Domain-shift routing — Conditions, observations, and procedures that map to a different OMOP domain (e.g., a SNOMED condition that’s really a measurement) now get routed correctly instead of dropped
  • Social history — Smoking, smokeless tobacco, alcohol, and drug use history from SOCIAL_HX (~3.5M rows across the four flows)
  • DNR/Code Status — 38K observation rows from ORDER_PROC
  • Device Exposure O2 — 25K+ oxygen device rows from flowsheets
  • APGAR scores and oncology staging added to Measurement
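
For readers who want to picture the domain-shift routing above, here is a minimal Python sketch. It is not our pipeline code: the function name, the row shape, and the concept IDs are made up for illustration, and it assumes each source row already carries a mapped standard concept whose domain_id has been looked up from the OMOP vocabulary.

```python
from collections import defaultdict

# Target CDM table for each OMOP domain_id (illustrative subset only).
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Procedure": "procedure_occurrence",
    "Device": "device_exposure",
    "Drug": "drug_exposure",
}


def route_by_domain(rows, concept_domains):
    """Group rows by the table implied by each mapped concept's domain.

    rows: list of dicts with a "concept_id" key (hypothetical row shape).
    concept_domains: dict of concept_id -> domain_id.
    Returns (routed, unrouted); unrouted rows are kept for review, not dropped.
    """
    routed = defaultdict(list)
    unrouted = []
    for row in rows:
        domain = concept_domains.get(row["concept_id"])
        table = DOMAIN_TO_TABLE.get(domain)
        if table is None:
            unrouted.append(row)
        else:
            routed[table].append(row)
    return dict(routed), unrouted


# Made-up concept IDs: a source "condition" whose concept is really a measurement.
source_rows = [
    {"id": 1, "concept_id": 1001},
    {"id": 2, "concept_id": 1002},
]
domains = {1001: "Condition", 1002: "Measurement"}
routed, unrouted = route_by_domain(source_rows, domains)
print(sorted(routed))
```

The point of the sketch is that the concept's domain, not the source table, decides the destination, and anything without a known domain is set aside rather than silently lost.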

Bug Fixes (~148K+ rows recovered)

  • ANES_LDA wrong CSN column — was returning 0 rows; fix recovered 14K+ device rows
  • T_LOINC filter too restrictive — removed an overly narrow CONCEPT_CLASS_ID = 'LAB TEST' filter, recovering 45K+ measurement rows
  • Sentinel date fix — future dates on LDA devices corrected to NULL
  • Blood transfusion timing — ~84K rows now have more accurate start/end times from ORD_BLOOD_ADMIN
  • Blood device expansion — +1,579 device rows from adding 3 new PROC_ID mappings
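
The sentinel date fix in the list above can be pictured with a small Python sketch. This is an assumption-laden illustration rather than our actual logic: the helper name is hypothetical, and it treats any date later than the extract date as a sentinel, which may be broader than the rule a real pipeline would use.

```python
from datetime import date


def null_future_date(value, as_of):
    """Return None when value is missing or later than as_of, else value.

    Assumption: a removal date after the extract date is a placeholder
    (sentinel), not a real clinical event, so it should load as NULL.
    """
    if value is None or value > as_of:
        return None
    return value


extract_date = date(2025, 1, 1)  # made-up extract date for the example
print(null_future_date(date(2024, 6, 30), extract_date))
print(null_future_date(date(2157, 12, 31), extract_date))
```

A real date passes through unchanged, while a far-future placeholder comes back as None and lands in OMOP as a NULL end date.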

So, we say goodbye to good old Clarity… But wait!

There’s still more! We made one last update to our Clarity pipeline. We’ve posted version 6 code for both Caboodle AND Clarity.

Click the link, and … engage!