
Creating tall table of longitudinal patient history - esp. for cohorts

We have prototyped creating a single tall table of complete longitudinal history of patients to support a range of analyses. This currently includes all EHR data mapped to OMOP, and will eventually include Claims and Patient Reported Outcomes. In our case, we also include all orders, alerts, clinical events, and data entry forms from our EHR - using non-standard codes when no suitable standard codes exist.

Has anyone else had success with this? I’d like to use this thread to brainstorm additional use cases and enhancement ideas.

We are doing this using standard SQL against OMOP data model - so should be easy to share.

Sample Use Cases

  • Easier assessment of patient clinical and visit history – I find it cumbersome to glean all of this directly within an EHR or EDW. Gives the ability to sort by datetime, or by data domain (e.g. history of lactate or creatinine levels; history of drug dosing)
  • Easier filtering of data by location (e.g. if I want to remove intra-operative drugs or vitals)
  • Easier review of clinical data before/after an encounter (e.g. prior diagnoses; timeliness of post-discharge follow-up)
  • Easier computation of clinically meaningful trends in result values (e.g. trends in BP, A1c, creatinine; rate of development of organ damage)
  • Easier refinement of concept set and cohort logic - confirming that we have included all relevant source and standard codes (e.g. we can see the longitudinal history with human-readable labels, so we can find data that may be relevant for analyses).
  • Enables multiple types of descriptive statistics (e.g. filtering and characterization using standard visualization tools). For example, top drugs administered by location, provider, or specialty.
  • Should speed generation of temporal quality and safety triggers (e.g. delay in diagnosis or treatment) as new data elements (e.g. features or observations) that can be used in propensity matching or predictive models. The process for this is still to be determined - but temporal query languages support this well.
  • Will enable adding full clinical notes to the longitudinal history, if desired.
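One of the use cases above - computing trends in result values - falls out naturally from the tall layout with a window function. A hypothetical sketch (table and column names are illustrative of the design described here, not the actual implementation; the creatinine concept_id shown is the commonly used OMOP standard concept for serum creatinine, but verify against your vocabulary):

```sql
-- Sketch: per-patient creatinine trend from a tall history table,
-- comparing each result to the prior one with a window function.
SELECT
    person_id,
    event_datetime,
    value_as_number AS creatinine,
    value_as_number
      - LAG(value_as_number) OVER (PARTITION BY person_id
                                   ORDER BY event_datetime) AS delta_from_prior
FROM tall_history
WHERE concept_id = 3016723   -- assumed: LOINC 2160-0, Creatinine [Mass/volume] in Serum or Plasma
ORDER BY person_id, event_datetime;
```

The same pattern extends to BP or A1c trends by swapping the concept_id filter.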

Current Design

  • All data from condition_occurrence, device_exposure, drug_exposure, measurement, observation, and procedure_occurrence in a single table. Maps all source tables to a set of 22 shared variable names (like concept_id, source_concept_id). Some are null depending upon the source table, but the set includes most clinically meaningful variables.
  • Joined with visit_occurrence, visit_detail, person, provider, and death, so all of that metadata is easily available for each clinical entry
  • Denormalized so each relevant concept_id carries its domain, concept_name, and concept_code - this is done for source and standard concepts, plus demographics (race, ethnicity), provider specialty, visit location, and visit_detail location
  • Conditionally enriched with PHI for easier operational use (e.g. unique person, provider, location IDs)
  • Table currently has 57 columns
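The union step of this design can be sketched roughly as follows. This is a hypothetical illustration with a reduced column set (the actual table maps 22 shared variables and ends up with 57 columns); column names come from the standard OMOP CDM tables:

```sql
-- Sketch: map two of the six clinical domain tables to shared column names.
CREATE TABLE tall_history AS
SELECT
    'condition'                 AS domain_table,
    person_id,
    condition_start_datetime    AS event_datetime,
    condition_concept_id        AS concept_id,
    condition_source_concept_id AS source_concept_id,
    CAST(NULL AS DOUBLE)        AS value_as_number,   -- not applicable for conditions
    visit_occurrence_id,
    provider_id
FROM condition_occurrence

UNION ALL

SELECT
    'measurement',
    person_id,
    measurement_datetime,
    measurement_concept_id,
    measurement_source_concept_id,
    value_as_number,
    visit_occurrence_id,
    provider_id
FROM measurement;
-- ... repeat for drug_exposure, procedure_occurrence, observation, and
-- device_exposure, then join to visit_occurrence, visit_detail, person,
-- provider, and death, and denormalize concept_name / concept_code
-- from the concept table.
```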

Current Approach

  1. Reusable SQL to create the tall table. Optionally specify a list of cohort_ids; if provided, the tall table is filtered to just the patient clinical history within those cohorts, from cohort_start_date to cohort_end_date. Date offsets are also supported, in case a certain number of days before or after the cohort period is wanted (e.g. if the cohort entry event occurs during a hospital stay, but one month of data before and after that visit is wanted)
  2. Join with the CodeSets table from the selected cohort, enriching the tall table with the relevant concept_ids from the cohort and keeping only rows that are referenced within those codesets (by either source or standard concept_ids)
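The two steps above might look something like this sketch. All names here are illustrative assumptions (the cohort_id, offsets, and codesets table name are placeholders), and date-arithmetic syntax varies by SQL dialect:

```sql
-- Sketch: restrict the tall table to cohort members' history (with day
-- offsets), then keep only rows matching the cohort's codesets.
SELECT t.*
FROM tall_history t
JOIN cohort c
  ON  c.subject_id = t.person_id
  AND c.cohort_definition_id IN (123)             -- placeholder: list of cohort_ids
  AND t.event_datetime BETWEEN
        dateadd(DAY, -30, c.cohort_start_date)    -- placeholder offset: 30 days before
    AND dateadd(DAY,  30, c.cohort_end_date)      -- placeholder offset: 30 days after
JOIN codesets cs                                  -- Atlas-style temporary CodeSets table
  ON cs.concept_id IN (t.concept_id, t.source_concept_id);
```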

The current process takes less than 4 minutes to create 500M rows of data for a cohort of 25K patients (using Databricks), so we expect this should scale well.

Current Limitations
The second step uses concept_ids from concept sets (either global or in cohorts). Atlas generates temporary CodeSet tables to support this during cohort generation. Having a more persistent (and curatable) set of CodeSets would make it easier to do this at scale. We’d also want the ability to incorporate those CodeSet names in such a reference table (as opposed to the numeric codeset_id values within a cohort).
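For discussion, a persistent, named reference table along these lines is what we have in mind. This is a hypothetical schema, not an existing OHDSI artifact:

```sql
-- Sketch: a persistent, curatable CodeSet reference table with
-- human-readable names (vs. the numeric codeset_id values Atlas
-- generates in temporary tables during cohort generation).
CREATE TABLE concept_set_reference (
    concept_set_id    INT,
    concept_set_name  VARCHAR(255),  -- e.g. 'Type 2 Diabetes' (illustrative)
    concept_id        INT,           -- resolved standard or source concept
    is_source_concept BOOLEAN
);
```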

I welcome thoughts and discussion on this.

Isn’t that called “i2b2”, @Thomas_White? :slight_smile:

@Christian_Reich , I haven’t worked directly with i2b2, and we don’t want to have to implement multiple solutions. Given that we’ve optimized OMOP for our clinical operational needs, we’re looking for ways to extend it.

As a performance update, we decided to run this for our entire patient population first - i.e., creating complete clinical records for all 4M patients and up to 8 years of data each. The query ran in 35 minutes and generated a table with 9B rows. From there, it will be fast and easy to subset that table based upon cohort membership and start/end boundaries.