This is a great debate, and it continues an issue debated during ‘Phenotype Phebruary’.
I would articulate the issue as:
Are there limits to what kind of clinical facts can be represented by standard OMOP, phenotypes and standard analytics?
It is no coincidence that in Phebruary this issue was most hotly debated in phenotypes involving cancer. Cancer research demands the representation of clinical facts like ‘histology’, ‘staging’, ‘disease progression’, and ‘treatment lines’. The OHDSI Oncology Working Group was formed out of difficulties representing these important cancer clinical facts in OMOP.
This led to the introduction of:
Structures: EPISODE, EPISODE_EVENT, MEASUREMENT.measurement_event_id, MEASUREMENT.meas_event_field_concept_id.
Vocabularies: ICDO3, NAACCR, Cancer Modifier, HemOnc.org, CAP electronic Cancer Checklists (College of American Pathologists)
Standardized ETLs: NAACCR ETL
Treatment Regimen Detection Algorithms: OncoRegimenFinder, Tracer (Ajou University)
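For readers who have not worked with the episode structures, here is a minimal sketch of how EPISODE and EPISODE_EVENT relate to rows in the clinical event tables. The column names follow the OMOP CDM, but only a subset is shown. The two concept ID constants are placeholders I made up for illustration, not real OMOP concept IDs, and `link_events` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Placeholder values, NOT real OMOP concept IDs (assumption for illustration).
TREATMENT_REGIMEN_CONCEPT_ID = -1        # "this episode is a treatment regimen"
DRUG_EXPOSURE_ID_FIELD_CONCEPT_ID = -2   # "event_id points at DRUG_EXPOSURE"


@dataclass
class Episode:
    """One row of EPISODE (subset of columns)."""
    episode_id: int
    person_id: int
    episode_concept_id: int          # what kind of episode this is
    episode_start_date: date
    episode_end_date: Optional[date]
    episode_object_concept_id: int   # the disease or regimen the episode is about
    episode_type_concept_id: int     # provenance of the episode record
    episode_parent_id: Optional[int] = None
    episode_number: Optional[int] = None


@dataclass
class EpisodeEvent:
    """One row of EPISODE_EVENT: links an episode to a clinical event row."""
    episode_id: int
    event_id: int                        # primary key of the linked event row
    episode_event_field_concept_id: int  # identifies which table/field event_id refers to


def link_events(episode: Episode, event_ids: List[int],
                field_concept_id: int) -> List[EpisodeEvent]:
    """Hypothetical helper: build the EPISODE_EVENT rows for one episode."""
    return [EpisodeEvent(episode.episode_id, event_id, field_concept_id)
            for event_id in event_ids]


# A first-line regimen tied to two drug exposure rows (all values invented).
first_line = Episode(
    episode_id=1, person_id=42,
    episode_concept_id=TREATMENT_REGIMEN_CONCEPT_ID,
    episode_start_date=date(2020, 2, 1), episode_end_date=date(2020, 6, 30),
    episode_object_concept_id=0, episode_type_concept_id=0,
    episode_number=1,
)
links = link_events(first_line, [101, 102], DRUG_EXPOSURE_ID_FIELD_CONCEPT_ID)
```

The point of the indirection is that an episode does not copy the underlying events. It only points at them, so the same drug exposure or measurement rows remain usable by standard analytics.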
It looks like the OHDSI community has formed various ‘factions’ around how to answer this question.
Faction 1 (the ‘Methodists’):
- All clinical facts can be represented by current standard OMOP, phenotypes, and standard analytics.
- We don’t need the EPISODE table.
- ETL’ers should not interpret source data and derive new data.
- Rote ETLs and statistics are enough.
- Members: Patrick Ryan, Gowtham Rao
Faction 2 (the ‘Derivers’):
- Not all clinical facts can be represented by standard OMOP, phenotypes, and standard analytics.
- We need the EPISODE table to represent cancer disease phases and treatment lines.
- Episodes belong in the standardized derived elements.
- We need to come up with modifiers (e.g. 734306 = ‘Initial diagnosis’) and conventions on how to populate cancer events in the traditional OMOP standardized clinical event tables.
- These modifiers and conventions will enable the development of a post-ETL algorithm to derive cancer disease and treatment episodes from the standardized clinical event tables.
- Episode population should not be opaque and depend on data source context.
- ETL’ers using modifiers and adhering to conventions, plus a promise to develop a post-ETL algorithm, are enough.
- Members: Rimma Belenkaya, Robert Miller.
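To make the ‘Derivers’ position concrete, here is a toy sketch of what a post-ETL derivation could look like. It is not the algorithm anyone in the working group has proposed. The convention it assumes (a row carrying the 734306 ‘Initial diagnosis’ modifier opens a disease episode, and later rows extend it until the next such row) is my own invention for illustration, as are the `ClinicalEvent` type and the `derive_disease_episodes` function.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

INITIAL_DIAGNOSIS = 734306  # modifier concept quoted in the post


@dataclass(frozen=True)
class ClinicalEvent:
    """Hypothetical, simplified view of a standardized clinical event row."""
    person_id: int
    event_date: date
    modifier_concept_id: Optional[int] = None


def derive_disease_episodes(
    events: List[ClinicalEvent],
) -> List[Tuple[int, date, date]]:
    """Return (person_id, start_date, end_date) per derived disease episode.

    Assumed convention: an 'Initial diagnosis' row opens an episode; every
    later row for that person extends it until the next 'Initial diagnosis'.
    Rows dated before a person's first 'Initial diagnosis' are ignored.
    """
    by_person: Dict[int, List[ClinicalEvent]] = {}
    for ev in sorted(events, key=lambda e: (e.person_id, e.event_date)):
        by_person.setdefault(ev.person_id, []).append(ev)

    episodes: List[Tuple[int, date, date]] = []
    for person_id, rows in by_person.items():
        start: Optional[date] = None
        end: Optional[date] = None
        for ev in rows:
            if ev.modifier_concept_id == INITIAL_DIAGNOSIS:
                if start is not None and end is not None:
                    episodes.append((person_id, start, end))  # close previous
                start = end = ev.event_date
            elif start is not None:
                end = ev.event_date
        if start is not None and end is not None:
            episodes.append((person_id, start, end))
    return episodes


# Invented example rows for one person.
events = [
    ClinicalEvent(1, date(2019, 12, 1)),                     # before any diagnosis
    ClinicalEvent(1, date(2020, 1, 10), INITIAL_DIAGNOSIS),
    ClinicalEvent(1, date(2020, 3, 1)),
    ClinicalEvent(1, date(2021, 6, 15), INITIAL_DIAGNOSIS),
    ClinicalEvent(1, date(2021, 7, 1)),
]
episodes = derive_disease_episodes(events)
```

Because the function reads only the standardized rows, anyone can rerun it on any OMOP instance. That is the transparency Faction 2 is after, and it is also the limitation Faction 3 points to: the function can only see what the ETL managed to land in those rows.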
Faction 3 (the ‘Contextualists’):
- Not all clinical facts can be represented by standard OMOP, phenotypes, and standard analytics.
- We need the EPISODE table to represent cancer disease phases and treatment lines.
- Episodes belong in the standardized clinical data tables.
- We need to come up with simple semantic targets based on oncology research standards to support the population of disease phases and treatment lines in the EPISODE table.
- Episode population can only be done in the context of source data by an advanced informatics infrastructure that supports an ETL’er.
- Such advanced informatics infrastructure will include generating abstractions from unstructured imaging and pathology lab reports and reconciling EHR/claims data with multiple sources: e.g., tumor registry, oncology analytic platforms, and oncology EMRs.
- Clear semantic targets and institutions with advanced informatics infrastructure are enough.
- Members: Christian Reich, Asieh Golozar, Michael Gurley
I know these ‘factions’ are overly simplistic. Please take my personal assignments to the factions as an attempt to sharpen the contours of the debate. Nothing more.
One last thing I will say is that the oncology world is already furiously engaged in interpreting source data and deriving new data. There is a cottage industry of commercial and open-source solutions helping institutions generate cancer disease phases and treatment lines. Every academic medical cancer center in the country is engaged in such efforts. So fighting against that tide really comes down to a decision by the OHDSI community about whether to be open or closed to incorporating these emerging oncology data assets. One thing the cottage industry has not provided is a clear, open-source structure and semantics for representing cancer disease phases and treatment lines. mCODE has made significant contributions in this area, but it is more focused on data collection and transport than on open data analysis. I think OHDSI has a great opportunity to be that open-source structure and semantics for oncology. But we need to make the target simple enough that folks can manage to populate disease and treatment episodes within our lifetimes.