Please use this spot to document:
- Describe the issue/topic?
- What do we know about this topic? What has been discussed?
- What are recommendations for how to handle this issue/topic?
- What next steps should be taken?
Related posts:
This is what we have come up with so far:
Describe the issue/topic?
The EPISODE table aggregates lower-level clinical events (VISIT_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, DEVICE_EXPOSURE) into a higher-level abstraction representing clinically and analytically relevant disease phases, outcomes and treatments. For example, a cancer can be represented with episodes covering its development over time, its treatment, and its final resolution.
The EPISODE_EVENT table connects qualifying clinical events (such as CONDITION_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, MEASUREMENT) to the appropriate EPISODE entry. For example, linking the precise location of the metastasis (cancer modifier in MEASUREMENT) to the disease episode.
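To make the linkage concrete, here is a minimal Python sketch of how EPISODE_EVENT rows could tie events to an episode. The field names follow the CDM v5.4 EPISODE and EPISODE_EVENT tables (only a subset of the EPISODE columns is shown). All concept IDs and record IDs, and the `events_for_episode` helper, are made-up placeholders for illustration and are not real vocabulary values.

```python
from dataclasses import dataclass
from datetime import date

# Placeholder concept IDs for illustration only, NOT real OMOP vocabulary values.
DISEASE_EPISODE_CONCEPT = 1001    # hypothetical "disease episode" concept
MEASUREMENT_FIELD_CONCEPT = 2001  # hypothetical concept for measurement.measurement_id
DRUG_FIELD_CONCEPT = 2002         # hypothetical concept for drug_exposure.drug_exposure_id


@dataclass(frozen=True)
class Episode:
    episode_id: int
    person_id: int
    episode_concept_id: int         # what kind of episode this is
    episode_start_date: date
    episode_object_concept_id: int  # e.g. the cancer diagnosis concept
    episode_type_concept_id: int    # provenance of the episode record


@dataclass(frozen=True)
class EpisodeEvent:
    episode_id: int
    event_id: int                        # primary key of the linked record
    episode_event_field_concept_id: int  # says which table/field event_id refers to


def events_for_episode(links, episode_id):
    """Group the linked event IDs of one episode by their field concept."""
    grouped = {}
    for link in links:
        if link.episode_id == episode_id:
            grouped.setdefault(link.episode_event_field_concept_id, []).append(link.event_id)
    return grouped


episode = Episode(1, 42, DISEASE_EPISODE_CONCEPT, date(2021, 3, 1), 3001, 4001)
links = [
    EpisodeEvent(1, 501, MEASUREMENT_FIELD_CONCEPT),  # e.g. a metastasis modifier
    EpisodeEvent(1, 777, DRUG_FIELD_CONCEPT),
    EpisodeEvent(2, 888, DRUG_FIELD_CONCEPT),         # belongs to a different episode
]
print(events_for_episode(links, episode.episode_id))
```

The point of the field concept is that `event_id` alone is ambiguous: the same integer can be a primary key in several tables, so each link has to say which table it points into.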
These tables are available only in OMOP CDM v5.4 and are not yet supported in ATLAS. They may be a good alternative to the FACT_RELATIONSHIP table (?)
What do we know about this topic? What has been discussed?
What are recommendations for how to handle this issue/topic?
What next steps should be taken?
- Answer the question: do we want to put drug combinations into the EPISODE table when several drugs are used to treat a given condition without forming a regimen (for example, an antibacterial combined with an anti-inflammatory to treat an infection)?
- Improve and publish the convention.
- Expand Episode usage to non-cancer disorders, for both disease and treatment episodes.
- Collect more use cases.
Most of the issues you raised will be described in the upcoming comprehensive documentation about EPISODE population for oncology. To highlight some conventions that address your questions:
Stay tuned for more conventions + greater details in the documentation.
How is this possible with a drug regimen? Only with Regimen Finder?
What if I have a source table, let's say a "lines of therapy" table, where all regimens are stored? This table will give me more comprehensive results than Regimen Finder, because it contains custom regimens that Regimen Finder can't find.
How do I connect events to a disease episode? Will each procedure or observation have the modifier as well?
Most of the issues you raised will be described in the upcoming comprehensive documentation about EPISODE population for oncology. To highlight some conventions that address your questions:
Are these strictures for Episode in general, or only for Episode with respect to oncology?
Head on, @roger.carlson.
We need to distinguish between episodes used in oncology, and episodes used elsewhere. @Dymshyts's nice summary would be split along those. Let's pressure test them all. By pressure test (you guessed it) I mean can we think of an analytical use case and elaborate on it:
General thoughts
Oncology episodes
Non-oncology episodes
I think that's it. Again, if we don't have the use cases we should not do it.
Some thoughts:
We definitely need episodes in oncology. Cancer patients generate a huge volume of healthcare activity all related to a single primary tumor, and we need that organizing concept to differentiate between different tumors. Many cancer treatment guidelines, drug labels, clinical trial eligibility criteria, and outcomes measures depend on how many lines of therapy the patient has received. That's much easier to handle with treatment episodes than with dozens or hundreds of (incomplete) medication administration records. We also have special semantics around treatment holidays (patient paused treatment due to toxicity, surgery, other illness, travel, etc.), and the cancer treatment episode heuristics can accommodate this. We can also annotate treatment episodes with whether they were completed: when analyzing comparative effectiveness (and not compliance) we only want to include patients who actually completed the treatment. Again, this is very difficult with individual medication administration records. I agree it's not ideal that the heuristics for cancer will be different from other condition and treatment episodes. But cancer condition and treatment episodes seem so useful to me that I wouldn't discard them due to a "closed world" desire.
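As a rough illustration of how a treatment-episode heuristic can tolerate treatment holidays, here is a minimal Python sketch that merges drug administrations into one episode when the pause between them stays under a threshold. The function name `build_treatment_episodes` and the 60-day `max_gap_days` default are assumptions made for this example; this is not the Regimen Finder algorithm and not an established OHDSI convention.

```python
from datetime import date, timedelta


def build_treatment_episodes(exposures, max_gap_days=60):
    """Merge (start, end) drug exposure intervals into treatment episodes.

    Exposures separated by at most `max_gap_days` are kept in the same
    episode, so a short treatment holiday does not start a new one.
    """
    allowed_gap = timedelta(days=max_gap_days)
    episodes = []
    for start, end in sorted(exposures):
        if episodes and start - episodes[-1][1] <= allowed_gap:
            # Within the allowed pause: extend the current episode.
            episodes[-1][1] = max(episodes[-1][1], end)
        else:
            # Pause too long (or first record): open a new episode.
            episodes.append([start, end])
    return [(start, end) for start, end in episodes]


administrations = [
    (date(2021, 1, 1), date(2021, 1, 1)),
    (date(2021, 1, 22), date(2021, 1, 22)),
    (date(2021, 3, 5), date(2021, 3, 5)),  # after a 42-day pause
    (date(2021, 9, 1), date(2021, 9, 1)),  # after a pause of about six months
]
episodes = build_treatment_episodes(administrations)
for start, end in episodes:
    print(start, end)
```

With these made-up dates, the first three administrations collapse into one episode and the September administration starts a second one. A real heuristic would also need the reason for the pause and the drugs involved, which this sketch ignores.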
I agree it will be hard to draw the line on which conditions should be represented with episodes. Clearly sore throat or headache shouldn't be episodes, but persistent or recurrent infection, chronic pain, and type 2 diabetes are conditions that may resolve but last long enough for the episode concept to be useful, especially in combination with treatment episodes to inform phenotypes like "staph infection unresponsive after three lines of therapy". Of course, even in cancer, determining the end of an episode is still aspirational (we don't have structured outcomes) and is likely even more challenging in other conditions. The EHR seldom tells you when a condition is cured. But the existence of a potentially identifiable end to an episode may be a determinant of which conditions are candidates for episodes. An example in oncology is an imaging report indicating no evidence of disease.
The condition (and maybe treatment) episode extraction feels connected to phenotypes. The "communities" defining high-quality phenotypes are also the people who can state the heuristics for condition and treatment episode extraction/construction for their phenotypes, and they could produce a RegimenFinder for their treatment(s). They are also the people who can determine whether episodes are useful in their conditions at all. Oncology is just the first community to encounter the need. Communities also relate to comprehensiveness; a good example is solid tumors (described above) versus heme malignancies, in which researchers might disagree with the solid tumor concept of a condition episode. But as heme builds an OMOP community, they can develop phenotypes and (optionally) episode heuristics in parallel.
I don't see it explicitly in your questions, but I realize this leads to a CDM containing episodes with many different semantics/heuristics, and what does that mean for "universal" analysis tools?
It's late and I think I'm being too verbose...
The reason for making it a two-step process is to preserve available regimen or other episode data at the level of regular OMOP tables (low-level events). Presently, OHDSI tools are not using EPISODE. Therefore, regimen or other episode-related information cannot be surfaced by OHDSI tools unless it is stored in regular (vanilla, low-level event) tables.
These are conventions we established for now for storing episode related data in regular OMOP tables:
@jmethot, thank you for clearly outlining the need for episodes in oncology analytics. They are key for answering the most common cancer research questions. This has been established.
As for episode derivation, it is a rapidly evolving area. On one hand, there are many research registries in oncology that manually abstract this information. One of the most prominent and reliable is the US Tumor Registry. We will definitely be leveraging these sources. On the other hand, there are evolving deterministic and probabilistic algorithms that derive episodes from low-level events. None of these methods is perfect, and they do not all conform to the same rules and conventions. However, they have been used outside of OHDSI and will continue to develop.
Therefore, we introduced the foundational structure for persisting episodes and an initial set of conventions for episode population/derivation. This platform enables testing different methods of episode derivation and using them in analysis.
Pregnancy is an episode, and it is different from the chronic diseases.
Tagging @acallahan @louisahsmith
@MPhilofsky, when you say "pregnancy is an episode", can you help me understand how you would differentiate a "pregnancy episode" from a "pregnancy cohort" entry? I ask because we are actively working on phenotypes and cohort definitions to represent the span of time that a person belongs to a health state, and pregnancy is one of the specific use cases we have been using for cohorts, where we'd like a cohort start date to be at conception and a cohort end to be when the pregnancy outcome is observed. There can be cohorts for specific pregnancy outcomes (e.g. livebirth, stillbirth, abortion) and also a composite cohort that combines all pregnancy outcomes to represent the collective spans of time that a woman is in the "pregnant state".
This is an excellent question, @Patrick_Ryan. What is the difference between cohorts and episodes?
I would claim, let's see if this flies: Both are memberships of patients in something defined by criteria and have a start and end. The difference is
Cohort:
Episode:
Does that make sense?
@Christian_Reich - let's say we have a standard repository of peer-reviewed algorithms (version controlled and enumerated) that can run on the core CDM tables (visit_occurrence, condition_occurrence, drug_exposure, etc.) and create an output that is an episode, defined as the continuous span of time during which the person had the episode,
and we export these algorithms with every installation of OHDSI software (for example, ATLAS would ship with them built in).
Would that address these concerns?
- Not an ETL job since highly variable and dependent on the study
Now it is not study specific. It is not highly variable; in fact, it is standardized.
- Finite predefined list
This repository of algorithms (let's call it the OHDSI Phenotype Library) would be a finite list. It would be credible and trusted.
It would be consistent across the OHDSI network.
It would be built from OMOP-converted data.
Looks like you have a stake in this, don't you? Well, let's dissect.
Stop right here. Just like ETL, I claim you cannot build Episodes on the OMOP CDM standard tables alone. You need the source. Reason: Depending on what the source captured, the definitions would differ. For example, take the episode âProgressionâ. You can get that from an abstracted record in a tumor registry, or from the path lab, imaging report or clinical record in an EHR (which may have to be NLPed out or is in some kind of structured place).
Also, it cannot be peer reviewed. Peers cannot see the source. Episodes are built against a set of requirements.
Very nice idea, but that doesn't make it part of the CDM. That's a convenience thing.
Well, come on. The cohorts are made for the studies. In your head, as you come up with the content of the library, you abstract from a multitude of typical studies and standardize to that. But there are potentially an infinite number of cohorts you create and standardize. Or is the current list everything you could ever need?
Also, your standard cohorts are always Conditions. You donât make cohorts for other domains. Why? Because we cannot rely on the diagnostic codes for reliable condition cohorts. They are overreported and underreported, their timing stinks, and their definition may not match what you need. So, you do all the gymnastics to get around those shortcomings (without making transparent what gymnastic move is addressing what issue, as I have previously complained).
Episodes have a different purpose: They are abstracted conditions as well, but they also describe the dynamic nature of the disease, and they organize complex treatments.
Finally, the episodes keep the connection to the events they are built from, or they are related to (EPISODE_EVENT table). Cohorts do not.
@Christian_Reich and @Gowtham_Rao ,
As I described in this thread, we introduced a list of condition modifiers (e.g. 734306 = "Initial diagnosis") that supports preserving available source information related to episode definition. We have already added these modifiers to the ETL from Tumor Registry. We are also recommending this for ETL of any data that has insights about episodes. This data along with concept_type_id will be used for post-ETL derivation of episodes. Therefore, you can and should build Episodes on the OMOP CDM standard tables. Moreover, before we develop any tools that use Episodes, these modifiers can be also surfaced by available tools.
@Christian_Reich and @Gowtham_Rao, we are developing algorithms that are based solely on regular OMOP tables (e.g. Regimen Finder is based on DRUG_EXPOSURE). Therefore, they can be peer-reviewed. I have proposed an idea very similar to @Gowtham_Rao for a repository of algorithms where each algorithm will have a concept assigned. This will allow for preserving provenance of episodes in Episode.Episode_Type_Concept_ID.
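One way to picture the proposal of assigning a concept to each algorithm is a small registry that stamps every derived episode with the concept of the algorithm that produced it. This is only a sketch under stated assumptions: the registry, the `EpisodeAlgorithm` class, the concept ID 9001, and the toy `one_episode_per_person` derivation are all hypothetical and do not correspond to existing OHDSI code or vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EpisodeAlgorithm:
    name: str
    version: str
    type_concept_id: int  # hypothetical concept identifying this algorithm
    derive: Callable[[List[dict]], List[dict]]


REGISTRY: Dict[int, EpisodeAlgorithm] = {}


def register(algorithm):
    """Add an algorithm to the registry, refusing duplicate concept IDs."""
    if algorithm.type_concept_id in REGISTRY:
        raise ValueError(f"concept {algorithm.type_concept_id} is already registered")
    REGISTRY[algorithm.type_concept_id] = algorithm


def run(type_concept_id, drug_exposures):
    """Derive episodes and record which algorithm produced each of them."""
    algorithm = REGISTRY[type_concept_id]
    return [
        dict(episode, episode_type_concept_id=type_concept_id)
        for episode in algorithm.derive(drug_exposures)
    ]


def one_episode_per_person(drug_exposures):
    """Toy derivation: one episode spanning all exposures of each person."""
    spans = {}
    for row in drug_exposures:
        day = row["drug_exposure_start_date"]
        first, last = spans.get(row["person_id"], (day, day))
        spans[row["person_id"]] = (min(first, day), max(last, day))
    return [
        {"person_id": pid, "episode_start_date": first, "episode_end_date": last}
        for pid, (first, last) in sorted(spans.items())
    ]


register(EpisodeAlgorithm("toy-span", "0.1", 9001, one_episode_per_person))
rows = [
    {"person_id": 1, "drug_exposure_start_date": date(2021, 2, 1)},
    {"person_id": 1, "drug_exposure_start_date": date(2021, 1, 1)},
    {"person_id": 2, "drug_exposure_start_date": date(2021, 5, 1)},
]
result = run(9001, rows)
print(result)
```

Because each output row carries the concept of the algorithm that created it, an analyst could later filter episodes by derivation method or re-run a specific version to check reproducibility.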
Looks like @rimma disagreed with you. It also sounds like a claim that is not substantiated. So let's strike this off.
What are we talking about: changing data representation (i.e. converting to OMOP form) or deriving new events (i.e. using some intelligence to generate new data from other data using an algorithm)? I am only interested in the latter, i.e. the algorithm. I do not want to empower the ETL'er to do this, as then you are giving the ETL'er too much power to interpret source data and derive new data, and this is NOT good for reproducible research (e.g. ETL rules may be unknown or have errors). By using the cohort approach, we are making them all available.
Let's strike this off: same argument as above, i.e. derivation is not from source data but from core OMOP CDM tables.
Incorrect, @Christian_Reich. I think a few years ago that was correct, but right now we (the OHDSI Phenotype Development and Evaluation workgroup) are thinking of a "target", and cohort definitions are algorithms trying to identify a cohort that matches that target. Any deviation from the target is the error. An individual study is not a component of Phenotype Development and Evaluation. Happy to discuss that; come join the workgroup.
I don't think that's correct either, @Christian_Reich.
Finally, the episodes keep the connection to the events they are built from, or they are related to (EPISODE_EVENT table). Cohorts do not.
All valid points, and use of the Episode tables is valid. My position is not about whether we need to populate/use the Episode table. I am arguing against the ETL'er making undocumented/unreproducible algorithmic choices to populate a table with derived content. If you just want to do an ETL of source to target, sure, go for it, but the T should be minimal and have record-level referential integrity to the source where possible.
If instead you have an algorithmically derived summary of multiple records in the source, especially if it is running on source tables, then I think that's not good!
OK, re-reading all posts from the top: these are derived tables and NOT clinical data tables.
They also have fields that are not in the cohort table, like the following:
These tables are to be calculated from omop core tables/clinical data tables and NOT source tables.
So now some of the arguments make sense...
But we will have the age-old problem of not being able to fully trust these derived tables. For example, in OHDSI network studies we rarely seem to use condition_era and drug_era; we use condition_occurrence and drug_exposure instead. This is the reason for THEMIS.
Nice debate here. But it is getting long. Let me see:
Correct. If the source tells us what the episodes are we are all set. But the debate is about derivation when you don't have that, à la @Gowtham_Rao's phenotypes. He claims all you need is regular OMOP tables and you can do it, reliably, from the Conditions and Modifiers.
That would be wonderful. But apart from the fact that we are far away from having the logic for such algorithms for disease episodes, the question remains: Could they work without the context of the source data? Is this an ETL job, or a universal phenotype-like job?
You seem to be claiming that the Type concept will provide sufficient context for each Modifier (stage, grade, mets, nodes) telling us how much the algorithm should believe it. But there is more trouble lurking:
In other words, all this is so messy that we need to give the ETLer some serious power to make the right choices. That is the point of the Episodes. The analyst using OMOP tables alone would be lost.
That's a good thing. In our workgroup, we are actually debating things.
You realize that all OMOP databases are ETLed, don't you? The ETLer has to make a ton of decisions about how to interpret the source data in such a way that it fits the intended representation of the CDM and vocabulary. And no, those decisions are not peer-reviewable. Plus: if anything, I do not mistrust the ETLer like you do. But if algorithms can make her life easier I am all for it.
What is that? And how is that not used for studies? Let me quote @Patrick_Ryan:
True. @Patrick_Ryan made a list and the community voted on them. Where did it get its wisdom from? They are needed for disease setting and outcomes in the studies folks are running all the time.
But you are right, I shouldn't debate the phenotypes here, except whether or not they are the same thing as the episodes.
Back in Phebruary they certainly were. Do you have outcomes that are not conditions now?
You'd be surprised to hear that from me, but actually if we could create standardized episodes purely from structured data in the OMOP tables I don't think we'd need them. We'd just use your phenotypes. Episodes only have a life if they need to be populated pre or peri-OMOP.
That's what it boils down to. I hope you and @rimma will be right. Till then, my hope is we can arrive at some mixture: standard algorithms that are using OMOP tables, but are configured with information the ETLer has obtained from the source data or by asking folks in the institution.
Great end to this focused discussion, @Christian_Reich. These are the key insights I have learned, which reinforced some of my positions.
Truly, the use case I think it supports is algorithmically separating care events that may be unrelated. For example, if I am getting knee surgery but during the same days also get a dental workup, the episode table bundles the knee surgery events together but does not link to the unrelated dental events.
It reminds me of an old discussion here, "How to Capture pregnancy data? (EDC, gestation length, etc)" (#5 by Gowtham_Rao), and the idea of an "Episode of care". Health insurance companies have done, been doing, or tried to do this for many years.