
[2022 US Symposium] #66 - Episode and Episode Event Tables Documentation

Please use this spot to document:

  1. Describe the issue/topic?
  2. What do we know about this topic? What has been discussed?
  3. What are recommendations for how to handle this issue/topic?
  4. What next steps should be taken?


This is what we have come up with so far:

Describe the issue/topic?

The EPISODE table aggregates lower-level clinical events (VISIT_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, DEVICE_EXPOSURE) into a higher-level abstraction representing clinically and analytically relevant disease phases, outcomes and treatments. An example is cancer: its development over time, its treatment, and its final resolution.

The EPISODE_EVENT table connects qualifying clinical events (such as CONDITION_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, MEASUREMENT) to the appropriate EPISODE entry. For example, linking the precise location of the metastasis (cancer modifier in MEASUREMENT) to the disease episode.
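A minimal sketch of how the two tables relate, written as plain Python records rather than SQL. The field names follow the CDM v5.4 EPISODE and EPISODE_EVENT tables; every concept ID below is a placeholder invented for illustration, not a real vocabulary entry.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Episode:
    episode_id: int
    person_id: int
    episode_concept_id: int         # kind of episode (disease, treatment, ...)
    episode_object_concept_id: int  # the condition or regimen the episode is about
    episode_type_concept_id: int    # provenance of the record
    episode_start_date: date
    episode_end_date: Optional[date] = None
    episode_parent_id: Optional[int] = None
    episode_number: Optional[int] = None


@dataclass(frozen=True)
class EpisodeEvent:
    episode_id: int
    event_id: int                        # primary key of the linked clinical row
    episode_event_field_concept_id: int  # says which table/field event_id points at


# Placeholder concept IDs, invented for this sketch only.
DISEASE_EPISODE, BREAST_CANCER, DERIVED_POST_ETL = 1, 2, 3
CONDITION_OCCURRENCE_ID_FIELD, MEASUREMENT_ID_FIELD = 10, 11

episode = Episode(100, 42, DISEASE_EPISODE, BREAST_CANCER, DERIVED_POST_ETL, date(2021, 3, 1))
links = [
    EpisodeEvent(100, 5001, CONDITION_OCCURRENCE_ID_FIELD),  # the diagnosis row
    EpisodeEvent(100, 7001, MEASUREMENT_ID_FIELD),           # metastasis location modifier
    EpisodeEvent(200, 7002, MEASUREMENT_ID_FIELD),           # belongs to another episode
]


def events_for(episode_id: int, all_links: List[EpisodeEvent]) -> List[EpisodeEvent]:
    """Return the EPISODE_EVENT rows attached to one episode."""
    return [link for link in all_links if link.episode_id == episode_id]
```

The episode row itself carries no clinical detail; it is the link rows that point back to the lower-level events.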

These tables are available only in OMOP CDM v5.4 and are not yet implemented in ATLAS. They may be a good alternative to the FACT_RELATIONSHIP table (?)

What do we know about this topic? What has been discussed?

  • Not one event, but several subsequent events: drug therapy, pregnancy, oncology (progression, remission, lines, cycles, regimens)
  • Can be overlapping and nested (parent – child episodes), e.g. Disease episode - subsumes - drug episode, procedure episode, disease progression etc.
  • Types of an episode: disease episode, drug episode, etc.
  • We should pick up all events which belong to an episode of interest logically (visits, conditions, measurements, procedures, devices, observation)
  • Usage: therapy regimen, disease status changes
  • Therapeutic areas: cancer, chronic infections (tuberculosis, Helicobacter pylori infection), psychiatric diseases (scale results can indicate disease progression), other chronic disorders
  • Example of a use case: a disease changes over time, as does the therapy applied
  • Atlas use case: “Show me all patients with breast cancer on a 2nd line of therapy with progression”
  • A disease episode is not predefined; anything can be linked to it (only if we can guarantee that all related entities are connected to their disease in the source), while a treatment episode is predefined (the regimen has a name)
  • Drug episodes are relevant for chronic diseases, as are most episodes in general
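The nesting and the ATLAS-style question above can be sketched as follows. This is a toy illustration, not an ATLAS feature: `kind` and `obj` are hypothetical string stand-ins for `episode_concept_id` and `episode_object_concept_id`, the line of therapy is assumed to be stored in `episode_number`, and the timing of progression relative to the 2nd line is ignored.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Episode(NamedTuple):
    episode_id: int
    person_id: int
    kind: str  # hypothetical stand-in for episode_concept_id
    obj: str   # hypothetical stand-in for episode_object_concept_id
    episode_parent_id: Optional[int] = None
    episode_number: Optional[int] = None  # assumed to hold the line of therapy


def second_line_with_progression(episodes: Iterable[Episode]) -> Set[int]:
    """Persons whose breast cancer episode has a 2nd-line child and a progression child."""
    episodes = list(episodes)
    children = {}
    for e in episodes:
        if e.episode_parent_id is not None:
            children.setdefault(e.episode_parent_id, []).append(e)

    hits = set()
    for e in episodes:
        if e.kind != "disease" or e.obj != "breast cancer":
            continue
        kids = children.get(e.episode_id, [])
        on_second_line = any(k.kind == "treatment" and k.episode_number == 2 for k in kids)
        progressed = any(k.kind == "progression" for k in kids)
        if on_second_line and progressed:
            hits.add(e.person_id)
    return hits


example = [
    Episode(1, 10, "disease", "breast cancer"),
    Episode(2, 10, "treatment", "regimen A", 1, 1),
    Episode(3, 10, "treatment", "regimen B", 1, 2),
    Episode(4, 10, "progression", "breast cancer", 1),
    Episode(5, 20, "disease", "breast cancer"),
    Episode(6, 20, "treatment", "regimen A", 5, 1),
    Episode(7, 20, "progression", "breast cancer", 5),
]
```

Here person 10 qualifies and person 20 does not, because person 20 never reached a 2nd line.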

What are recommendations for how to handle this issue/topic?

  • What if there are no episodes defined in the source tables? If you have them in the source consistently, put them in. If not, do not even start, as it is a huge analytical effort. The exception is OncoRegimenFinder, which groups drugs from DRUG_EXPOSURE into regimens. @agolozar, do we have a newer version?
  • Use that system based on your source data, and only when you are able to do it comprehensively per disease, i.e. you can build episodes for all occurrences of multiple myeloma in a given database
  • The ETLer will not abstract the data to create episode entries, except via the Regimen Finder.
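To make the idea of grouping DRUG_EXPOSURE rows into regimens concrete, here is a deliberately simplified sketch. It is not the OncoRegimenFinder algorithm; it only applies one assumed rule, namely that a new regimen starts when a drug begins more than `window_days` after the previous exposure. The function name, the tuple layout and the 30-day default are all assumptions made for illustration.

```python
from datetime import date
from itertools import groupby


def group_into_regimens(exposures, window_days=30):
    """Group (person_id, ingredient, start_date) tuples into candidate regimens.

    A new regimen starts whenever the gap since the person's previous
    exposure exceeds window_days (an assumed, simplistic rule).
    """
    regimens = []
    ordered = sorted(exposures, key=lambda x: (x[0], x[2]))
    for person_id, rows in groupby(ordered, key=lambda x: x[0]):
        current = None  # reset for every person
        for _, ingredient, start in rows:
            if current is None or (start - current["end"]).days > window_days:
                current = {"person_id": person_id, "start": start,
                           "end": start, "ingredients": set()}
                regimens.append(current)
            current["ingredients"].add(ingredient)
            current["end"] = max(current["end"], start)
    return regimens


exposures = [
    (1, "doxorubicin", date(2020, 1, 1)),
    (1, "cyclophosphamide", date(2020, 1, 1)),
    (1, "doxorubicin", date(2020, 1, 22)),
    (1, "paclitaxel", date(2020, 4, 1)),
    (2, "cisplatin", date(2020, 2, 1)),
]
regimens = group_into_regimens(exposures)
```

A gap rule like this merges drugs that are switched quickly and splits regimens with long planned pauses, which is one reason inferred episodes need their provenance recorded.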

What next steps should be taken?

  1. To answer a question: do we want to put drug combinations into the EPISODE table when several drugs are used to treat a given condition without forming a regimen, for example, a combination of an antibacterial and an anti-inflammatory to treat an infection?

  2. To improve and publish the convention

  3. To expand Episode usage to non-cancer disorders, for both disease and treatment episodes

  4. To get more use cases


Most of the issues you raised will be described in the upcoming comprehensive documentation about EPISODE population for oncology. To highlight some conventions that address your questions:

  1. No data should be extracted directly from the source into EPISODE. Population of the EPISODE should be done post-ETL, very similar to population of ERA tables.
  2. Since the initial target of episode information is the regular OMOP tables (conditions, procedures), we introduced episode-related modifiers like ‘Initial diagnosis’.
  3. The same episode (e.g. episode of disease progression) can be derived using the modifiers if available or algorithmically. Therefore, preserving provenance of episode derivation is key to understanding validity of the data and comparing different methods.
  4. Yes, we want to reflect regimens that are not available in HemOnc proper in EPISODE. Custom regimen concepts should be created for this purpose.
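A minimal sketch of post-ETL derivation with provenance, under stated assumptions. The concept ID 734306 for ‘Initial diagnosis’ is the one quoted elsewhere in this thread; the input row keys, the function name and the type concept ID are hypothetical and only serve the illustration.

```python
from datetime import date

INITIAL_DIAGNOSIS = 734306             # 'Initial diagnosis' modifier, as quoted in this thread
DERIVED_FROM_MODIFIER = 2_000_000_001  # hypothetical local concept naming the derivation method


def derive_disease_episodes(condition_rows):
    """Build one open-ended disease episode per row flagged as an initial diagnosis.

    Each input row is a dict with hypothetical keys: person_id,
    condition_concept_id, condition_start_date, modifier_concept_id.
    """
    episodes = []
    for row in condition_rows:
        if row.get("modifier_concept_id") != INITIAL_DIAGNOSIS:
            continue
        episodes.append({
            "episode_id": len(episodes) + 1,
            "person_id": row["person_id"],
            "episode_object_concept_id": row["condition_concept_id"],
            "episode_start_date": row["condition_start_date"],
            "episode_end_date": None,  # the end of a disease episode is often unknown
            "episode_type_concept_id": DERIVED_FROM_MODIFIER,  # provenance
        })
    return episodes


rows = [
    {"person_id": 1, "condition_concept_id": 111,
     "condition_start_date": date(2021, 5, 3), "modifier_concept_id": INITIAL_DIAGNOSIS},
    {"person_id": 1, "condition_concept_id": 111,
     "condition_start_date": date(2021, 9, 1), "modifier_concept_id": None},
]
derived = derive_disease_episodes(rows)
```

Because the derivation reads only already-converted rows, it can be rerun or replaced by another method without touching the ETL.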

Stay tuned for more conventions and greater detail in the documentation.


How is this possible with drug regimens? Only with the Regimen Finder?
What if the source has, let’s say, a “lines of therapy” table where all regimens are stored? This table will give me more comprehensive results than the Regimen Finder, because it has custom regimens that the Regimen Finder can’t find.

How do I connect events to a disease episode? Will each procedure or observation have the modifier as well?

Most of the issues you raised will be described in the upcoming comprehensive documentation about EPISODE population for oncology. To highlight some conventions that address your questions:

Are these strictures for Episode in general, or only for Episode with respect to oncology?

Head on, @roger.carlson.

We need to distinguish between episodes used in oncology, and episodes used elsewhere. @Dymshyts’s nice summary would be split along those. Let’s pressure test them all. By pressure test (you guessed it) I mean can we think of an analytical use case and elaborate on it:

General thoughts

  1. Do we indeed have two types of episodes: disease and treatment episodes? Or are there more?
  2. What types of events belong to which: (i) anything to disease (visits, conditions, measurements, procedures, devices, observation), and (ii) only drugs and procedures to treatment episodes?
  3. Are disease episodes defined by any standard condition concept? Or are there special ones (e.g. tumor types)?
  4. Are treatment episodes defined by a predefined regimen concept? Or can they be made up for each database separately?
  5. Are we looking for the “Closed World” rule, by which if we cannot guarantee a comprehensive set of episodes we should not write any?
  6. Are inference tools like the OncoRegimenFinder good enough to declare episodes comprehensively? Are inferred episodes generally considered equivalent to source-based ones?
  7. Is the comprehensiveness rule limited to within each episode_object_concept_id (the target condition or regimen), or is it general across all disease or treatment episodes? In other words, if I can’t write all cancer episodes should I write none? Or is it enough if I can write all breast cancers?
  8. Generally, can we expect the poor ETL wretch to figure this all out?

Oncology episodes

  1. Disease episodes are abstracting from the fact that cancers are dynamic (grow/shrink, change their malignancy, etc.). Treatment episodes are abstracting from a series of treatment events following a pre-defined dosing and schedule. Are these the sole use cases?
  2. Disease and treatment episodes are the result of abstraction, either at the source or post-ETL. Should we wait till we have those abstraction methods worked out before ratifying episodes?
  3. Disease episodes can be nested into each other, and treatment episodes can be nested into disease episodes. Are there other logical scenarios that actually happen in clinical practice?

Non-oncology episodes

  1. Disease episodes could be considered in some therapeutic areas, such as chronic infections (tuberculosis, Helicobacter pylori infection), psychiatric diseases, maybe other chronic disorders. Is that it? Do we have the use cases?
  2. Treatment episodes could be considered for regimen-like drug cocktails as in HIV and tuberculosis. Is that correct? Do we have the use cases?

I think that’s it. Again, if we don’t have the use cases we should not do it.

Some thoughts:

  1. We definitely need episodes in oncology. Cancer patients generate a huge volume of healthcare activity all related to a single primary tumor and we need that organizing concept to differentiate between different tumors. And many cancer treatment guidelines, drug labels, clinical trial eligibility criteria, and outcomes measures depend on how many lines of therapy the patient has received. That’s much easier to handle with treatment episodes than with dozens or hundreds of (incomplete) medication administration records. We also have special semantics around treatment holidays (patient paused treatment due to toxicity, due to surgery, due to other illness, due to travel, etc.) and the cancer treatment episode heuristics can accommodate this. We can also annotate treatment episodes with whether they were completed: when analyzing comparative effectiveness (and not compliance) we only want to include patients who actually completed the treatment. Again, very difficult with individual medication admin records. I agree it’s not ideal that the heuristics for cancer will be different from other condition and treatment episodes. But cancer condition and treatment episodes seem so useful to me that I wouldn’t discard them due to a “closed world” desire.

  2. I agree it will be hard to draw the line on which conditions should be represented with episodes. Clearly sore throat or headache shouldn’t be episodes, but persistent or recurrent infection, chronic pain, and type 2 diabetes are conditions that may resolve but last long enough to be use cases for an episode concept, especially in combination with treatment episodes to inform phenotypes like “staph infection unresponsive after three lines of therapy”. Of course, even in cancer determining the end of an episode is still aspirational (we don’t have structured outcomes) and is likely even more challenging in other conditions. The EHR seldom tells you when a condition is cured. But the existence of a potentially identifiable end to an episode may be a determinant of which conditions are candidates for episodes. An example in oncology is an imaging report indicating no evidence of disease.

  3. The condition (and maybe treatment) episode extraction feels connected to phenotypes. The “communities” defining high-quality phenotypes are also the people who can state the heuristics for condition and treatment episode extraction/construction for their phenotypes, and they could produce a RegimenFinder for their treatment(s). They are also the people who can determine whether episodes are useful in their conditions at all. Oncology is just the first community to encounter the need. Communities also relate to comprehensiveness; a good example is solid tumors (described above) versus heme malignancies, in which researchers might disagree with the solid tumor concept of a condition episode. But as heme builds an OMOP community they can develop phenotypes and (optionally) episode heuristics in parallel.

  4. I don’t see it explicitly in your questions, but I realize this leads to a CDM containing episodes with many different semantics/heuristics, and what does that mean for “universal” analysis tools?

It’s late and I think I’m being too verbose…


The reason for making it a two-step is to preserve available regimen or other episode data at the level of regular OMOP tables (low level events). Presently, OHDSI tools are not using EPISODE. Therefore, regimen or other episode related information cannot be surfaced by OHDSI tools unless stored in regular (vanilla, low level event) tables.

These are conventions we established for now for storing episode related data in regular OMOP tables:

  • For disease: using modifiers like ‘Initial diagnosis’
  • For regimens: decomposing a regimen during ETL into its individual drug components and storing them in the DRUG_EXPOSURE table.
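The regimen convention above can be sketched as follows. The lookup table and function name are hypothetical; in practice regimen compositions would come from a vocabulary such as HemOnc rather than a hard-coded dict. The output keys reuse DRUG_EXPOSURE column names, but the rows are plain dicts, not a full DRUG_EXPOSURE record.

```python
from datetime import date

# Hypothetical lookup for illustration; AC is doxorubicin plus cyclophosphamide.
REGIMEN_COMPONENTS = {
    "AC": ["doxorubicin", "cyclophosphamide"],
}


def decompose_regimen(person_id, regimen_name, start, end, components=REGIMEN_COMPONENTS):
    """Turn one source regimen record into one drug-exposure-like row per component."""
    if regimen_name not in components:
        raise KeyError(f"No known composition for regimen {regimen_name!r}")
    return [
        {
            "person_id": person_id,
            "drug_source_value": ingredient,
            "drug_exposure_start_date": start,
            "drug_exposure_end_date": end,
        }
        for ingredient in components[regimen_name]
    ]


drug_rows = decompose_regimen(42, "AC", date(2021, 1, 4), date(2021, 3, 8))
```

Storing the components this way keeps the information visible to tools that read only DRUG_EXPOSURE, at the cost of losing the regimen name unless it is re-derived later.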

@jmethot , thank you for clearly outlining the need for episodes in oncology analytics. They are key for answering most common cancer research questions. This has been established.

As for episode derivation, it is a rapidly evolving area. On the one hand, there are many research registries in oncology that manually abstract this information. One of the most prominent and reliable is the US Tumor Registry. We will definitely be leveraging these sources. On the other hand, there are evolving deterministic and probabilistic algorithms that derive episodes from low-level events. None of these methods are perfect or conform to the same rules and conventions. However, they have been used outside of OHDSI and will continue to develop.

Therefore, we introduced the foundational structure for persisting episodes and an initial set of conventions for episode population/derivation. This platform enables testing different methods of episode derivation and using them in analysis.

Pregnancy is an episode, and a different kind from the chronic diseases.

Tagging @acallahan @louisahsmith


@MPhilofsky, when you say ‘pregnancy is an episode’, can you help me understand how you would differentiate a ‘pregnancy episode’ from a ‘pregnancy cohort’ entry? I ask because we are actively working on phenotypes and cohort definitions to represent the span of time that a person belongs to a health state, and pregnancy is one of the specific use cases we have been using for cohorts, where we’d like a cohort start date to be at conception and a cohort end to be when the pregnancy outcome is observed. There can be cohorts for specific pregnancy outcomes (e.g. livebirth, stillbirth, abortion) and also a composite cohort that combines all pregnancy outcomes to represent the collective spans of time that a woman is in the ‘pregnant state’.

This is an excellent question, @Patrick_Ryan. What is the difference between cohorts and episodes?

I would claim (let’s see if this flies): both are memberships of patients in something defined by criteria, and both have a start and an end. The difference is:

Cohorts:

  • Potentially infinite number
  • Not an ETL job since highly variable and dependent on the study
  • Built on top of a complete OMOP CDM
  • Criteria universal (at least that’s what @Gowtham_Rao is trying to accomplish)

Episodes:

  • Finite predefined list
  • ETL job
  • Built from source data and/or OMOP converted data
  • Criteria dependent on source data

Does that make sense?

@Christian_Reich, let’s say we have a standard repository of peer-reviewed algorithms (version-controlled and enumerated) that can run on the core CDM tables (visit_occurrence, condition_occurrence, drug_exposure, etc.) and create an output that is an episode, defined as a continuous span of time during which the person had the episode,

and we export these algorithms with every installation of OHDSI software (for example, ATLAS would ship with them built in).

Would that address these concerns?

  • Not an ETL job since highly variable and dependent on the study

Now it is not study-specific. It is not highly variable; in fact, it is standardized.

  • Finite predefined list

This repository of algorithms (let’s call it the OHDSI Phenotype Library) would be a finite list. It would be credible and trusted.
It would be consistent across the OHDSI network.
It would be built from OMOP-converted data.


Looks like you have a stake in this, don’t you? :) Well, let’s dissect.

Stop right here. Just like ETL, I claim you cannot build Episodes on the OMOP CDM standard tables alone. You need the source. Reason: Depending on what the source captured, the definitions would differ. For example, take the episode “Progression”. You can get that from an abstracted record in a tumor registry, or from the path lab, imaging report or clinical record in an EHR (which may have to be NLPed out or is in some kind of structured place).

Also, it cannot be peer reviewed. Peers cannot see the source. Episodes are built against a set of requirements.

Very nice idea, but that doesn’t make it part of the CDM. That’s a convenience thing.

Well, come on. The cohorts are made for the studies. In your head, as you come up with the content of the library, you abstract from a multitude of typical studies and standardize to that. But there are potentially an infinite number of cohorts you create and standardize. Or is the current list everything you could ever need?

Also, your standard cohorts are always Conditions. You don’t make cohorts for other domains. Why? Because we cannot rely on the diagnostic codes for reliable condition cohorts. They are overreported and underreported, their timing stinks, and their definition may not match what you need. So, you do all the gymnastics to get around those shortcomings (without making transparent what gymnastic move is addressing what issue, as I have previously complained).

Episodes have a different purpose: They are abstracted conditions as well, but they also describe the dynamic nature of the disease, and they organize complex treatments.

Finally, the episodes keep the connection to the events they are built from, or they are related to (EPISODE_EVENT table). Cohorts do not.

@Christian_Reich and @Gowtham_Rao ,
As I described in this thread, we introduced a list of condition modifiers (e.g. 734306 = ‘Initial diagnosis’) that supports preserving available source information related to episode definition. We have already added these modifiers to the ETL from Tumor Registry. We are also recommending this for ETL of any data that has insights about episodes. This data along with concept_type_id will be used for post-ETL derivation of episodes. Therefore, you can and should build Episodes on the OMOP CDM standard tables. Moreover, before we develop any tools that use Episodes, these modifiers can be also surfaced by available tools.

@Christian_Reich and @Gowtham_Rao, we are developing algorithms that are based solely on regular OMOP tables (e.g. Regimen Finder is based on DRUG_EXPOSURE). Therefore, they can be peer-reviewed. I have proposed an idea very similar to @Gowtham_Rao for a repository of algorithms where each algorithm will have a concept assigned. This will allow for preserving provenance of episodes in Episode.Episode_Type_Concept_ID.
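The proposed repository, with one concept per algorithm, could look roughly like this. It is a sketch only: the registry contents, function names and concept IDs are hypothetical. The IDs sit above 2,000,000,000 because that range is reserved for locally defined concepts.

```python
from collections import defaultdict

# Hypothetical registry: one concept per derivation method.
ALGORITHM_CONCEPTS = {
    "tumor_registry_abstraction": 2_000_000_101,
    "regimen_finder": 2_000_000_102,
}


def stamp_provenance(episodes, algorithm):
    """Return copies of the episode dicts with episode_type_concept_id set."""
    concept_id = ALGORITHM_CONCEPTS[algorithm]
    return [{**e, "episode_type_concept_id": concept_id} for e in episodes]


def count_by_method(episodes):
    """Count episodes per derivation method, so methods can be compared."""
    names = {cid: name for name, cid in ALGORITHM_CONCEPTS.items()}
    counts = defaultdict(int)
    for e in episodes:
        counts[names.get(e["episode_type_concept_id"], "unknown")] += 1
    return dict(counts)


abstracted = stamp_provenance([{"episode_id": 1}], "tumor_registry_abstraction")
inferred = stamp_provenance([{"episode_id": 2}, {"episode_id": 3}], "regimen_finder")
summary = count_by_method(abstracted + inferred)
```

With provenance stored on every row, an analyst can restrict a study to abstracted episodes or compare them against inferred ones.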

Looks like @rimma disagreed with you. It also sounds like a claim that is not substantiated. So let’s strike this off.

What are we talking about: changing data representation (i.e. converting to OMOP form) or deriving new events (i.e. using some intelligence to generate new data from other data using an algorithm)? I am only interested in the latter, i.e. the algorithm. I do not want to empower the ETLer to do this, as you would then be giving the ETLer too much power to interpret source data and derive new data, and this is NOT good for reproducible research (e.g. ETL rules may be unknown or have errors). By using the cohort approach, we make them all available.

Let’s strike this off too. Same argument as above, i.e. derivation is not from source data but from core OMOP CDM tables.

Incorrect, @Christian_Reich. I think a few years ago that was correct, but right now we (the OHDSI Phenotype Development and Evaluation workgroup) think in terms of a ‘target’, and cohort definitions are algorithms trying to identify a cohort that matches that target. Any deviation from the target is the error. An individual study is not a component of Phenotype Development and Evaluation. Happy to discuss that; come join the workgroup :)

I don’t think that’s correct either, @Christian_Reich.

Finally, the episodes keep the connection to the events they are built from, or they are related to (EPISODE_EVENT table). Cohorts do not.

All valid points, and use of the Episode tables is valid. My position is not about whether we need to populate/use the Episode table. I am arguing against the ETLer making undocumented/irreproducible algorithmic choices to populate a table with derived content. If you want to just do an ETL of source to target, sure, go for it, but the T should be minimal and have record-level referential integrity to the source where possible.

If instead you have an algorithmically derived summary of multiple records in the source, especially if it is running on source tables, then I think that’s not good!

OK, re-reading all posts from the top: these are derived tables and NOT clinical data tables.

and it has fields that are not in cohort table like following

These tables are to be calculated from the OMOP core tables/clinical data tables and NOT from source tables.

So now some of the arguments make sense…

But we will have the age-old problem of not being able to fully trust these derived tables. E.g. in OHDSI network studies, we rarely seem to use condition_era and drug_era, but use condition_occurrence and drug_exposure instead. This is the reason for THEMIS.

Nice debate here. But it is getting long. Let me see:

Correct. If the source tells us what the episodes are we are all set. But the debate is about derivation when you don’t have that, à la @Gowtham_Rao’s phenotypes. He claims all you need is regular OMOP tables and you can do it, reliably, from the Conditions and Modifiers.

That would be wonderful. But apart from the fact that we are far away from having the logic for such algorithms for disease episodes, the question remains: Could they work without the context of the source data? Is this an ETL job, or a universal phenotype-like job?

You seem to be claiming that the Type concept will provide sufficient context for each Modifier (stage, grade, mets, nodes) telling us how much the algorithm should believe it. But there is more trouble lurking:

  • What about unstructured imaging and path lab reports?
  • What about contradictions between EHR and registries, or contradictions between different source information?
  • What about incomplete information? For example, some ambulatory clinics will record chemotherapy, but will not record surgery in an adjuvant or neoadjuvant setting, or autologous stem cell transplantation. Similarly, administration of oral chemotherapy is often organized differently from parenteral.

In other words, all this is so messy that we need to give the ETLer some serious power to make the right choices. That is the point of the Episodes. The analyst using OMOP tables alone would be lost.

That’s a good thing. In our workgroup, we are actually debating things. :)

:) You realize that all OMOP databases are ETLed, don’t you? The ETLer has to make a ton of decisions about how to interpret the source data in such a way that it fits the intended representation of the CDM and vocabulary. And no, those decisions are not peer-reviewable. Plus: if anything, I am not mistrusting the ETLer like you do. But if algorithms can make her life easier I am all for it.

What is that? And how is that not used for studies? Let me quote @Patrick_Ryan:

True. @Patrick_Ryan made a list and the community voted on them. Where did it get its wisdom from? They are needed for disease setting and outcomes in the studies folks are running all the time.

But you are right, I shouldn’t debate the phenotypes here, except whether or not they are the same thing as the episodes.

Back in Phebruary they certainly were. Do you have outcomes that are not conditions now?

You’d be surprised to hear that from me, but actually if we could create standardized episodes purely from structured data in the OMOP tables I don’t think we’d need them. We’d just use your phenotypes. Episodes only have a life if they need to be populated pre or peri-OMOP.

That’s what it boils down to. I hope you and @rimma will be right. Till then, my hope is we can arrive at some mixture: standard algorithms, that are using OMOP tables, but are configured with information the ETLer has obtained from the source data or by asking folks in the institution.

Great end to this focused discussion, @Christian_Reich. The key insights I have learnt, which reinforced some of my positions:

  1. The episode table is a derived table. It is derived, like condition_era and drug_era, from the core clinical CDM tables.
  2. Although built during pre-processing/set-up of the CDM, the episode table does not interact with source data in any form, i.e. it is a phenotype-like algorithm.
  3. The output of the phenotype-like algorithm is different from a cohort, i.e. it is more than subject_id, cohort_start_date and cohort_end_date, and includes elements that a cohort algorithm would not support.

Truly, the use case I think it supports is to algorithmically separate care events that may be unrelated. E.g. if I am getting knee surgery, but during the same days also get a dental workup, the episode table relates the events of the knee surgery together, i.e. bundles them, but does not link to the unrelated events, i.e. dental.

It reminds me of an old discussion here, “How to Capture pregnancy data? (EDC, gestation length, etc)” (#5 by Gowtham_Rao), and the idea of an ‘Episode of care’. Health insurance companies have done/been doing/tried to do this for many years.