Episode Event

In my experience, a barrier to using the oncology extension episode tables is the way the episode_event table works. This table links events to an episode, but only indirectly: ‘event_id’ holds the ID of the linked record, and ‘episode_event_field_concept_id’ identifies which event table (and ID field) that ID refers to.
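To make the indirection concrete, here is a minimal plain-Python sketch of what resolving these links involves. It assumes toy in-memory tables, and the field identifiers are hypothetical placeholder strings standing in for the concept IDs stored in ‘episode_event_field_concept_id’ (no real OMOP concept IDs are used). `resolve_events` is an illustrative helper, not part of any OHDSI package.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholders: in real data, episode_event_field_concept_id holds
# an OMOP concept ID identifying the target table and its ID field.
CONDITION_FIELD = "condition_occurrence.condition_occurrence_id"
MEASUREMENT_FIELD = "measurement.measurement_id"


@dataclass(frozen=True)
class EpisodeEvent:
    episode_id: int
    event_id: int
    field: str  # stands in for episode_event_field_concept_id


# Toy event tables keyed by primary key. Note that ID 1 exists in both tables.
TABLES = {
    CONDITION_FIELD: {1: {"start_date": "2020-10-06"}},
    MEASUREMENT_FIELD: {
        1: {"date": "2020-10-20"},
        2: {"date": "2020-11-03"},
    },
}

EPISODE_EVENT = [
    EpisodeEvent(episode_id=1, event_id=1, field=CONDITION_FIELD),
    EpisodeEvent(episode_id=1, event_id=1, field=MEASUREMENT_FIELD),
    EpisodeEvent(episode_id=1, event_id=2, field=MEASUREMENT_FIELD),
]


def resolve_events(episode_id: int) -> list[tuple[str, int, dict]]:
    """Return (field, event_id, row) for every event linked to the episode.

    The field selects the table; only then does event_id select the row.
    """
    return [
        (link.field, link.event_id, TABLES[link.field][link.event_id])
        for link in EPISODE_EVENT
        if link.episode_id == episode_id
    ]
```

Because event_id 1 occurs in both toy tables, a link is only unambiguous when both columns are read together, and every query against episode_event has to carry that pairing along.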

Both ETL implementation and analyses would be more straightforward if the event tables had an ‘episode_id’ field, analogous to the existing ‘visit_occurrence_id’ field. As we are now moving to implement the episode table in the OHDSI tools (e.g. Circe/Atlas and FeatureExtraction), I want to make a last effort to see if we can make this change.
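A rough sketch of the difference for analysis queries, using Python's built-in sqlite3 with a toy schema that keeps only the columns needed. The `episode_id` column on `measurement` is the hypothetical field from the proposal (it is not in the current CDM), and `MEASUREMENT_FIELD = -1` is a placeholder rather than a real concept ID.

```python
import sqlite3

MEASUREMENT_FIELD = -1  # placeholder, not a real OMOP concept ID

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        episode_id INTEGER  -- hypothetical column from the proposal
    );
    CREATE TABLE episode_event (
        episode_id INTEGER,
        event_id INTEGER,
        episode_event_field_concept_id INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, 100, 7), (2, 100, 7), (3, 100, None)],
)
conn.executemany(
    "INSERT INTO episode_event VALUES (?, ?, ?)",
    [(7, 1, MEASUREMENT_FIELD), (7, 2, MEASUREMENT_FIELD)],
)

# Current model: join through the link table, filtering on the field concept
# so that event_id is interpreted as a measurement_id.
via_link = conn.execute(
    """
    SELECT m.measurement_id
    FROM measurement m
    JOIN episode_event ee
      ON ee.event_id = m.measurement_id
     AND ee.episode_event_field_concept_id = ?
    WHERE ee.episode_id = ?
    ORDER BY m.measurement_id
    """,
    (MEASUREMENT_FIELD, 7),
).fetchall()

# Proposed model: a plain filter, like visit_occurrence_id today.
via_column = conn.execute(
    "SELECT measurement_id FROM measurement WHERE episode_id = ? ORDER BY measurement_id",
    (7,),
).fetchall()
```

Both return the same rows here, but the first form needs the field-concept filter repeated for every event table, whereas the second is the same one-column filter analysts already use for visits.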

Is there a benefit to keeping the episode_event table that I am missing (apart from not breaking existing local code)?

Tagging @agolozar and @Christian_Reich for awareness

Yeah, the EPISODE_EVENT table is ugly. It can connect one event to many episodes, in contrast to visit_occurrence_id, which links each event to a single visit.

We are planning on cleaning up and pressure-testing the oncology convention this year, @MaximMoinat. Please come to the relevant Oncology WG meetings. This should be one of the subjects, and your experience with real use cases would be valuable.


We have delivered treatment episodes and disease progression episodes at our site. Database design principles notwithstanding, a breaking change this large would carry a significant sunk cost for us, as a lot of downstream tooling is built around the current structure, so I would be very keen to follow this conversation.

In case it is helpful, I’ve got some Python libraries that abstract away some of this complexity.

Specifically, these ones may be of use:

```python
>>> # `session`, `EpisodeView` and `Episode_EventView` come from the libraries mentioned above
>>> ep = session.query(EpisodeView).first()
>>> ep
<Episode 1: 32533 (2020-10-06)>
>>> ep.episode_concept.concept_name, ep.episode_object_concept.concept_name
('Disease Episode', 'Squamous cell carcinoma, NOS, of branchial cleft')
>>> events = (
...     session.query(Episode_EventView)
...     .filter(Episode_EventView.episode_id == ep.episode_id)
...     .all()
... )
>>> # the polymorphic relationship to clinical fact tables can be context-aware and resolved dynamically
>>> events
[<EpisodeEvent ep=1 Condition_Occurrence#1>,
 <EpisodeEvent ep=1 Measurement#1>,
 <EpisodeEvent ep=1 Measurement#2>,
 <EpisodeEvent ep=1 Measurement#3>]
```

There are a couple of other potentially relevant demos here. I’m happy to add a few more specific to how we have been handling the polymorphism if there is any interest.


Thanks to you both for these perspectives. It would indeed be a breaking change. Note that Python is not commonly used in OHDSI, so to make use of the existing analytical packages, the implementation has to be written in R or SQL. Do you know if any sample queries exist to ‘normalise’ the episode_event table?
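One way to ‘normalise’ is to derive a visit_occurrence_id-style column per event table with a LEFT JOIN. A minimal sketch in Python's sqlite3, with a toy schema and a placeholder field concept ID (-1, not a real concept). The query itself is plain SQL, so it should carry over to other dialects with little change.

```python
import sqlite3

MEASUREMENT_FIELD = -1  # placeholder, not a real OMOP concept ID

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measurement (measurement_id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE episode_event (
        episode_id INTEGER,
        event_id INTEGER,
        episode_event_field_concept_id INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)", [(1, 100), (2, 100), (3, 100)]
)
# Measurement 2 is deliberately linked to two episodes.
conn.executemany(
    "INSERT INTO episode_event VALUES (?, ?, ?)",
    [(7, 1, MEASUREMENT_FIELD), (7, 2, MEASUREMENT_FIELD), (8, 2, MEASUREMENT_FIELD)],
)

# LEFT JOIN keeps measurements that belong to no episode (episode_id is NULL).
# The field-concept condition sits in the ON clause so it filters the links,
# not the measurements.
normalised = conn.execute(
    """
    SELECT m.measurement_id, ee.episode_id
    FROM measurement m
    LEFT JOIN episode_event ee
      ON ee.event_id = m.measurement_id
     AND ee.episode_event_field_concept_id = ?
    ORDER BY m.measurement_id, ee.episode_id
    """,
    (MEASUREMENT_FIELD,),
).fetchall()
```

Measurement 2 comes back twice, once per episode, so the result only behaves like a single episode_id column when no event is linked to more than one episode.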

@Christian_Reich I did not consider the many-to-many aspect, although I expect that in practice most events will be linked to only one episode. Maybe @gkennos can provide their input here: have you had instances where e.g. a diagnosis or treatment was part of multiple episodes?
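A quick way to check this on an existing instance is to group the links by event and count distinct episodes. A sketch using sqlite3 with toy rows; -1 and -2 are placeholder field concept IDs. The grouping includes the field concept because the same event_id in two different tables refers to two different events.

```python
import sqlite3

CONDITION_FIELD, MEASUREMENT_FIELD = -1, -2  # placeholders, not real concept IDs

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episode_event ("
    "episode_id INTEGER, event_id INTEGER, episode_event_field_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO episode_event VALUES (?, ?, ?)",
    [
        (7, 1, CONDITION_FIELD),
        (7, 2, CONDITION_FIELD),
        (8, 2, CONDITION_FIELD),  # condition 2 is shared by episodes 7 and 8
        (8, 1, MEASUREMENT_FIELD),  # same event_id as condition 1, different event
    ],
)

# Events (identified by field concept + event_id) linked to more than one episode.
shared = conn.execute(
    """
    SELECT episode_event_field_concept_id, event_id,
           COUNT(DISTINCT episode_id) AS n_episodes
    FROM episode_event
    GROUP BY episode_event_field_concept_id, event_id
    HAVING COUNT(DISTINCT episode_id) > 1
    ORDER BY episode_event_field_concept_id, event_id
    """
).fetchall()
```

An empty result on real data would suggest a single episode_id column could hold the links; any rows returned are exactly the cases such a column could not represent.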

I kind of feel we need to rethink the whole Episode situation: what we want from it, and what the data supports. Yes, @gkennos seems to be ahead of us in accumulating experience. Let’s take one of the WG sessions to figure it out.

The handling of the increased complexity brought about by these kinds of enriched relationships is one of the reasons I’m quite focused on making tools for reasoning in Python more readily available. I find that it allows us to reason explicitly about assumptions that are otherwise implicit (or even hidden) in SQL-only representations.

There is a lot of potential in grouping data at the episode level. It can help align data from different sources (e.g. clinical records and registries) by supporting summarisation at whatever level the best available information allows, so some flexibility is definitely required.

@MaximMoinat regarding “normalisation”: if you mean materialising an episode-centric view that behaves more like visit_occurrence_id for downstream analytics, that roughly matches how we have handled this, by building episode-scoped views that resolve the polymorphic episode_event links. The queries to do that aren’t especially complicated, though they are certainly not pretty when written directly in SQL with a lot of CASE statements. Exposing simpler objects has felt like a reasonable compromise for us so far.
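For illustration, a minimal sketch of such an episode-scoped view built with CASE statements, again via sqlite3. The table and column names (condition_occurrence.condition_start_date, measurement.measurement_date) follow the CDM, but the schema is trimmed to what the example needs, -1/-2 are placeholder field concept IDs, and `episode_timeline` is a hypothetical view name. Each additional event table adds one LEFT JOIN and one WHEN branch per CASE.

```python
import sqlite3

CONDITION_FIELD, MEASUREMENT_FIELD = -1, -2  # placeholders, not real concept IDs

conn = sqlite3.connect(":memory:")
# The constants are fixed ints, so interpolating them into the DDL is safe here.
conn.executescript(
    f"""
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY, condition_start_date TEXT);
    CREATE TABLE measurement (measurement_id INTEGER PRIMARY KEY, measurement_date TEXT);
    CREATE TABLE episode_event (
        episode_id INTEGER, event_id INTEGER, episode_event_field_concept_id INTEGER);

    -- One row per link, with the source table and date resolved per field concept.
    CREATE VIEW episode_timeline AS
    SELECT ee.episode_id,
           CASE ee.episode_event_field_concept_id
                WHEN {CONDITION_FIELD} THEN 'condition_occurrence'
                WHEN {MEASUREMENT_FIELD} THEN 'measurement'
           END AS event_table,
           ee.event_id,
           CASE ee.episode_event_field_concept_id
                WHEN {CONDITION_FIELD} THEN co.condition_start_date
                WHEN {MEASUREMENT_FIELD} THEN m.measurement_date
           END AS event_date
    FROM episode_event ee
    LEFT JOIN condition_occurrence co
      ON ee.episode_event_field_concept_id = {CONDITION_FIELD}
     AND co.condition_occurrence_id = ee.event_id
    LEFT JOIN measurement m
      ON ee.episode_event_field_concept_id = {MEASUREMENT_FIELD}
     AND m.measurement_id = ee.event_id;
    """
)

conn.execute("INSERT INTO condition_occurrence VALUES (1, '2020-10-06')")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)", [(1, "2020-10-20"), (2, "2020-11-03")]
)
conn.executemany(
    "INSERT INTO episode_event VALUES (?, ?, ?)",
    [(1, 1, CONDITION_FIELD), (1, 1, MEASUREMENT_FIELD), (1, 2, MEASUREMENT_FIELD)],
)

timeline = conn.execute(
    "SELECT event_table, event_id, event_date FROM episode_timeline "
    "WHERE episode_id = ? ORDER BY event_date",
    (1,),
).fetchall()
```

Downstream code then only sees (episode, table, event, date) rows, which is roughly the "simpler object" trade-off described above, expressed in SQL rather than Python.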