What is "the magic" in cohort definitions

Eldar · June 27, 2018, 9:28pm

Let’s say there is a subject with multiple drug eras of the chosen drug:
the first era: 2010-01-01 to 2010-05-01
the second one: 2010-05-25 to 2010-06-25

You take drug_era_start_date as cohort_start_date, keep all events per person, and choose Cohort Exit Criteria based on eras of persistent exposure.

If Cohort Collapse strategy is not applied (or gap size is less than 24 days) then the cohort will contain 2 rows for this subject with cohort_start and end_dates equal to corresponding drug_era_start and end_dates.

If the gap size is larger, there will be only 1 record:
cohort_start_date = 2010-01-01 cohort_end_date = 2010-06-25
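
To make the example concrete, here is a minimal sketch of the collapse step in Python. It is not the actual ATLAS query: the `collapse` function and its inclusive boundary rule are my own assumptions, so how a gap size of exactly 24 days is treated is left open.

```python
from datetime import date, timedelta

def collapse(episodes, gap_days):
    """Merge episodes whose start falls within gap_days of the running end date."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + timedelta(days=gap_days):
            # Close enough to the previous episode: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# The two drug eras from the example above.
eras = [
    (date(2010, 1, 1), date(2010, 5, 1)),
    (date(2010, 5, 25), date(2010, 6, 25)),
]

small_gap = collapse(eras, gap_days=0)   # eras stay separate: 2 rows
large_gap = collapse(eras, gap_days=30)  # eras are merged: 1 row
```

With a gap size of 30 days the two eras become a single row from 2010-01-01 to 2010-06-25.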

For this case the same result will be returned if Cohort Exit Criteria is not specified (and thus is taken by default as the end of observation period) (unless these drug_eras are within the same observation_period)

If these 2 drug_eras belong to different observation periods (let’s say from 2009-01-01 to 2010-05-10 and from 2010-05-15 to 2012-01-01) then the default Cohort Exit Criteria will still return 2 rows
(2010-01-01 to 2010-05-10 and 2010-05-25 to 2012-01-01, respectively).

And in this case the Collapsing strategy (with a sufficient gap size) will merge them into one row:
2010-01-01 - 2012-01-01
The trick is that the Collapsing strategy doesn’t take actual observation periods into account; it operates only on the pre-generated timelines.
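
A rough Python sketch of this second case, under the same caveat that it is only an illustration and not the real ATLAS logic. The helper names `default_exit` and `collapse` are made up for this post; the dates are the ones given above.

```python
from datetime import date, timedelta

observation_periods = [
    (date(2009, 1, 1), date(2010, 5, 10)),
    (date(2010, 5, 15), date(2012, 1, 1)),
]
era_starts = [date(2010, 1, 1), date(2010, 5, 25)]

def default_exit(start, periods):
    """End of the observation period that contains the entry date."""
    for op_start, op_end in periods:
        if op_start <= start <= op_end:
            return op_end
    raise ValueError("entry date is outside every observation period")

def collapse(episodes, gap_days):
    """Merge episodes by padded end date; observation periods are never consulted."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + timedelta(days=gap_days):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Default exit: each episode runs to the end of its own observation period.
episodes = [(s, default_exit(s, observation_periods)) for s in era_starts]
no_gap = collapse(episodes, gap_days=0)     # 2 rows, one per observation period
wide_gap = collapse(episodes, gap_days=30)  # 1 row that crosses the 5-day hole
```

The merged row runs from 2010-01-01 to 2012-01-01 even though the subject is unobserved between 2010-05-10 and 2010-05-15.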

Chris_Knoll · June 27, 2018, 9:28pm

Well, the cohort start and end dates can become your time at risk in the Estimation module in Atlas, and it can be your time at risk window inside the incidence rate calculator. If you have an outcome cohort, the outcome will only be identified if it starts within the cohort start and end dates of your Target cohort.
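
As a small illustration of that last point, here is a hedged Python sketch. It assumes the time at risk is exactly the cohort window with inclusive boundaries; `outcome_in_window` is a hypothetical helper, not an ATLAS function, and the outcome dates are invented.

```python
from datetime import date

def outcome_in_window(outcome_start, cohort_start, cohort_end):
    """True when the outcome starts on or between the target cohort dates."""
    return cohort_start <= outcome_start <= cohort_end

# Collapsed target cohort row from the drug era example in this thread.
target = (date(2010, 1, 1), date(2010, 6, 25))

# Hypothetical outcome dates: one in the gap between the two eras, one after the cohort end.
inside = outcome_in_window(date(2010, 5, 10), *target)
outside = outcome_in_window(date(2010, 7, 1), *target)
```

Note that the 2010-05-10 outcome counts only because the two eras were collapsed into one row; with two separate rows it would fall between them.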

Chris_Knoll · June 27, 2018, 9:30pm

This is probably a bug that should be corrected. Collapsing episodes should not cross observation periods…so that’s a good point, and I’ll think about the way to adjust that.

Eldar · June 27, 2018, 9:40pm

I wouldn’t say ‘bug’ but rather ‘feature’
Sometimes it’s reasonable to union periods.

The simplest example - when the outcome of interest is death within 5 years after cohort start and
we want to remove subjects with insufficient time at risk.

But knowledge that there is one more observation period after our cohort_end can definitely tell us that the person was alive.

Eldar · June 27, 2018, 9:42pm

While writing the example I realized that now I’m not sure whether CohortMethod follows my logic.
So much to review…

Gowtham_Rao · June 27, 2018, 10:56pm

@chris_knoll is it really possible for the collapse strategy to create a cohort with periods outside the observation periods for the same subject?

Chris_Knoll · June 28, 2018, 12:47am

@Gowtham_Rao: I need to look at the logic. I’m pretty sure that it doesn’t account for separate observation periods. I think if you have a gap size of 999999, and you have the following episodes:

OP1:<---------------------------------------------->
EP1:                 |-----------------------------|
OP2:                                                     <------------->
EP2:                                                               |---|

Here I’m showing a person with 2 events, no exit strategy, so the episode goes from the event to the end of observation period.

If you take the episodes and extend the end dates by 9999 days to combine them, it works like this:

EP1:                 |-----------------------------|--------------------------->
EP2:                                                               |---|
Yields:
EPF:                 |-------------------------------------------------|
OP1:<---------------------------------------------->
OP2:                                                     <------------->

And you can see the final episode (EPF) spans the observation periods.

I think the solution is to take the final episodes and split them up by finding episode durations that overlap the observation periods like so:

EPF:                 |-------------------------------------------------|
OP1:<---------------------------------------------->
OP2:                                                     <------------->
Yields:
FE1:                 |-----------------------------|
FE2:                                                      |------------|

Here, FE means ‘Final Episode’.
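
The splitting idea can be sketched in Python as an interval intersection. This is only a sketch of the proposal, not code from the cohort query: `split_by_observation_period` is a made-up name, and the dates are borrowed from Eldar's example earlier in this thread.

```python
from datetime import date

def split_by_observation_period(episode, periods):
    """Cut one collapsed episode into the pieces that overlap each observation period."""
    ep_start, ep_end = episode
    pieces = []
    for op_start, op_end in sorted(periods):
        start = max(ep_start, op_start)
        end = min(ep_end, op_end)
        if start <= end:  # keep only real overlaps
            pieces.append((start, end))
    return pieces

epf = (date(2010, 1, 1), date(2012, 1, 1))  # collapsed episode spanning both periods
observation_periods = [
    (date(2009, 1, 1), date(2010, 5, 10)),   # OP1
    (date(2010, 5, 15), date(2012, 1, 1)),   # OP2
]

final_episodes = split_by_observation_period(epf, observation_periods)
```

Here FE1 ends at the end of OP1 (2010-05-10) and FE2 starts at the start of OP2 (2010-05-15), matching the diagram.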

I have some reservations about doing it this way: the time at risk in the second observation period is being driven by events appearing in the first observation period, and, in the past, there have been very strict requirements that all events contributing to a cohort episode come from the same observation period. However, there is some Themis work that allows events to appear outside of an observation period…so it’s possible that it’s OK for a cohort episode to span observation periods.

I’ll have to wait and see what the Themis group comes up with.

Chris_Knoll · June 28, 2018, 12:51am

I think you’re doing great! Your drug era based episodes were spot on.

Gowtham_Rao · June 28, 2018, 1:15am

You are right. I think it is possible because the padding is used in your magic query, and the magic query does not check observation periods.

Splitting by observation period makes sense.

Eldar · June 28, 2018, 2:12am

This topic seems to be a good subject for F2F.

As far as I know, some events are now allowed to appear or end outside of an observation period. Observations, however, were allowed even before. (@Christian_Reich, @aostropolets can you please confirm this?)

And here are a couple of thoughts that came to my mind:

The majority of study designs I have seen are new user studies. The Collapsing Strategy does nothing for those.
I’d vote for having the option to cross observation periods when collapsing. Moreover, sometimes I’d like to do it even for new user studies.
We definitely need to ask the community about real-world use cases in order to understand the priority of this issue.
@ericaVoss, I guess your question about an example should now be redirected to yourself

Eldar · June 28, 2018, 2:53am

@Eldar: I just wanted to add: you don’t really have the choice to collapse them. We always do because we must ensure that there are no overlapping episodes in the cohort results. You can decide to change the ‘allowable gap’ between episodes to bring separate episodes together within a gap, but the logic must always remove overlapping episodes.

Found a mistake in my own example:

For this case the same result will be returned if Cohort Exit Criteria is not specified (and thus is taken by default as the end of observation period) (unless these drug_eras are within the same observation_period)

The result will actually be different: as I assumed a common observation period, there will be 2 rows with cohort_end_date = observation_period_end_date. And they will finally be collapsed to 1 row.

@Chris_Knoll ,
Thank you for reminding.
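
A quick Python sketch of the corrected case, to show that overlapping rows are merged even with a gap size of 0. The observation period end date of 2012-01-01 is an assumption for illustration, and `collapse` is a hypothetical stand-in for the real collapse logic.

```python
from datetime import date, timedelta

def collapse(episodes, gap_days=0):
    """Merge overlapping episodes, plus any that start within gap_days of the running end."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + timedelta(days=gap_days):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Assumed: one observation period that covers both drug eras and ends on this date.
observation_end = date(2012, 1, 1)
era_starts = [date(2010, 1, 1), date(2010, 5, 25)]

# Default exit: both rows end at the observation period end, so they overlap.
before_collapse = [(s, observation_end) for s in era_starts]
after_collapse = collapse(before_collapse, gap_days=0)
```

The two overlapping rows become a single row starting 2010-01-01 and ending at the observation period end.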

Gowtham_Rao · June 28, 2018, 11:34am

I think all CDM data should still be within the subject’s observation period. Problem lists without dates were discussed as an exception, but I’m not sure what the final consensus is.

A few days ago there was a similar discussion at a mini face-to-face session. We talked about padding a subject’s time, and the padding would allow the subject’s cohort start or end to be outside the subject’s observation period. More on that in another thread (to be started). So there are other use cases.

aostropolets · June 28, 2018, 1:30pm

@Eldar, @Gowtham_Rao the current Themis proposal is to allow events outside an observation period.
The item is here: https://github.com/OHDSI/Themis/issues/23, and it will be implemented into the convention after the 5th of July (the end of the 60-day review). If you have any comments or objections, please share.

SCYou · July 5, 2018, 2:06am

Because of the collapse magic,
we should be very cautious when making an ‘all-event’ cohort (e.g., all-event pneumonia as an outcome)

Previously, I did not care about ‘cohort_end_date’ for the outcome cohort.
However, since every cohort is now subject to the ‘collapsing strategy’, we need to specify the end date for the outcome cohort to avoid overlapping cohort periods (usually a fixed duration of 0 days after the cohort start).
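
To illustrate, a small Python sketch comparing the two exit choices for an all-event outcome cohort. The pneumonia dates and observation period end are invented, and `collapse` is my own simplified stand-in for the collapse logic, not the ATLAS query.

```python
from datetime import date, timedelta

def collapse(episodes, gap_days=0):
    """Merge overlapping episodes, plus any that start within gap_days of the running end."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + timedelta(days=gap_days):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical data: three pneumonia events inside one observation period.
observation_end = date(2012, 1, 1)
pneumonia_starts = [date(2010, 2, 1), date(2010, 9, 15), date(2011, 3, 10)]

# Default exit (end of observation period): all rows overlap and collapse into one.
default_exit = collapse([(s, observation_end) for s in pneumonia_starts])

# Fixed exit, 0 days after cohort start: each event keeps its own row.
fixed_exit = collapse([(s, s + timedelta(days=0)) for s in pneumonia_starts])
```

With the default exit only the first pneumonia event survives as a cohort start; with the fixed 0-day exit all three events remain.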

We need to point this issue out to others in the tutorial, @Chris_Knoll, @mvanzandt, @schuemie @Patrick_Ryan @msuchard, @Rijnbeek and @Christian_Reich…

(I had never noticed until today… I just thought ATLAS became weird…
If you have noticed this already, sorry for beating a dead horse…)