
What is "the magic" in cohort definitions

Chris, your implementation here is an application of a sound software engineering principle - build your solution into data, rather than algorithm. Also, it’s a wonderful example of “relational” programming. My hat is off. Note that application of this principle does not prevent hiding the whole thing in algorithm later, as you imply.

Amazing!

@Chris_Knoll

Question, since this “Cohort Collapse Strategy” is in ATLAS, on the implementation screen it looks like it is always trying to apply it with a default gap day of 0.

But then the Print Friendly says:

If it is set to 0 is it selecting the end of the OBSERVATION_PERIOD_END_DATE or still attempting to collapse with a gap size = 0?

Once the individual cohort episodes are identified (starting with initial events, passing through inclusion criteria, and finally applying the exit strategy), you may be left with cohort episodes that overlap. Since overlapping episodes are forbidden, we always collapse the individual cohort episodes into non-overlapping episodes. By default, we allow 0 days of gap between cohort episodes. You can adjust it using the ‘collapse gap size’.

I’m not certain, but it looks like the logic is as follows:
First you specify the Cohort Exit Criteria, which define the cohort_end_date.
Only then is the logic of collapsing rows applied.
Thus the ‘Cohort Collapse Strategy’ doesn’t touch the cohort_end_dates themselves.
I.e., you can have 2 or more records for a subject with different start_dates (as well as end_dates).
If you decide to collapse them, these 2 rows will become 1 with min(cohort_start) and max(cohort_end).
If not, they will both stay as they are.

The collapse strategy language may be missing from the print friendly, but your editor screen capture is about the collapse strategy and your print friendly screen capture is referencing the end date strategy.

End Date Strategy is used to specify how long, after the initial event passes inclusion criteria, the event’s episode should last before it ends. By default, if you don’t specify how an initial event should end its episode, then the episode will be defined as running from the event start to the end of the event’s observation period.
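To make the default concrete, here is a minimal Python sketch of that rule. It is only an illustration, not the SQL that ATLAS generates; the function name `default_episode` and the tuple layout are assumptions made up for this example.

```python
from datetime import date

def default_episode(event_start, observation_periods):
    """Hypothetical helper: with no exit strategy, the episode runs from the
    event start to the end of the observation period containing the event."""
    for op_start, op_end in observation_periods:
        if op_start <= event_start <= op_end:
            return (event_start, op_end)
    # Assumption for this sketch: an event outside every observation
    # period produces no episode.
    return None

# One observation period, one event inside it.
periods = [(date(2009, 1, 1), date(2010, 5, 10))]
episode = default_episode(date(2010, 1, 1), periods)
print(episode)
```

The episode keeps the event date as its start and borrows the observation period end as its end.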


This is correct, the collapse strategy won’t generate any new end dates. But, if there are multiple episodes that overlap, then you can consider the ‘collapsed episode’ to be the MIN(start_date) and MAX(end_date) of all episodes that are overlapping each other.

@Eldar: I just wanted to add: you don’t really have the choice to collapse them. We always do because we must ensure that there are no overlapping episodes in the cohort results. You can decide to change the ‘allowable gap’ between episodes to bring separate episodes together within a gap, but the logic must always remove overlapping episodes.
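For anyone who prefers code, the collapse step can be sketched roughly like this in Python. This is a sketch under assumptions, not the actual ATLAS implementation: the name `collapse_episodes` is invented here, and I am assuming that two episodes merge when the next start is on or before the previous end plus the gap size.

```python
from datetime import date, timedelta

def collapse_episodes(episodes, gap_days=0):
    """Hypothetical sketch: merge (start, end) episodes for one subject.
    Assumes a merge happens when next start <= previous end + gap_days."""
    merged = []
    for start, end in sorted(episodes):
        if merged and start <= merged[-1][1] + timedelta(days=gap_days):
            prev_start, prev_end = merged[-1]
            # No new end date is invented: keep MIN(start) and MAX(end).
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

overlapping = [
    (date(2010, 1, 1), date(2010, 3, 1)),
    (date(2010, 2, 1), date(2010, 4, 1)),
    (date(2010, 6, 1), date(2010, 7, 1)),
]
print(collapse_episodes(overlapping))
```

With the default gap of 0 the first two episodes overlap and become one, while the third stays separate. Raising `gap_days` only changes which neighbours count as "close enough" to merge.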


So now I’m not uncertain. :grinning:
Thank you


Thanks @Chris_Knoll - if I read more carefully I would have seen I was messing up “Cohort Collapse Strategy” and “End Date Strategy”. If my eye traveled down only 100 more pixels I would have seen:

@Chris_Knoll could you give me an example of where I would use this feature? I’m sorry, I know you told me before and now I’m blanking. I was thinking COHORT_END_DATE and that is obviously not right. :frowning:

Say you wanted to identify episodes of backpain starting at the moment a person is diagnosed with backpain, and you assume that if it continues, they’ll have a return visit within 30d with backpain. You would define your initial events as ‘condition occurrence of ‘Backpain’’, with an exit strategy of ‘fixed date of 30d after event date’. This is just saying ‘Backpain lasts 30 days’. Now if they come in < 30 days later for another visit with backpain, you’ll extend the backpain episode another 30d.

What if they come in on day 32? It would be a new episode with the default 0d gap. If you set the collapse gap to 14d then backpain episodes within 14d of each other will collapse into a single back pain episode starting with the first overlapping episode and ending at the end of the last episode (30d after the back pain diagnosis).
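Here is the backpain example worked through as a small Python sketch, using day offsets instead of real dates. The function name `backpain_episodes` and the rule "merge when the next start is on or before the previous end plus the gap" are assumptions for illustration only.

```python
EPISODE_LENGTH = 30  # exit strategy: fixed 30d after the event date

def backpain_episodes(visit_days, collapse_gap=0):
    """Hypothetical sketch: build 30-day episodes from visit days and
    collapse neighbours that overlap or sit within collapse_gap days."""
    episodes = []
    for day in sorted(visit_days):
        start, end = day, day + EPISODE_LENGTH
        if episodes and start <= episodes[-1][1] + collapse_gap:
            # Extend the running episode to the later end day.
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], end))
        else:
            episodes.append((start, end))
    return episodes

print(backpain_episodes([0, 20]))                   # return visit inside 30d
print(backpain_episodes([0, 32]))                   # return visit on day 32, gap 0
print(backpain_episodes([0, 32], collapse_gap=14))  # same visits, gap 14
```

A visit on day 20 extends the episode to day 50. A visit on day 32 starts a new episode with the default gap, but is folded into a single episode ending on day 62 once the gap is 14.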

Once I have that collapsed time of back pain how/where can I use it in ATLAS?

Let’s say there is a subject with multiple drug eras of a chosen drug:
the first era: 2010-01-01 to 2010-05-01
the second one: 2010-05-25 to 2010-06-25

You take the drug era start date as cohort_start_date, keep all events, and choose Cohort Exit Criteria based on eras of persistent exposure.

If the Cohort Collapse Strategy is not applied (or the gap size is less than 24 days) then the cohort will contain 2 rows for this subject, with cohort start and end dates equal to the corresponding drug era start and end dates.

If the gap size is larger, there will be only 1 record:
cohort_start_date = 2010-01-01 cohort_end_date = 2010-06-25

In this case a single row will likewise be returned if Cohort Exit Criteria is not specified (so the end date defaults to the end of the observation period), provided these drug eras fall within the same observation period.

If these 2 drug eras belong to different observation periods (let’s say from 2009-01-01 to 2010-05-10
and from 2010-05-15 to 2012-01-01) then the default Cohort Exit Criteria will still return 2 rows
(2010-01-01 - 2010-05-10 and 2010-05-25 - 2012-01-01, respectively).

And in this case the collapsing strategy (with a sufficient gap size) will union them into one row:
2010-01-01 - 2012-01-01
The trick is that the collapsing strategy doesn’t take the actual observation periods into account; it operates only on the pre-generated timelines.
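The date arithmetic behind these two cases can be checked with a few lines of Python. The helper names `days_between` and `would_collapse` are made up for this sketch, and it assumes that a gap exactly equal to the configured gap size still collapses.

```python
from datetime import date

def days_between(prev_end, next_start):
    """Days from the end of one episode to the start of the next."""
    return (next_start - prev_end).days

def would_collapse(prev_end, next_start, gap_days):
    """Assumption: episodes merge when the gap is <= the configured size."""
    return days_between(prev_end, next_start) <= gap_days

# Exit criteria based on the drug eras: first era ends 2010-05-01.
era_gap = days_between(date(2010, 5, 1), date(2010, 5, 25))

# Default exit criteria: first episode ends with its observation period.
default_gap = days_between(date(2010, 5, 10), date(2010, 5, 25))

print(era_gap, default_gap)
```

Nothing in this check looks at the observation periods themselves, which is exactly why a large enough gap size can join episodes from two different observation periods.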

Well, the cohort start and end dates can become your time at risk in the Estimation module in Atlas, and it can be your time at risk window inside the incidence rate calculator. If you have an outcome cohort, the outcome will only be identified if it starts within the cohort start and end dates of your Target cohort.


This is probably a bug that should be corrected. Collapsing episodes should not cross observation periods, so that’s a good point, and I’ll think about a way to adjust that.


I wouldn’t say ‘bug’ but rather ‘feature’
Sometimes it’s reasonable to union periods.

The simplest example - when the outcome of interest is death within 5 years after cohort start and
we want to remove subjects with insufficient time at risk.

But knowledge that there is one more observation period after our cohort_end can definitely tell us that the person was alive.

While writing the example I realized that now I’m not sure whether CohortMethod follows my logic.
So much to review… :pensive:

@chris_knoll is it really possible for the collapse strategy to create a cohort with periods outside the observation periods for same subject?

@Gowtham_Rao: I need to look at the logic. I’m pretty sure that it doesn’t account for separate observation periods. I think if you have a gap size of 999999, and you have the following episodes:

OP1:<---------------------------------------------->
EP1:                 |-----------------------------|
OP2:                                                     <------------->
EP2:                                                               |---|

Here I’m showing a person with 2 events, no exit strategy, so the episode goes from the event to the end of observation period.

If you take the episodes and extend the end dates by the gap size (999999 days) to combine them, it works like this:

EP1:                 |-----------------------------|--------------------------->
EP2:                                                               |---|
Yields:
EPF:                 |-------------------------------------------------|
OP1:<---------------------------------------------->
OP2:                                                     <------------->

And you can see the final episode (EPF) spans the observation periods.

I think the solution is to take the final episodes and split them up by finding episode durations that overlap the observation periods like so:

EPF:                 |-------------------------------------------------|
OP1:<---------------------------------------------->
OP2:                                                     <------------->
Yields:
FE1:                 |-----------------------------|
FE2:                                                      |------------|

Here, FE means ‘Final Episode’.
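The split described above could be sketched in Python as an intersection of the final episode with each observation period. This is only an illustration of the idea, not something ATLAS does today; the function name `split_by_observation_periods` is hypothetical, and the dates are borrowed from the drug era example earlier in the thread.

```python
from datetime import date

def split_by_observation_periods(episode, observation_periods):
    """Hypothetical sketch: cut one collapsed episode into the pieces
    that overlap each observation period."""
    ep_start, ep_end = episode
    pieces = []
    for op_start, op_end in sorted(observation_periods):
        start, end = max(ep_start, op_start), min(ep_end, op_end)
        if start <= end:  # keep only periods that overlap the episode
            pieces.append((start, end))
    return pieces

epf = (date(2010, 1, 1), date(2012, 1, 1))   # EPF: collapsed episode
ops = [
    (date(2009, 1, 1), date(2010, 5, 10)),   # OP1
    (date(2010, 5, 15), date(2012, 1, 1)),   # OP2
]
final_episodes = split_by_observation_periods(epf, ops)
print(final_episodes)
```

As in the diagram, the second piece starts at the start of OP2 rather than at the second event, so its start date comes from the observation period and not from any event in it.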

I have some reservations about doing it this way: the time at risk in the second observation period is driven by events that appear in the first observation period, and, in the past, there have been very strict requirements that all events contributing to a cohort episode come from the same observation period. However, there is some Themis work that allows events to appear outside of an observation period, so it’s possible that it’s OK for a cohort episode to span observation periods.

I’ll have to wait and see what the Themis group comes up with.


I think you’re doing great! Your drug era based episodes were spot on.
