Limit cohort expression results

mmmckillop · May 11, 2016, 6:53pm

If I limit a cohort expression to the earliest event, what does this mean? And how does this differ from the “limit primary events” field?

Chris_Knoll · May 12, 2016, 1:52am

There are two steps in the cohort definition query: selecting the Primary Events, and then (optionally) applying Additional Criteria to each primary event to ‘qualify’ it for inclusion in the cohort.

Let’s say, for example, you specify ‘Using the earliest event of Procedure X, Drug Exposure Y or Condition Z per person between 2010-01-01 and 2010-12-31’. Limiting the Primary Events to the earliest event per person means that the earliest of the above three would be chosen.

With the Additional Criteria, you might qualify the primary events by requiring that, for each primary event, there is no drug exposure to X1 from 7 days before the event through 7 days after it. If Procedure X was the earliest event but didn’t satisfy this ‘additional criteria’, and you limited the primary events to the earliest, this person would not be selected in the cohort.

However: if Drug Exposure Y came after Procedure X, Drug Exposure Y satisfied the criteria, and you did NOT limit the primary events to the earliest, then Drug Exposure Y will be selected (it’s the earliest of X, Y, Z that satisfies the Additional Criteria).

Finally: the final ‘limit cohort results to earliest/latest per person’ setting means that, if a person had multiple events that satisfied the cohort criteria, only the earliest/latest of those events is used.

TL;DR: Limit Primary Events means a person is only given one chance to meet the additional criteria (their earliest or latest event). Limit Cohort Results means to only return the earliest/latest primary event after Additional Criteria are applied.
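
A minimal Python sketch of how the two limits interact, using the X/Y/Z example above. It is an illustration only, not the SQL that Atlas generates: the dates, the `build_cohort` function and the stand-in `passes_additional_criteria` check are all hypothetical, and Procedure X is simply assumed to fail the criteria.

```python
from datetime import date

# Hypothetical primary events for one person: (event name, event date).
events = [
    ("Procedure X", date(2010, 2, 1)),
    ("Drug Exposure Y", date(2010, 5, 1)),
    ("Condition Z", date(2010, 8, 1)),
]

def passes_additional_criteria(event):
    # Stand-in for "no exposure to X1 within 7 days before or after the event".
    # Assumption for this example: Procedure X fails, the other events pass.
    return event[0] != "Procedure X"

def build_cohort(events, limit_primary, limit_results):
    """Apply the primary-event limit, then the criteria, then the result limit."""
    ordered = sorted(events, key=lambda e: e[1])
    if limit_primary == "earliest":
        ordered = ordered[:1]          # the person gets exactly one chance
    qualified = [e for e in ordered if passes_additional_criteria(e)]
    if limit_results == "earliest":
        qualified = qualified[:1]      # keep only the earliest qualifying event
    return [name for name, _ in qualified]

limited = build_cohort(events, "earliest", "earliest")
unlimited_primary = build_cohort(events, "all", "earliest")
all_events = build_cohort(events, "all", "all")
```

Under these assumptions, `limited` is empty (the only candidate, Procedure X, fails), `unlimited_primary` contains only Drug Exposure Y, and `all_events` contains both Drug Exposure Y and Condition Z.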

-Chris

mmmckillop · May 12, 2016, 11:28am

@Chris_Knoll thank you very much.

Eldar · February 22, 2018, 7:22pm

@Chris_Knoll
Can you please clarify which events will be included in the cohort if I choose ‘all events’?
For example, I want to include all cases of Condition Z per person. I’ve chosen ‘limit initial events to all events’,
but after generation I see that the overall number of records in the cohort is the same as the number of persons, and only the first occurrence of Condition Z is captured.
I reported this behavior as a bug, but now I’m not sure that it is one.

Chris_Knoll · February 23, 2018, 12:12am

Hi, I saw your issue in the repo, but this could be more of an education issue than a technical issue. So let’s discuss here:

Although you’ve selected all events per person, in the end the events are ‘collapsed’ into periods of time during which the person met the criteria for cohort inclusion. Using some ASCII art to describe a patient timeline:

-----X------X------X-----------X------X------

The X indicates the re-occurrence of the event you are looking for. Since you’ve set it to ‘all events’, all those events will be kept for purposes of inclusion in the cohort.

If you did not specify ‘exit criteria’, then the person enters the cohort at these events, and stays in the cohort. This makes the ‘period in cohort’ look like this:

                                     -------->
                              --------------->
                  --------------------------->
            --------------------------------->
      --------------------------------------->
-----X------X------X-----------X------X------

So, you can see, without specifying how the person exits the cohort, you have all these overlapping periods, and so these get collapsed into 1 period:

      --------------------------------------->
-----X------X------X-----------X------X------

Now, consider specifying an exit criteria of 10d after event. In the above picture, let’s say there’s a 20d gap between the middle 2 X’s. This is what the periods would look like:

      |---------------|        |---------|
-----X------X------X-----------X------X------

So this would lead to 2 records for 5 events.

If you want each event, then set the exit to be 1d after. This way, you collapse all same-day events into 1 period, but still get the granular events that you probably want to see.
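
The collapse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event dates are made up (7 days apart, with a 20-day gap in the middle), `collapse_periods` is a hypothetical helper, and treating periods that merely touch as overlapping is a simplification that may differ from the generated SQL.

```python
from datetime import date, timedelta

def collapse_periods(event_dates, exit_days):
    """Give each event a period ending exit_days after it, then merge overlaps."""
    periods = []
    for start in sorted(event_dates):
        end = start + timedelta(days=exit_days)
        if periods and start <= periods[-1][1]:
            # Overlaps the previous period: extend it instead of adding a record.
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods

# Hypothetical dates: 7 days apart, except a 20-day gap between events 3 and 4.
events = [date(2018, 1, 1), date(2018, 1, 8), date(2018, 1, 15),
          date(2018, 2, 4), date(2018, 2, 11)]

ten_day = collapse_periods(events, 10)  # exit 10d after each event
one_day = collapse_periods(events, 1)   # exit 1d after each event
```

With these dates, `ten_day` holds 2 periods for the 5 events (each ending 10 days after the last event in its group), while `one_day` holds 5 periods, one per event.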

hope this helps!

Edit: I’ve updated the diagrams to show that since the exit was set to 10d after the event, the end of the period should be 10d after the last X in the group.

Eldar · February 22, 2018, 11:15pm

Thank you @Chris_Knoll

The behavior you describe is what I expected.
Let me present a test case. As a first step, I chose a procedure concept that occurs many times per person
and designed several cohorts based on it
(concept_id = 2213601, concept_name = ‘Current Procedural Terminology version 4 (AMA) 90999: Unlisted dialysis procedure, inpatient or outpatient’).

I designed this cohort to confirm that there are patients with more than 1 occurrence of dialysis separated by a clear period of 14 days: http://www.ohdsi.org/web/atlas/#/cohortdefinition/1730135
Then I’ve built a set of cohorts:
A) limitation to earliest event per person, end date is chosen by default (end of corresponding observation period) -
http://www.ohdsi.org/web/atlas/#/cohortdefinition/1730132
B) limitation to all events, end date is chosen by default - http://www.ohdsi.org/web/atlas/#/cohortdefinition/1730133
C) limitation to all events, end date is based on fixed period after start date (+1 day) - http://www.ohdsi.org/web/atlas/#/cohortdefinition/1730134

There are 3755 persons with at least 1 occurrence of dialysis (cohort A) and 3453 patients with at least 2 procedure occurrences separated by a gap of 14 days. Thus, I expect 3755 records in
cohort B (this works fine) and almost twice as many records in cohort C (but the counts are the same as in cohorts A and B).

If it’s still an education issue, can you please explain where I am wrong?

Eldar · February 22, 2018, 11:32pm

In other words: I reviewed the generated SQL for the cohorts.
It seems the events should be collapsed somewhere after #included_events (I guess the collapsing happens in the series of steps related to #collapse_constructor_output).
So #included_events should capture all events, and I am confused that the condition

into #included_events
FROM cteIncludedEvents Results
WHERE Results.ordinal = 1

is present. (As I understand it, this WHERE clause shouldn’t occur if ‘all events’ is chosen.)

Chris_Knoll · February 22, 2018, 11:58pm

Hi, @Eldar,
Looks like you left the last limit set to ‘earliest per person’. The initial event limit is properly set to ‘all events’, so every occurrence of dialysis will be used in the qualifying criteria. But there are no qualifying criteria, so we go on to the next limit: limit qualifying cohort to earliest per person. This is the setting just above the ‘Cohort Exit Criteria’ section in the UI.

Make sure each of the limits is set to ‘all events’. The last one is what is causing you to have 1 record per person.

Chris_Knoll · February 23, 2018, 12:10am

@Eldar: I’ve made a copy of your cohort definition and made the limit change (just wanted to make sure for myself that there’s no bug!) here it is:
http://www.ohdsi.org/web/atlas/#/cohortdefinition/1730136

There are still 3755 persons, but now there are 17307 records. This means that among the 3755 people, there are 17307 periods in this cohort (about 4.6 records per person).

Eldar · February 23, 2018, 12:48am

@Chris_Knoll :
Thanks a lot! Now I understand that the limit on the qualifying cohort must also be set even when there are no qualifying
criteria. That was not obvious to me, so you saved me a lot of time and Brazilian coffee.

It seems the issue should now be closed. Sorry for raising a storm around this.

Gowtham_Rao · February 23, 2018, 1:29am

Why 1d after (cohort_end_date > cohort_start_date)? Why not 0d after (cohort_end_date = cohort_start_date)?

Chris_Knoll · February 23, 2018, 1:46am

Two reasons come to mind. Logically: a person whose start_date = end_date contributes exactly 0 time to the cohort period. If the time doesn’t exist, why put it into the data? Technically: since we added the cohort censor window, we have a piece of logic to ensure everyone in the cohort is within the time window: adjusted_end > adjusted_start. This is a very clever trick that saves a lot of complicated logic for removing people from the cohort who fall outside of the censor window, and it also has the nice effect of ensuring that everyone’s start date is before their end date.
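
To make the technical point concrete, here is a hedged Python sketch of a censor-window filter. Only the comparison adjusted_end > adjusted_start comes from the description above; clipping with `max`/`min`, the `apply_censor_window` helper and the example dates are assumptions made for illustration, not the actual generated SQL.

```python
from datetime import date

def apply_censor_window(periods, window_start, window_end):
    """Clip each (start, end) period to the window and keep only periods
    that still have positive length after clipping."""
    kept = []
    for start, end in periods:
        adjusted_start = max(start, window_start)  # assumed clipping rule
        adjusted_end = min(end, window_end)        # assumed clipping rule
        if adjusted_end > adjusted_start:          # strict: drops 0-length periods
            kept.append((adjusted_start, adjusted_end))
    return kept

# Hypothetical cohort periods and a 2010 censor window.
periods = [
    (date(2009, 6, 1), date(2009, 9, 1)),    # entirely before the window
    (date(2009, 12, 1), date(2010, 3, 1)),   # straddles the window start
    (date(2010, 5, 1), date(2010, 5, 1)),    # zero-length period
    (date(2010, 12, 31), date(2011, 2, 1)),  # starts on the last window day
]
kept = apply_censor_window(periods, date(2010, 1, 1), date(2010, 12, 31))
```

In this sketch the single strict comparison removes the period outside the window, the zero-length period, and the period that shrinks to zero length after clipping; only the straddling period survives, clipped to start on 2010-01-01.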

-Chris

Gowtham_Rao · February 23, 2018, 2:09am

Thank you. I always thought that cohort period is computed as cohort_end_date - cohort_start_date + 1. That would mean 1 day, when cohort_start_date = cohort_end_date.

I haven’t reviewed it in detail, but why not adjusted_end >= adjusted_start?

Chris_Knoll · February 23, 2018, 2:59am

Nope!

That would leave people with 0 time in the cohort; I thought the strict comparison was an elegant way to remove those cases.

I understand that there are many different viewpoints in the scientific research community on how start/end duration should be calculated, and there are good arguments on both sides. But if you’re using the cohort framework, that’s how we’re using start/end dates, so keep that in mind when defining the cohort exits.

Eldar · February 23, 2018, 11:27am

@Chris_Knoll:
Going back to your cohort design: the Population visualization now shows 123675 people, but this number actually represents the count of records, not people. This could mislead users.

Christian_Reich · February 23, 2018, 12:13pm

Can you make sure you have the same convention as THEMIS Focus Group 3 is developing for the same day issue? We need to have a single solution. @Asha_Mahesh?

Chris_Knoll · February 23, 2018, 2:48pm

Yes, before we added more robust handling of multiple events per person (via collapse code and exit criteria), the usual way of creating a cohort definition was to find the earliest event per person. The visualization then looked very much like a person-level report. But, as you learned about the primary events: these are moments in time where a person qualifies to be in the cohort. Going back to our original timeline:

-----X------X------X-----------X------X------

In the visualization, it’s showing you a ‘total’ value which is the total number of initial events that were found, and then the ‘matched’ value is the final number of events that met all inclusion criteria. Since your example didn’t have any inclusion criteria, the total = matched events because nothing was removed. If I adjust the above example to show events that pass the inclusion criteria:

-----X------P------P-----------X------P------

Let’s say the P’s are the events where the inclusion criteria pass (i.e., there’s an age criterion and a visit criterion, which drop 2 of the X’s). In this scenario, total = 5, matched = 3, and the final cohort periods are going to be defined by only the P events.
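
As a small worked example of those counts, here is a Python sketch of the timeline above for a single hypothetical person. The variable names are illustrative and are not taken from the Atlas code.

```python
# One hypothetical person with five initial events; True marks an event that
# passes every inclusion criterion (the P's in the timeline above).
timeline = [
    ("X", False),
    ("P", True),
    ("P", True),
    ("X", False),
    ("P", True),
]

total = len(timeline)                                  # all initial events
matched = sum(1 for _, passed in timeline if passed)   # events passing the criteria
persons = 1                                            # everything above is one person
```

Here `total` is 5 and `matched` is 3 even though only 1 person is involved, which is why the visualization counts should be read as event counts rather than person counts.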

But in the visualization, we’re always talking about the events that allow people to appear in the cohort, which is why there is this disconnect between the counts in the visualization and the people in the cohort. I think there are some headings in the visualization referring to ‘people’, and those should be corrected.

There are also plans to update the report in the visualization to provide a ‘person-centric’ view. The way I’m thinking of this is to find the ‘most matched event per person’ so that you can see how everyone matches the cohort in the ‘best case scenario’. But for now, don’t think of the visualization as a person report; think of it as a report on the total events in the data, and how these events get ratified for inclusion in creating the cohort periods.