
Atlas - When & Why to use Cohort Exit criteria?

Hello Everyone,

I was looking at the section on creating cohorts in Atlas from the Book of OHDSI and have a few questions. I also referred to a few forum posts on cohort exit. Can I confirm my understanding of the items below?

I understand cohort exit can be done using 3 ways.

1) End of continuous observation

Ex: A person stays in the cohort until his observation period ends. Simple and straightforward. Irrespective of any break in his records for drugs, labs, etc., he will still be in the cohort. Let’s say subject XYZ has an observation period start date of 1/1/2001 and an observation period end date of 31/12/2005. So his cohort end date will be 31/12/2005. Right?

2) Fixed duration relative to an event

a) Let’s say subject 123 entered the cohort on 1st Jan 2010 for Hypertension condition or started on Metformin medication. I add 10 days as offset to his initial event (which is Hypertension condition/Metformin medication). So am I right to understand that he will leave the cohort on 11th Jan 2010? But what’s the use in increasing/decreasing/restricting the cohort duration of the patients?

b) Is adding zero days as the offset the same as “end of continuous observation”? I know it throws a warning asking for the number of days to offset to be greater than 0.

c) Will setting a date offset help us retain only one record per person? Why restrict his cohort duration?

3) End of continuous drug exposure

a) Should this field be used only if we consider anything related to drugs as Initial events?

For ex: Subject named “Jack” has calcium acetate drug records from 1st Dec 2009 to 12 Dec 2009. Then he again has same calcium acetate records from 27th Dec 2009 to 11th Jan 2010.

I understand if we mention a gap of 30 days, his cohort end date will become 11th Jan 2010. Because it collapses the 2 drug episodes into an era. Am I right?

For ex: Subject named “John” has Metformin from Feb 1st 2010 to Feb 23rd 2010. Then he again has it on 27th Feb 2010 to Mar 15th 2010.

In this case, let’s say I allow maximum window of “0” zero days. So will his cohort end date become Feb 23rd 2010?

But why do we have to pick “Drug concept set” again when we have already defined it in the initial event? Will it not be able to look at it? Because we get the cohort start date from initial events.
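The Jack and John scenarios above can be worked through with a small sketch. It assumes a simplified collapsing rule (a record joins the current era when it starts no later than the era end plus the allowed gap); this is an illustration only, not Atlas/CIRCE's actual code, and `build_eras` is a hypothetical helper name.

```python
from datetime import date, timedelta

def build_eras(exposures, gap_days):
    """Collapse (start, end) exposure records into eras.

    Simplified rule (an assumption, not CIRCE's code): a record joins the
    current era when it starts no later than era end + gap_days.
    """
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Extend the current era; keep the later of the two end dates.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras

jack = [(date(2009, 12, 1), date(2009, 12, 12)),
        (date(2009, 12, 27), date(2010, 1, 11))]
john = [(date(2010, 2, 1), date(2010, 2, 23)),
        (date(2010, 2, 27), date(2010, 3, 15))]

jack_eras = build_eras(jack, gap_days=30)  # 15-day break fits inside 30 days
john_eras = build_eras(john, gap_days=0)   # 4-day break exceeds 0 days

print(jack_eras[0][1])  # 2010-01-11
print(john_eras[0][1])  # 2010-02-23
```

Under this rule Jack's two records form one era ending 11th Jan 2010, while John's first era ends on Feb 23rd 2010 and the second record starts a separate era.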

I might be wrong here, but aren’t we almost always interested in the number of people who satisfy the cohort criteria? Once I get the list of subjects who satisfy the criteria to enter the cohort, I would anyway be looking at their other data domains to get their corresponding baseline features. Why does it matter how long they were in the cohort or when they leave it?

Any simple explanation would really be helpful

Hi @Akshay,
A cohort is not only a list of subjects, but also a span of time.
You are right: when you need only some baseline characteristics, an end date for the cohort is not required. But think about other analyses, let’s say drug safety or effectiveness. In this case you’ll need to find some outcomes after the patient entered your initial cohort. And if you think that there is some drug side effect which happens while the patient is on the drug, then you definitely need to restrict the cohort by the end of exposure to this drug.
The other use case is drug utilization analysis (or as we call it, Cohort Pathways): here you also need to define cohorts as spans of time when patients are using some treatment.
Actually, almost all analyses implemented in Atlas require an end_date: Incidence Rates, PLE, PLP, Cohort Pathways. Even Cohort Characterization needs the cohort_end_date for some features.

Speaking about ways of cohort exit:

  1. End of continuous observation:
    Yes, pretty straightforward: the cohort_end_date will be the observation_period_end_date of the observation period that contains the initial event.

  2. Fixed duration relative to an event:
    Imagine you need to define all new cases of pneumonia. On the one hand, a person can have more than one pneumonia in his life; on the other hand, you often have data where pneumonia codes occur every second day (claims data especially). How do you distinguish an actual new case of pneumonia from the follow-up records? Well, we know that pneumonia typically lasts for ~3 weeks, so we can construct a cohort like ‘all cases of pneumonia’ (i.e. limit initial events to ‘all events per person’) and limit it to 3 weeks from the start date (or let’s take 4 weeks to be ABSOLUTELY confident).
    By doing this we are separating a new case of pneumonia from follow-up records: any pneumonia records which occurred <3(4) weeks after the initial one will not be considered a new case.

  3. End of continuous drug exposure:
    The point is that the choice of drugs for exit logic doesn’t depend on initial event:
    Let’s say you would like to have a cohort of hypertension patients and would like to observe them while they are using beta-blockers and/or ACE inhibitors …or asthma patients on inhalant steroids.
    The bullet points:

  • the initial event is disease occurrence.
  • patient can switch from one beta-blocker to another or even change the drug group to ACE inhibitors, and this in our use case still shouldn’t be a signal to terminate persistence of patient in the cohort.
    We can’t use the DRUG_ERA table for these cases, as its periods are rolled up at the ingredient level, so we need to construct custom periods which depend on the drugs of choice.
    This is where you’ll need this option for cohort exit.
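The pneumonia example from item 2 can be sketched as follows. This is a minimal illustration, assuming each record gets a fixed 28-day interval and that overlapping intervals are merged into one episode; the function name and the sample dates are made up, and the merging rule is a simplification rather than the exact Atlas logic.

```python
from datetime import date, timedelta

def pneumonia_episodes(event_dates, duration_days=28):
    """Give each record a fixed-length interval, then merge overlapping ones."""
    episodes = []
    for start in sorted(event_dates):
        end = start + timedelta(days=duration_days)
        if episodes and start <= episodes[-1][1]:
            # Follow-up record inside the current episode: not a new case.
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], end))
        else:
            # Record outside any open episode: a new case.
            episodes.append((start, end))
    return episodes

# Hypothetical claims-style data: codes every second day, then a later case.
records = [date(2015, 1, 5), date(2015, 1, 7), date(2015, 1, 9), date(2015, 6, 1)]
episodes = pneumonia_episodes(records)
print(len(episodes))  # 2
```

The three January records collapse into one episode, and the June record starts a second one.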

Hope this helps


Regarding the following statement: “And if you think that there is some drug side effect which happens while patient is on drug, then you definitely need to restrict the cohort by end of exposure to this drug”

There are also important side effects that occur when a drug is stopped. For example, “thrombotic events” after stopping an anticoagulant, “withdrawal syndrome” after stopping an opioid, “liver toxic lab values” that were captured to assess the follow up of a patient that had an un-captured toxic event during treatment. We need to definitively capture these types of events.


Unfortunately Atlas is not that smart. You cannot create a relative reference in Atlas that says, “look up at that other thing I just specified.” Doesn’t work like that. In each criterion, we need to append the concept set associated with that criterion. So in the case of a cohort exit at the end of continuous drug exposure, you’re creating a piece of logic that says, “A person will be considered in this cohort until the drug exposure of interest ends.” But, technically speaking, Atlas still needs you to append the concept set you used to define that “drug exposure of interest”. Otherwise, its brain is blank. It has no way of knowing “cohort entry used this so we should use this.”

Every logical argument needs the corresponding concept sets appended to it

Akshay, this is the crux of cohort studies. We are working to assemble person-time – which is an estimate of the actual time-at-risk (in days, months, years) – that all subjects contributed to a study.

The cohort exit criteria, as @Eldar points out, signifies when a person no longer qualifies for cohort membership. There are different design choices here because there are different ways to qualify for exiting a cohort. This is exceptionally important. Your cohort exit strategy will impact whether a person can belong to the cohort multiple times during different time intervals.
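As a rough illustration of person-time, the sketch below sums the days between cohort start and end over every interval, including a person who belongs to the cohort twice. The rows are hypothetical, and counting days as `end - start` (rather than inclusive of both dates) is an assumption; conventions differ.

```python
from datetime import date

# Hypothetical cohort rows: (subject_id, cohort_start_date, cohort_end_date).
cohort = [
    (1, date(2010, 1, 1), date(2010, 1, 31)),
    (1, date(2010, 6, 1), date(2010, 6, 30)),  # same person, second interval
    (2, date(2010, 3, 1), date(2010, 3, 10)),
]

def person_days(rows):
    """Sum the days between start and end over every cohort interval."""
    return sum((end - start).days for _, start, end in rows)

total_days = person_days(cohort)
print(total_days)  # 68
```

A different exit strategy changes the end dates, and with them the person-time that every downstream rate is divided by.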

OHDSI’s definition of a cohort is somewhat unique in that we are not just talking about code lists but also the logic of how to build this in a timeline. I would suggest watching one of the Cohort Definition/Phenotyping Tutorials: https://www.ohdsi.org/2019-tutorials-cohort-definition-phenotyping/. The first segment will walk you through this in more detail.


The use case for this function is a person who enters the cohort on a specific ingredient, but then continues in the cohort as long as they are exposed to a certain class. So:

Cohort Entry event: First exposure to Ingredient X
Cohort Exit: End of continuous exposure to drug class Y (Which X belongs to).
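A minimal sketch of this entry/exit split, assuming drug names stand in for concept IDs and a 30-day allowed gap. The two sets, the helper name, and the sample records are all hypothetical; the point is only that the exit logic reads its own concept set, separate from the entry event.

```python
from datetime import date, timedelta

# Hypothetical concept sets (names used in place of concept IDs).
ENTRY_CONCEPTS = {"metoprolol"}                           # ingredient X
EXIT_CONCEPTS = {"metoprolol", "atenolol", "lisinopril"}  # concept set Y

def cohort_interval(exposures, gap_days=30):
    """Return (start, end): first exposure to X, followed through exposures in Y."""
    entries = [e for e in exposures if e[0] in ENTRY_CONCEPTS]
    if not entries:
        return None
    start = min(e[1] for e in entries)
    end = start
    for concept, exp_start, exp_end in sorted(exposures, key=lambda e: e[1]):
        if concept not in EXIT_CONCEPTS or exp_end < start:
            continue  # not part of the exit concept set, or over before entry
        if exp_start > end + timedelta(days=gap_days):
            break  # gap too long: continuous exposure has ended
        end = max(end, exp_end)
    return start, end

exposures = [
    ("metoprolol", date(2011, 1, 1), date(2011, 1, 30)),
    ("atenolol", date(2011, 2, 5), date(2011, 3, 6)),     # switch within Y
    ("lisinopril", date(2011, 3, 10), date(2011, 4, 8)),  # switch within Y
    ("amlodipine", date(2011, 4, 10), date(2011, 5, 9)),  # not in Y, ignored
    ("metoprolol", date(2011, 8, 1), date(2011, 8, 30)),  # after a long gap
]
interval = cohort_interval(exposures)
print(interval[1])  # 2011-04-08
```

If the exit used only the entry ingredient, this person would leave on 30th Jan 2011; with the wider set they stay until 8th Apr 2011.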

I wouldn’t say this is a matter of ‘Atlas isn’t smart’, but rather ‘Atlas will not make assumptions on the designer’s behalf’. So, when you say ‘Exit at the end of continuous exposure’, you need to specify which concept set of drugs you want to use, the tool can’t read your mind.


Fair enough. :wink: I do not mean to offend the Atlas gods.

100% and it’s not its job to either! Our minds are ours after all. :laughing:

No offense taken :slight_smile:


Hi Everyone,

Thank you for the detailed response. Just a couple of quick follow-up questions

Can’t we do this pneumonia scenario using the cohort entry events section? I mean with the help of continuous observation period section where I specify the prior window as 3/4 weeks and post window as 3/4 weeks?

As it’s possible that this can result in multiple records per person (because, let’s say, a person could have pneumonia with a gap of 50-60 days every time), if I could restrict to the earliest event per person, then I will get the new cases of pneumonia. Am I right to understand this? If yes, then why do we need a fixed duration relative to an event?

I understand your example on Pneumonia but just trying to learn and correct myself. Might be because I don’t have the breadth of healthcare experience which prevents me seeing the big picture.

Thanks for your help

No, the prior/post setting of ‘continuous observation’ just asserts that for any cohort entry event, there are at least N days of observation before and after the event. It has nothing to do with cohort exit settings. The only thing you could say about the post-event continuous observation is that if you require N days of post observation, then a cohort era will be at least N days long (assuming the person enters at the event and leaves at the end of observation). In other words, post-observation is used to require that people exist in the data for a minimum amount of time after the event.

Well, you might get the earliest event per person, but is it the first? If you say ‘requires 365 days prior’ and ‘earliest event per person’, you are not going to get the first one, because it’s going to give you a single event per person where there was the 365d prior observation. There COULD have been an event at day 90, but it didn’t have the 365d prior observation, so it was ignored. It will give you the earliest event that has the 365d prior observation.

Now, how do you know it’s the first? Well, are you comfortable saying that if you have 365d of prior observation before a diagnosis, and there are zero occurrences of this diagnosis any time before it, then it is the first (the first is the one with zero others before)? Is that a good definition of first? Then, yes! You can define it as:

Cohort Entry Event: earliest diagnosis of X with 365d prior observation
Restrict Initial events: having exactly 0 diagnosis of X between all days before and 1 day before index.

So, a person who has a diagnosis on day 366 and 472 will come in at day 366 (earliest)
A person who has a diagnosis on day 90 and on day 366 will not. (The 90 day event doesn’t have 365d of prior observation, and the 366 event does not have exactly 0 diagnosis of this event between all days before and 1 day before the diagnosis).
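The two cases can be checked with a small sketch. It assumes days are counted on a made-up calendar where observation starts on day 1, so a diagnosis on day 366 has 365 days of prior observation; `index_day` is a hypothetical helper, not part of Atlas.

```python
def index_day(diagnosis_days, obs_start_day=1, prior_days=365):
    """Return the cohort entry day for one person, or None if excluded."""
    # Entry event: earliest diagnosis with enough prior observation.
    eligible = [d for d in sorted(diagnosis_days)
                if d - obs_start_day >= prior_days]
    if not eligible:
        return None
    index = eligible[0]
    # Restriction: exactly 0 diagnoses between all days before
    # and 1 day before index.
    prior_count = sum(1 for d in diagnosis_days if d < index)
    return index if prior_count == 0 else None

print(index_day([366, 472]))  # 366
print(index_day([90, 366]))   # None
```

The second person is dropped in two steps: day 90 fails the prior-observation requirement, and day 366 then fails the zero-prior-diagnoses restriction because of day 90.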

To answer your second question, ‘Why have a fixed duration relative to an event?’: because you may still want the cohort to represent the actual period of time the person was suffering from the diagnosis. It’s the difference between ‘this episode lasted 50 days’ and ‘this episode lasted 3 years because there were 3 years of continuous observation after the diagnosis’.
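The 50 days versus 3 years contrast, as a sketch. The dates are invented, and capping the fixed-duration end at the end of observation is an assumption made here for safety, not something stated in this thread.

```python
from datetime import date, timedelta

def end_by_fixed_duration(event_start, offset_days, obs_end):
    """Exit offset_days after the event start, capped at the end of observation
    (the cap is an assumption of this sketch)."""
    return min(event_start + timedelta(days=offset_days), obs_end)

def end_by_observation(obs_end):
    """Exit at the end of the observation period that contains the event."""
    return obs_end

diagnosis = date(2012, 3, 1)
obs_end = date(2015, 3, 1)  # three years of observation after the diagnosis

fixed_end = end_by_fixed_duration(diagnosis, 50, obs_end)
print((fixed_end - diagnosis).days)                    # 50
print((end_by_observation(obs_end) - diagnosis).days)  # 1095
```

Same person, same diagnosis; only the exit strategy differs, and the cohort interval changes from 50 days to 1095 days.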

The way cohorts are constructed in Atlas is probably very different from other mechanisms that experienced healthcare professionals have used, especially if the experience is in prospective studies where you enroll people, follow them as long as possible, and collect data about them. Observational/retrospective studies have different considerations (such as continuous observation) that need to be accounted for, and so the cohort definition mechanism in Atlas (known as CIRCE) has a lot of features intended to address those considerations.

And, good news, we have test cases for the specifications of the CIRCE cohort expression. If you dare, you can check out the CIRCE repository for some examples on how cohort generation is performed, the example datasets and the expected results. You can find the source code to the tests here.


Wow. Fantastic explanation. Spot on. I am amazed. By any chance did you go through the same difficulty as beginners like me while picking up these concepts? Your answers in the forum seem to address exactly the points where we get confused.

Ah, no, I wrote the framework. So, it’s sorta second nature to me :wink:


Bumping this because it’s related to a doubt I have (very interesting explanation btw). When you say 365d prior observation, will Atlas look at the ‘pneumonia observation’ (as a point-in-time observation or a point in time inside a condition start/end, but then I don’t know how to understand the prior and after window for some of the tables), at the observation_period, or at something completely different (maybe eras when looking at conditions or drugs)?
I’m wondering whether being overzealous in creating the observation_period table could cause some cases to be missed.

Atlas looks at the ‘pneumonia observation’ start date, and if it requires 365d of prior observation, then it is checking that the date diff between the observation_period_start and the ‘pneumonia observation’ start is at least 365d. When you say ‘post’ observation, then you’re doing the same thing (date diff) but between the ‘pneumonia observation’ and the observation_period_end.
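The date-diff check can be sketched like this. The function name, parameters, and sample dates are assumptions for illustration; it only mirrors the comparison described above.

```python
from datetime import date

def has_required_observation(event_start, obs_start, obs_end,
                             prior_days=0, post_days=0):
    """True when the event sits inside the observation period with at least
    prior_days before it and post_days after it."""
    if not (obs_start <= event_start <= obs_end):
        return False
    return ((event_start - obs_start).days >= prior_days
            and (obs_end - event_start).days >= post_days)

obs_start, obs_end = date(2010, 1, 1), date(2012, 12, 31)

# Event one year into the period: 365 days of prior observation exist.
print(has_required_observation(date(2011, 1, 1), obs_start, obs_end,
                               prior_days=365))  # True
# Event five months in: only 151 days of prior observation exist.
print(has_required_observation(date(2010, 6, 1), obs_start, obs_end,
                               prior_days=365))  # False
```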

The purpose of the prior/post continuous observation is to ensure that a person can be observed in the data for a duration of time. If you want to assert that a person did not have a prior condition in the 365 days before some observed event, you can’t assert that if you have less than 365d of observed time. For a more concrete example: if you only have 2 weeks of patient records, can you definitely say something about their medical history for the past year? In this example, 365d of prior observation means that you have the past year of patient records.

If you mean that being overly aggressive in setting an observation_period start/end may cause you to lose cases, then yes: that can happen if your cohort definition requires a certain amount of time of patient records but your data doesn’t have it. But this isn’t a bad thing: you shouldn’t make claims that some baseline characteristic doesn’t exist when you don’t have the patient records available to make that claim.

I was thinking about discharge diagnosis and how potentially having them at the end of visits/episodes/observation_period could impact here. I’ll think about it a little more :slight_smile: