Use of future events in cohort inclusion criteria

On the topic of phenotyping, when, if ever, is it appropriate to use events that occur after the index date in a cohort’s inclusion criteria? I see this in practice but have heard plenty of advice in OHDSI tutorials against it. I think that in PLP or PLE studies it should probably never be done. Is it ever appropriate in characterization? An example would be comparing baseline characteristics of persons who were discharged and readmitted to a hospital vs. persons who were discharged and not readmitted.

For a provocative answer: never. See Suissa S, Dell’Aniello S. Time-related biases in pharmacoepidemiology (PubMed). Using the future to characterize the past is a challenge. If I know that after my AMI I will live to pick up the beta blocker prescription on Monday, I will spend the weekend extreme skiing without a helmet.


There is nothing wrong with using future information in building a cohort definition, as long as that is what you want to phenotype. For example, if you want to phenotype a cohort indexed on first diagnosis, with a requirement that they got at least three dispensations of insulin every 3 months in the first year after diabetes diagnosis, then yes, you can build that cohort definition.

But if your intent is to learn about (characterize) what happens in the first three months to persons newly diagnosed with diabetes mellitus, requiring that those persons had such dispensations would not be appropriate.

Rephrased: as @Kevin_Haynes pointed out, evidence generated using that phenotype is most likely biased.

Suppose we have a large library of standard cohorts and a set of standard analytic methods, and we do a lot of plug-and-chug analyses by combining many different cohort combinations with each analytic method. It seems like using future information in the cohort definitions of such a system will lead to hard-to-catch biases. For an unrealistic but still informative example: the day 1 Type 2 diabetes cohort here requires that persons have at least one diagnosis of T2DM on or within 365 days of the index date. If we compared the survival time of persons in that cohort with a T2DM cohort that did not include future information, we’d have the problem of immortal time bias, right?
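To make the immortal time concern concrete, here is a minimal Python sketch on made-up data. The six persons, the field names, and the rule `requires_future_dx` are all hypothetical; the rule is only a simplified stand-in for an inclusion criterion that looks past the index date, not the actual cohort definition linked above.

```python
from statistics import mean

# Hypothetical toy data: days from index (first T2DM diagnosis) to death,
# and days from index to a later T2DM diagnosis (None if never recorded).
persons = [
    {"id": 1, "death_day": 30,   "second_dx_day": None},
    {"id": 2, "death_day": 200,  "second_dx_day": 90},
    {"id": 3, "death_day": 900,  "second_dx_day": 300},
    {"id": 4, "death_day": 1500, "second_dx_day": None},
    {"id": 5, "death_day": 60,   "second_dx_day": None},
    {"id": 6, "death_day": 1200, "second_dx_day": 120},
]


def requires_future_dx(person, window_days=365):
    """Inclusion rule that uses post-index data: a later diagnosis within the window."""
    day = person["second_dx_day"]
    return day is not None and 0 < day <= window_days


# Cohort A: everyone with an index diagnosis (no future information).
cohort_all = persons
# Cohort B: same index date, but entry depends on a future event.
cohort_future = [p for p in persons if requires_future_dx(p)]

mean_all = mean(p["death_day"] for p in cohort_all)
mean_future = mean(p["death_day"] for p in cohort_future)

# Days each member of cohort B was guaranteed to survive, by construction.
immortal_days = [p["second_dx_day"] for p in cohort_future]

print(f"mean survival, no future info: {mean_all:.1f} days")
print(f"mean survival, future dx required: {mean_future:.1f} days")
print(f"guaranteed-alive days in cohort B: {immortal_days}")
```

In this toy data, persons 1 and 5 die early and can never satisfy the future requirement, so cohort B looks like it survives longer even though nothing about the persons themselves differs at index.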

Let me add another idea. For example: if a person has a diagnosis today and receives several treatments in the future – well, that cohort definition is more likely to be more specific for the condition of interest (the recurrent future treatment suggests TRUE positive).

On the contrary, if a person has a diagnosis today and NO future treatments for that condition, then you may argue that this set of persons is identified less specifically than the cohort above.

Well, now we have two cohort definitions that are trying to capture the same clinical idea (phenotype), and we know that one of them (the one with treatment in the future) is more specific and contains a higher proportion of TRUE positives than the second.

So, what are the baseline population-level characteristics of the two cohorts, and how are they different? If they are different, does that inform us about what can make a better cohort definition?

This is the foundational idea of @jswerdel's silver standard approach for PheValuator!

Yes - with limited to no exceptions, we can't be doing that type of evidence generation.

Also see this discussion, @Adam_Black:

This morning there was a discussion about immortal time bias, started by @Adam_Black here: "Use of future events in cohort inclusion criteria" (#6 by Gowtham_Rao). @Kevin_Haynes pointed to the time-related biases discussed here: "Time-related biases in pharmacoepidemiology" (PubMed).

The problem here is that these are two different cohorts. Both capture patients with the condition of interest, but the former (and maybe the more specific one) only captures the subset of patients who survived and made it to the treatment (sounds like immortal time). In certain circumstances this might be negligible, although still problematic. If presence of treatment after diagnosis is the definition of interest, the index date/cohort entry event should account for the immortal time, here by using treatment initiation as the index date. But again, the interpretation of that cohort is different from that of a newly diagnosed cohort.
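A small Python sketch of what moving the index date does to follow-up time. The person record, its dates, and the helper `follow_up_days` are hypothetical and only illustrate the arithmetic.

```python
from datetime import date


def follow_up_days(index_date, end_date):
    """Days of follow-up counted from the chosen index date."""
    return (end_date - index_date).days


# Hypothetical person: diagnosed, treated two months later, observed for a year.
person = {
    "diagnosis": date(2020, 1, 10),
    "treatment_start": date(2020, 3, 10),
    "end_of_observation": date(2021, 1, 10),
}

# Indexing on diagnosis while requiring future treatment counts the
# diagnosis-to-treatment gap as time at risk, although the person could not
# have died in that gap and still entered the cohort.
from_diagnosis = follow_up_days(person["diagnosis"], person["end_of_observation"])

# Indexing on treatment initiation starts the clock where eligibility is known.
from_treatment = follow_up_days(person["treatment_start"], person["end_of_observation"])

immortal_days = from_diagnosis - from_treatment
print(from_diagnosis, from_treatment, immortal_days)
```

The difference between the two follow-up times is exactly the diagnosis-to-treatment gap, which is the time that would be misattributed if the cohort were indexed on diagnosis.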

At the same time, I believe that it is completely OK, and also necessary, to have different definitions with different (but acceptable) levels of specificity and sensitivity per use case. Let’s take lung cancer as an example. Up to 20% of patients with stage IV lung cancer do not receive treatment. Creating a cohort of stage IV lung cancer based on receipt of future treatment, although more specific, would exclude those 20%, who differ in many aspects of disease, access to care, and outcomes. So it is a completely different cohort from a cohort of all stage IV lung cancer patients. If our goal is to characterize lung cancer patients, using a definition based on future treatment will only lead to biased results. But the same definition (with the appropriate index event) would be the go-to if the interest is in characterizing stage IV lung cancer patients initiating treatment.

I want to second what @Kevin_Haynes said: a cohort definition should never include future events. We avoid it by all means, even if we think we can avoid the pitfalls. We will fall into them.

I don't think this is a debate about immortal time. Immortal time is real and we need to avoid it, i.e., we should not be using future information to build cohorts.

The exception we are discussing is not to generate evidence, but to inform phenotyping. We know that persons who receive treatment in the future are identified more specifically. So, what are the attributes of these people? Can we learn from that and improve the specificity of our current cohort definition (by applying rules only to the baseline data and not to future data)?

Correct, but my concern with this is that you are learning about attributes of people who survived up until that point in the future, and not of all people with the condition. So applying those attributes to the baseline data will give you a biased definition.

You definitely have a point there, but is the number of persons lost to mortality or loss to follow-up large enough to move the population-level summary characteristics?

How about if we change our future cohort to persons who receive future treatment or have at least two more visits with an oncologist in the future, and this reduces the loss from 20% to 5%? Is that acceptable?
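One rough way to reason about how much the excluded persons can move a summary characteristic is simple mixture arithmetic: if a fraction f is excluded, the restricted cohort's value differs from the full population's value by f times the gap between included and excluded persons. The Python sketch below assumes hypothetical prevalences (0.30 among included, 0.60 among excluded) and a hypothetical helper `restriction_bias`. It also holds those prevalences fixed when the loss changes, which a different definition would not guarantee.

```python
def restriction_bias(excluded_fraction, prev_included, prev_excluded):
    """Restricted-cohort prevalence minus full-population prevalence."""
    overall = (1 - excluded_fraction) * prev_included + excluded_fraction * prev_excluded
    return prev_included - overall


# Hypothetical baseline characteristic: 30% prevalence among persons who meet
# the future requirement, 60% among those who do not.
bias_20 = restriction_bias(0.20, 0.30, 0.60)
bias_05 = restriction_bias(0.05, 0.30, 0.60)

print(f"20% excluded: bias = {bias_20:+.3f}")
print(f" 5% excluded: bias = {bias_05:+.3f}")
```

Under these made-up numbers, going from 20% to 5% loss shrinks the shift in the summary from about 6 to about 1.5 percentage points. Whether that is acceptable still depends on how different the excluded persons are, which is the part we usually cannot observe.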

Upon having some questions about cohort inclusion criteria in a study, I was looking into older threads and found this discussion so wanted to revive it.

My question is whether it would be acceptable to set the timeframe for an inclusion criterion around the index date (e.g., 30 days before and after index) if I am looking for two occurrences of a lab measure within that period and want to ensure stability in this marker. The intention of including the confirmatory measure after index would be to take the most recent value possible, closest to the index. I would really appreciate your input on that, as well as any other suggestions for including patients with stable lab measures around the index.
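For concreteness, a Python sketch of the criterion as I understand it, with a check for whether a qualifying pair relies on post-index data. Everything here is an assumption for illustration: the function names, the 10% relative-difference rule for "stable", and the example values. The last step shows one possible option in the spirit of the earlier replies, moving the index to the confirmatory measurement so that all qualifying information sits on or before index.

```python
from datetime import date, timedelta


def qualifying_pairs(index_date, measurements, window_days=30, max_rel_diff=0.10):
    """Pairs of (date, value) measurements inside the window whose values agree.

    'Stable' is assumed to mean a relative difference of at most max_rel_diff.
    """
    lo = index_date - timedelta(days=window_days)
    hi = index_date + timedelta(days=window_days)
    in_window = sorted(m for m in measurements if lo <= m[0] <= hi)
    pairs = []
    for i, (d1, v1) in enumerate(in_window):
        for d2, v2 in in_window[i + 1:]:
            if abs(v1 - v2) <= max_rel_diff * max(abs(v1), abs(v2)):
                pairs.append(((d1, v1), (d2, v2)))
    return pairs


def uses_post_index_data(index_date, pair):
    """True if either measurement in the pair falls after the index date."""
    return any(d > index_date for d, _ in pair)


# Hypothetical patient.
index_date = date(2023, 6, 15)
measurements = [
    (date(2023, 6, 1), 7.0),
    (date(2023, 6, 25), 7.2),
    (date(2023, 9, 1), 9.0),   # outside the +/- 30 day window
]

pairs = qualifying_pairs(index_date, measurements)
needs_future = [uses_post_index_data(index_date, p) for p in pairs]

# Option: re-anchor the index on the later measurement of the qualifying pair.
reanchored_index = max(d for d, _ in pairs[0]) if pairs else None

print(pairs, needs_future, reanchored_index)
```

In this example the patient only qualifies because of a measurement 10 days after index, so anyone who died or left the database in those 10 days could not have entered the cohort. Re-anchoring changes what the index date means, so it would need to fit the study question.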