
Question regarding incidence/prevalence rates in Atlas

Hi all,

My name is Spyros. I recently joined the OHDSI community and work at Roche as a data scientist. I am trying to better understand how incidence and prevalence rates are calculated in the ATLAS tool. Is there any documentation or a GitHub repo that I could look at?

Thanks in advance.

There are a few things to know up front about how people are selected for the analysis.

People in T are only included if they do not have the O any time prior. Put another way: people with prior outcomes are excluded.
People in T are followed up until the earliest of:

  • end of time at risk
  • the occurrence of the outcome
  • the end of continuous observation.

With the data prepared, the total follow-up time is calculated and the number of cases is identified.

The rate is calculated as: cases / total follow-up time.
The proportion is: cases / distinct people

Note: ‘distinct cases’ are assumed because a person can contribute at most one case.
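To make these rules concrete, here is a minimal Python sketch of the calculation for one TAR per person. It is not Atlas’s actual code: the `Person` record and the `incidence` function are hypothetical, follow-up is counted as `datediff(d, start, end)` days, and an outcome on the TAR start date is treated as prior (it would yield zero days of follow-up).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Person:
    """Hypothetical per-person record used only for this illustration."""
    tar_start: date
    tar_end: date
    obs_end: date                   # end of continuous observation
    first_outcome: Optional[date]   # earliest O ever recorded, if any


def incidence(people: List[Person]) -> Tuple[int, int, int, float, float]:
    """Return (persons, cases, person_days, rate, proportion)."""
    persons = cases = person_days = 0
    for p in people:
        # People with a prior outcome (on or before TAR start) are excluded.
        if p.first_outcome is not None and p.first_outcome <= p.tar_start:
            continue
        # Follow up until the earliest of TAR end, outcome, or observation end.
        end = min(p.tar_end, p.obs_end)
        is_case = p.first_outcome is not None and p.first_outcome <= end
        if is_case:
            end = p.first_outcome
        if end < p.tar_start:
            continue  # no time at risk, so the person is not included
        persons += 1
        cases += int(is_case)
        person_days += (end - p.tar_start).days  # datediff(d, start, end)
    rate = cases / person_days if person_days else float("nan")
    proportion = cases / persons if persons else float("nan")
    return persons, cases, person_days, rate, proportion
```

The rate divides cases by total follow-up time, while the proportion divides cases by the number of distinct people who contributed any time at risk.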

Hi @Chris_Knoll, thank you so much for the detailed explanation; it helps a lot.

Would it be reasonable to make the exclusion of people with prior outcomes optional? For example, if I’m interested in the incidence of readmission following surgery (T: surgery, O: inpatient admission), it seems OK to include people in T who have had a prior inpatient admission, right?

@Chris_Knoll and @Adam_Black, suppose I had your O (any O of interest, really) back in 2003, but your data doesn’t know about it because that was before I was part of your data. What would you do with me anyway? Most likely you would include me as incident, given it’s the first time I had O in your data. I’m only sparking discussion, as I think all of these incident vs. prevalent decisions will depend on the question.

With CohortIncidence, you can make the prior exclusion ‘optional’, in that you can specify a ‘less than infinite’ clean window, so that a prior outcome does not exclude all follow-up time after it.

I’d like to explain the main feature of the CohortIncidence package and how Atlas actually works the same way (the user just isn’t given the choice).

Atlas works by applying an infinite clean window after the first outcome. This means that if you had a prior outcome, all post-outcome time is excluded, so anyone whose exposure comes after their outcome has all of their time at risk excluded. No time at risk = no inclusion in the analysis. This ‘infinite clean window’ also means that follow-up time is not counted after the first outcome: time at risk ends at your first outcome and will not pick up again. The difference with Atlas is that you aren’t given the choice of a non-infinite clean window; prior outcomes are excluded, and follow-up time is excluded after outcomes. I’m being redundant: excluding all follow-up time after outcomes is exactly what makes prior outcomes exclude subsequent people. Notice how there’s no logical way to turn this ‘off’ for prior outcomes but leave it on for post outcomes. It’s a decision that is applied ‘fairly’ to all outcomes.

Turning to CohortIncidence, you have the option to specify a clean window relative to your outcome: you add a number of days to each outcome, and time at risk within that window is excluded from being counted. If a person has no time at risk, the person is not counted.
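The clean-window idea can be sketched at the level of individual days. In this minimal, hypothetical Python model (not CohortIncidence’s implementation), `clean_window=None` stands for the infinite clean window Atlas always applies, while an integer removes only that many days after each outcome. Days are counted inclusively, and a prior outcome whose window reaches into the TAR removes the overlapping days.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def at_risk_days_and_cases(tar_start: date, tar_end: date,
                           outcomes: Iterable[date],
                           clean_window: Optional[int]) -> Tuple[int, int]:
    """Return (at-risk days, cases) for one TAR under a given clean window.

    clean_window=None models an infinite clean window: after any outcome,
    the person is never at risk again.
    """
    # Every calendar day of the TAR starts out at risk.
    at_risk = {tar_start + timedelta(days=i)
               for i in range((tar_end - tar_start).days + 1)}
    cases = 0
    for o in sorted(outcomes):
        if o in at_risk:
            cases += 1  # an outcome on an at-risk day is a case
        if clean_window is None:
            # Infinite window: drop every day after the outcome.
            at_risk = {d for d in at_risk if d <= o}
        else:
            # Finite window: drop only the days inside the window after the outcome.
            window_end = o + timedelta(days=clean_window)
            at_risk = {d for d in at_risk if not (o < d <= window_end)}
    return len(at_risk), cases
```

For a TAR covering January 2020 and a prior outcome on 2019-12-25, the infinite window leaves no time at risk (so the person would not be counted), whereas a 30-day window leaves the last 7 days of January at risk.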

So @Adam_Black, there is no option to turn it off for priors and on for post-target outcomes. You can define your outcomes so that they only count if they occur after the target criteria, but you need to ask yourself: what makes an outcome so special that it won’t exclude time if it happens prior to the target, yet will exclude time after the target? Whatever clean window you specify will be applied equally to all outcomes.

@Kevin_Haynes: in your example, if the event happened back in 2003, then the clean window after the 2003 event will not exclude follow-up time in the subsequent years unless the clean window you specify is long enough to reach into those years.

Thanks for the reply. I think I understand the behavior of the incidence rate analysis in Atlas. What I don’t understand is why there is no logical way to define incidence using the first outcome during the time at risk and ignore all time prior to the time at risk. This seems like a reasonable analysis to me. A recent example I was helping with was incidence of inpatient hospitalization following surgery.
T: surgery
O: inpatient hospitalization
TAR: 1-14 days after discharge (although we had trouble defining cohort exit = discharge date)
If the person ever had a prior inpatient hospitalization they would be excluded from the analysis in Atlas which is not what the investigator wanted. We can get around this by defining the O as inpatient hospitalization following surgery but that cohort definition gets pretty complicated because the surgery cohort is quite complicated with lots of inclusion criteria. Wouldn’t it make sense for some incidence rate analyses to only consider outcomes during the time at risk as in the readmission example?

I think this happens when the outcome is directly related to the target and should only be considered an outcome if it occurs after the target. You might say that the outcome should be defined as “hospital admission following discharge for surgery X with inclusion criteria …”, but that turns out to be tricky to create in Atlas because all the inclusion criteria for the surgery are relative to the surgery date, which is not the index date for the outcome cohort. I’m guessing it is possible with nested criteria, though.

Hi, @Adam_Black ,
First let me say: I hear you about the pain points of creating individual O cohorts that are O after T, especially when the T definition is very complicated and makes defining these specialized Os painful.

From an Atlas/WebAPI IR perspective, I don’t think we’re going to be making any changes to it: the next planned change to IR in Atlas will be to adopt the functionality in CohortIncidence, so that we don’t have these different implementations floating around in our ‘standardized analytics toolbox’.

That being said about Atlas, you are probably OK doing the work inside the CohortIncidence package. Even though a new release is pending, you can do what you want today with the current 1.0 release. It will involve a little custom SQL to construct the outcome cohorts that are the first O after T, but it’s not complicated SQL to create cohort records based on a T-O relationship; you just need to find a way to assign a new cohort_id. Here are the steps:

  1. Copy your cohort records from your Atlas results table into a new cohort table, which will be used to combine your base target/outcome cohorts with your derived O cohorts (which only contain an O if it appears after a T).
  2. Execute the SQL that generates the cohort records for a given O with a prior T, assign them a new cohort_id, and put them into the cohort table you set up in step 1.
  3. Put the derived outcome cohort IDs as new outcome definitions in your CohortIncidence design.
  4. Run the CohortIncidence package and review the results.
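Steps 1 and 2 can be sketched with Python’s built-in `sqlite3` module so the SQL runs end to end. The table layout follows the OHDSI cohort table columns (`cohort_definition_id`, `subject_id`, `cohort_start_date`, `cohort_end_date`); the cohort IDs (1 for T, 2 for O, 1002 for the derived O), the sample records, and the rule “the O starts on or after some T start” are hypothetical choices, and the SQL would need translating to your own database dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Step 1: a working cohort table holding copies of the T and O records.
conn.execute("""
    CREATE TABLE cohort (
        cohort_definition_id INTEGER,
        subject_id INTEGER,
        cohort_start_date TEXT,
        cohort_end_date TEXT
    )""")
T_ID, O_ID, DERIVED_O_ID = 1, 2, 1002  # hypothetical cohort IDs
conn.executemany("INSERT INTO cohort VALUES (?, ?, ?, ?)", [
    (T_ID, 100, "2020-03-01", "2020-03-05"),  # surgery
    (O_ID, 100, "2019-06-01", "2019-06-10"),  # admission before any surgery
    (O_ID, 100, "2020-03-12", "2020-03-15"),  # admission after surgery
    (O_ID, 200, "2020-07-01", "2020-07-03"),  # person 200 never had surgery
])

# Step 2: copy O records that have a prior T for the same person,
# storing them under a new cohort_definition_id.
conn.execute("""
    INSERT INTO cohort
    SELECT DISTINCT ?, o.subject_id, o.cohort_start_date, o.cohort_end_date
    FROM cohort o
    WHERE o.cohort_definition_id = ?
      AND EXISTS (
          SELECT 1 FROM cohort t
          WHERE t.cohort_definition_id = ?
            AND t.subject_id = o.subject_id
            AND t.cohort_start_date <= o.cohort_start_date
      )""", (DERIVED_O_ID, O_ID, T_ID))
conn.commit()

derived = conn.execute(
    "SELECT subject_id, cohort_start_date FROM cohort "
    "WHERE cohort_definition_id = ? ORDER BY subject_id, cohort_start_date",
    (DERIVED_O_ID,)).fetchall()
print(derived)  # [(100, '2020-03-12')]
```

For a readmission question you might compare against `t.cohort_end_date` (discharge) instead, so that only admissions after discharge qualify. The derived ID is what step 3 adds as an outcome in the CohortIncidence design.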

I recommend using the issue-8 branch on CohortIncidence which makes it much easier to execute the package, and the vignette is up to date.

As for the question on the methods (i.e., why isn’t it reasonable to just follow up to the first O, ignoring time prior to T and time at risk after the O?), I think a new topic should be created in the Researchers section of the forums, but I’ll make one point here to think about:

CohortIncidence was specifically designed to handle complicated cases such as the timeline below:

  |S1|             |S2|                  |S3|
  |---------TAR----------|               |---------TAR----------|
|----IP1----|         |---IP2---|

So: TAR begins at surgery, and we can see this patient had 3 surgeries, producing 2 distinct TARs (the TARs of S1 and S2 overlap and merge). Note that outcome IP1 preceded S1, but IP1 overlaps the start of the first TAR. A new IP outcome cannot begin inside the IP cohort time, so this is called ‘immortal time’, and it should be removed from the TAR because the person is not technically at risk of an outcome during this period. So the point is: you can’t just ignore prior outcomes.

Second, we see that IP2 actually precedes the S3 surgery. Should it be ignored because it is ‘prior to target’ (relative to S3)? If so, S1 will not show an outcome during its time at risk, which is incorrect.

There’s a third surgery that begins another period of time at risk, and we see that no outcome follows it. Shouldn’t that time be considered in the IR calculation? CohortIncidence strives to incorporate all relevant information into the IR calculation.

These concerns are probably best discussed in the Research section, but I wanted to call them out here so you can think about the implications of measurement error and immortal time bias if we allow options like ‘only follow to first outcome’ and ‘ignore prior outcomes’.
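The timeline above can be replayed with a small, hypothetical Python sketch using day numbers: S1 = day 0, S2 = day 15, S3 = day 60, a 20-day TAR after each surgery, IP1 spanning days -10 to 5 and IP2 spanning days 25 to 33. It illustrates the two ideas described (overlapping TARs merge into distinct eras, and days inside an outcome era are removed as immortal time) under these assumptions; it is a simplified model, not CohortIncidence’s implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start_day, end_day)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent TARs into distinct eras."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summarize(surgeries: List[int], tar_days: int, outcomes: List[Interval]):
    """Return (merged TARs, at-risk days, cases) for one person."""
    tars = merge([(s, s + tar_days) for s in surgeries])
    at_risk = {d for start, end in tars for d in range(start, end + 1)}
    cases = 0
    for o_start, o_end in sorted(outcomes):
        if o_start in at_risk:
            cases += 1  # the outcome starts on an at-risk day
        # Days after the outcome starts and within its era are immortal time:
        # a new admission cannot begin while the person is still admitted.
        at_risk -= set(range(o_start + 1, o_end + 1))
    return tars, len(at_risk), cases
```

With those inputs, `summarize([0, 15, 60], 20, [(-10, 5), (25, 33)])` gives two TARs, (0, 35) and (60, 80), 43 at-risk days and 1 case: IP1 removes days 0 to 5, IP2 counts as a case in the first TAR, and the third surgery’s 21 outcome-free days still contribute.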


Thanks for your detailed and helpful answer @Chris_Knoll!

@Chris_Knoll, I played around a bit with the CohortIncidence package and compared the results with ATLAS. In my understanding, if one specifies an “infinite” clean window (set to 99999) in the package, the result should be the same as in ATLAS, right? However, the incidence rate is significantly higher with the CI package. PERSON_DAYS is approximately equal to Time At Risk, but OUTCOMES is much higher than Cases in ATLAS.
The variables TAR_START_WITH, TAR_START_OFFSET, TAR_END_WITH and TAR_END_OFFSET were left at their default settings. Am I missing something again?

I can’t see from your screenshots, but can you tell me how you specified your TAR in Atlas vs. your TAR in CI?
As of v1.0.1 of CI, TAR defaults to using the cohort start date + 0d as the TAR start and cohort end date + 0d as the TAR end. Atlas also uses this same default.
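The TAR settings can be read as two anchor dates plus day offsets. The sketch below is a hypothetical Python helper, not the package’s API: the `start_with`/`end_with` values "start" and "end" are placeholders (v2.0.0 changed the real parameter names and values). With the defaults it reproduces cohort start + 0d to cohort end + 0d.

```python
from datetime import date, timedelta
from typing import Tuple


def tar_window(cohort_start: date, cohort_end: date,
               start_with: str = "start", start_offset: int = 0,
               end_with: str = "end", end_offset: int = 0) -> Tuple[date, date]:
    """Return (tar_start, tar_end) from anchor dates plus day offsets."""
    anchors = {"start": cohort_start, "end": cohort_end}
    tar_start = anchors[start_with] + timedelta(days=start_offset)
    tar_end = anchors[end_with] + timedelta(days=end_offset)
    return tar_start, tar_end
```

For a cohort record from 2020-01-01 to 2020-01-10, `start_offset=1` gives a TAR from 2020-01-02 to 2020-01-10 (the ‘start + 1d’ experiment discussed in this thread), and `start_with="end", start_offset=1, end_with="end", end_offset=14` expresses a TAR of 1-14 days after cohort exit.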

There are other differences between Atlas and CI that I should mention:

Durations of time are calculated differently:

In Atlas, time at risk is calculated as datediff(d, start, end); if start = end, this gives a 0-day duration, which is why Atlas excludes anyone with the outcome on or before the TAR start.

In CI, time at risk is calculated as datediff(d, start, end) + 1, which means that if your outcome is on the same date as your TAR start, we count the person as a case and include 1 day of at-risk time for that person.
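The one-day difference between the two conventions is easy to see in a few lines. This minimal Python sketch implements the two formulas as described, using date subtraction in place of SQL `datediff`; the function names are hypothetical.

```python
from datetime import date


def atlas_tar_days(start: date, end: date) -> int:
    """Atlas-style duration: datediff(d, start, end)."""
    return (end - start).days


def ci_tar_days(start: date, end: date) -> int:
    """CohortIncidence-style duration: datediff(d, start, end) + 1."""
    return (end - start).days + 1


# An outcome on the TAR start date ends follow-up that same day.
tar_start = outcome = date(2020, 3, 1)
print(atlas_tar_days(tar_start, outcome))  # 0 -> no time at risk, person excluded
print(ci_tar_days(tar_start, outcome))     # 1 -> counted as a case with 1 day at risk
```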

Looking at your result, I see 3,155,754 people included in the Atlas incidence analysis and 3,422,360 people included in CI. Could the additional cases come from the roughly 267,000 (3,422,360 − 3,155,754) additional people included in the CI analysis who had the outcome on the same day as the TAR start? You could look at your outcome cohort records and your target cohort records to see if anyone has the same cohort_start_date for the T as for the O.
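That check could look like the following query, shown against an in-memory SQLite table so it runs as-is. The table layout mirrors the OHDSI cohort table; the cohort IDs and records are made up, and in practice you would point the query at your own results schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cohort (
        cohort_definition_id INTEGER,
        subject_id INTEGER,
        cohort_start_date TEXT,
        cohort_end_date TEXT
    )""")
T_ID, O_ID = 1, 2  # hypothetical cohort IDs
conn.executemany("INSERT INTO cohort VALUES (?, ?, ?, ?)", [
    (T_ID, 100, "2020-01-01", "2020-01-31"),
    (T_ID, 200, "2020-02-01", "2020-02-29"),
    (T_ID, 300, "2020-03-01", "2020-03-31"),
    (O_ID, 100, "2020-01-01", "2020-01-03"),  # same day as T start
    (O_ID, 200, "2020-02-05", "2020-02-06"),  # after T start
    (O_ID, 300, "2020-03-01", "2020-03-02"),  # same day as T start
])

# Count people whose T and O cohort records start on the same date.
same_day = conn.execute("""
    SELECT COUNT(DISTINCT t.subject_id)
    FROM cohort t
    JOIN cohort o
      ON o.subject_id = t.subject_id
     AND o.cohort_start_date = t.cohort_start_date
    WHERE t.cohort_definition_id = ?
      AND o.cohort_definition_id = ?""", (T_ID, O_ID)).fetchone()[0]
print(same_day)  # 2
```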

As an experiment, you could try setting the CI TAR to start on start + 1d, to exclude the first day of the target cohort from the analysis, and see how that changes things.

I also considered that it might be a problem with specifying the clean window, and it appears a couple of bug fixes were made in CohortIncidence v1.0.1. Which version are you using?

CohortIncidence 2.0.0 has been released, and you might want to look at that version. The reason for the major version bump is that the API for defining the study changed slightly (different names and different parameter values for start/end dates, as well as a different result object to reflect the new parameter names). If it’s not too much trouble, you may want to update your CI to v2.0.0, because if you do identify a bug, I’ll only be fixing it at the v2.0.0 level, not backporting it to the 1.x series.


Setting TAR_START_OFFSET to 1 indeed brings the number of outcomes and the incidence rate much closer to what is seen in ATLAS, though the numbers are still not exactly the same. I’m using v2.0.0.

It’s me again :smiley: @Chris_Knoll
Let’s say I would like to calculate the IR for a specified time window. In ATLAS there is an option to add a study window, and I am wondering if one can do something similar with the CohortIncidence package?

The study window parameter will be available in the next release of CohortIncidence. It will let you specify a study window with 2 fields, startDate and endDate, which restrict the analysis to TARs that start within the study window. It’s an open question whether the end date should also censor follow-up time at the end of the study window, or whether follow-up time should continue as normal, in which case the only purpose of a study window (in CohortIncidence) is to restrict to TARs that start within it. In Atlas, any TAR is actually right-censored to the study window end date. I’m leaning toward the second option, where we just restrict to TARs that start in the study window date range, because it has always felt ‘off’ to me that we’d let people come in on the last day of the study window but then cut their follow-up time short to just 1 day (due to right-censoring).
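The two candidate behaviours differ only in what happens to the end of a TAR. In this hypothetical Python sketch (not the upcoming CohortIncidence parameter), both options keep TARs that start inside the window; the right-censoring variant also truncates each TAR at the window end, as Atlas is described as doing. Whether Atlas additionally requires the TAR to start inside the window is an assumption here.

```python
from datetime import date
from typing import List, Tuple

Tar = Tuple[date, date]  # (tar_start, tar_end)


def restrict_to_window(tars: List[Tar], window_start: date, window_end: date) -> List[Tar]:
    """Keep TARs that start in the window; follow-up continues unchanged."""
    return [(s, e) for s, e in tars if window_start <= s <= window_end]


def censor_to_window(tars: List[Tar], window_start: date, window_end: date) -> List[Tar]:
    """Keep the same TARs, but right-censor each one at window_end."""
    return [(s, min(e, window_end))
            for s, e in restrict_to_window(tars, window_start, window_end)]
```

A TAR that starts on the last day of the window shows the concern: `censor_to_window` shrinks it to a single day, while `restrict_to_window` keeps its full follow-up.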

Do you have any thoughts on how you’d expect it to work?

@Chris_Knoll, I could imagine that if one wants to study the IR within a specific time window, e.g., to compute the annual IR, it makes sense for the TAR to be censored at the end of that year, doesn’t it?
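One way to get annual rates with year-end censoring is to split each TAR at 31 December, so each calendar year only receives the person-days and outcomes that fall inside it. The sketch below is a hypothetical Python helper under that assumption: it counts days inclusively (the CI convention), stops follow-up at the outcome, and ignores outcomes outside the TAR (no clean window is modelled).

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Tuple


def annual_person_days(
    tars: Iterable[Tuple[date, date, Optional[date]]],
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Split each (tar_start, tar_end, outcome_date) into calendar years.

    Returns (person-days per year, cases per year).
    """
    days: Dict[int, int] = defaultdict(int)
    cases: Dict[int, int] = defaultdict(int)
    for start, end, outcome in tars:
        if outcome is not None and start <= outcome <= end:
            end = outcome  # follow-up stops at the outcome
            cases[outcome.year] += 1
        cur = start
        while cur <= end:
            # Censor this year's share of the TAR at 31 December.
            year_end = min(date(cur.year, 12, 31), end)
            days[cur.year] += (year_end - cur).days + 1
            cur = year_end + timedelta(days=1)
    return dict(days), dict(cases)
```

The annual IR for year `y` is then `cases.get(y, 0) / days[y]`. A TAR from 2020-12-01 to 2021-01-31 contributes 31 days to 2020 and 31 days to 2021.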