Denominator cohort for incidence rate analysis

siir · August 22, 2023, 10:03am

Dear OHDSI community,

I wanted to get your opinion on a phenotype definition (numerator cohort) and defining at-risk population (denominator cohort) for calculating incidence proportions and rates.

The phenotype is heavy menstrual bleeding (HMB), and I have used a definition similar to the one in the PhenotypeLibrary. I made a few changes to that definition: hysterectomy and bilateral ovariectomy were added as exclusions, since these patients would not experience HMB in the follow-up period, and patients with records of postcoital bleeding within a specific time window during baseline were also excluded.

In order to define the time-at-risk for this population there are alternative methods I tried to implement on Atlas. It would be great if you can provide feedback on best practices for the following questions (please keep in mind that this definition will be relevant both for EHR and claims databases):

1) Using visit occurrence vs observation period as cohort entry
I understand that every time a patient has an interaction with the healthcare system they will have a visit occurrence recorded, which would make visit occurrence the most ‘full’ table for a patient. I was wondering whether I would still get more patient counts from the database if I look at observation period than using visit occurrence as cohort entry. Would you recommend observation period or any visit occurrence as the cohort entry event for the denominator cohort?

Implementing the denominator definition with observation period also means that I can only specify the length of observation period as 365 days and not a continuous observation of 365 days prior to index (which is the entry criterion for the numerator cohort). Do you generally implement a minimum observation period in a way that reflects the cohort definition for the numerator?

2) Restricting the cohort entry event to the study time frame
Independent of specifying cohort entry as visit occurrence or observation period, the study start date can be defined on the UI for all cohort definitions. In general, I am hesitant to use observation period start date as the study timeframe as I believe this only specifies patients that have an observation period start on that specific date, although I would be interested in patients whose observation period is ‘active’ on that study start date. Although I suspect that if I use visit occurrence as cohort entry, this would also capture those with an active ‘observation period’ prior to the study start date, my question relates to whether or not setting up a study start date on the cohort definition for incidence denominator actually matters. When using the Incidence Rates tab on Atlas, there is already a function which enables me to define the timeframe during which I would be interested in the IR calculation. What is the best practice for this? Is it recommended to only define the study timeframe in the IR tab directly and does that have an impact on the IR calculation since the study timeframe is defined for the numerator but not the denominator?

3) Adding exclusion criteria to the population at risk definition, in line with the censoring criteria
It is generally not common practice to have exclusion criteria in the denominator cohort, although censoring criteria might be implemented for events other than the qualifying incident event. However, in the case of HMB, it seems appropriate to exclude some patients from the start as these patients are not ‘at risk’ to develop menorrhagia (e.g., hysterectomy, menopause). Do you see that as appropriate to add these exclusions and in that case would you implement such exclusions as ‘exactly 0 occurrences’ in the cohort entry event or at the inclusion criteria stage of the Atlas UI implementation?

It would be great if you could share your opinions on which approach would give me a more robust incidence rate estimate. I would also be happy to discuss in which direction (inflation or deflation of incidence rates) each of these approaches may bias the estimate, so that I can acknowledge it as part of my study design.

I also wanted to test the Incidence Rate SQL export in the Snowflake UI; however, I am not sure what the ‘cohort_table’ variable refers to. @Chris_Knoll would you be able to advise me on how to define this variable?

Thanks a lot for all your help in advance!!

Chris_Knoll · August 22, 2023, 2:14pm

The IR SQL you see is exported in a form that lets you plug in your own cohort table: if you already have a cohort table constructed, you can use it in place of the cohort_table placeholder. However, if you want to use the cohort table that Atlas writes to via cohort generation, you can use your CDM’s {results_schema}.cohort table, but make sure you generate the cohorts in Atlas first!
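As a rough illustration of filling in the placeholder, here is a minimal Python sketch. It assumes the exported SQL marks the placeholder as `@cohort_table` (the SqlRender-style convention); check your own export, since the exact token is an assumption here. The schema name `my_results` and the query text are hypothetical.

```python
import re

def render_cohort_table(sql: str, cohort_table: str) -> str:
    """Replace the assumed `@cohort_table` placeholder with a qualified table name."""
    # \b stops the match from touching longer names such as @cohort_table_suffix.
    # A function is used as the replacement so the table name is inserted literally.
    return re.sub(r"@cohort_table\b", lambda _: cohort_table, sql)

# Hypothetical fragment standing in for the exported IR SQL.
template = (
    "SELECT subject_id, cohort_start_date "
    "FROM @cohort_table WHERE cohort_definition_id = 1"
)

# "my_results" stands in for your own {results_schema}.
rendered = render_cohort_table(template, "my_results.cohort")
print(rendered)
```

Whichever table you point it at, the cohorts need to exist in that table before the IR SQL is run.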

For the initial questions, I’ll make my best guess, and welcome input from others in the community:

This depends on what you want to do: if you want to establish a patient’s healthcare interaction as a basis for follow-up time, then use Visits. But if I were in your shoes, I would use Observation Period. Observation Period (OP) is an oft-misunderstood element of the CDM, but the explicit purpose of the OP table is to establish the time frame where a person is at risk of an event. In a health claims context, things like enrollment establish the observation period, because it’s expected that if something happens, the person files a claim about it. For some EHR systems it’s difficult, because there’s no guarantee that a person will go to a specific health care center vs. another, so the solution is to infer an observation period based on their visit activity. But in both cases, the observation period is the place to go to determine when a person is at risk of some clinical event. Therefore, I think you might be better served using OBSERVATION_PERIOD if you feel confident that your own data source is doing the ‘right thing’ with respect to OP.

You can do that if you like in the cohort definition: if you specify a date range like Jan 1, 2015 - Jan 1, 2018, it will drop people who did not start during this time (so think of it as a left-filter), and anyone who did start in that time window will exit the cohort at the end of the study window at the latest (think of it as a right-censor). So, the dates left-filter and right-censor (if that makes sense).

On the other hand, there is a study window parameter in the Incidence Rate analysis that performs the same function, just on the time-at-risk (TAR) window: if you specify a study window, it will drop anyone who did not start their TAR in the study window, and end their follow-up time at the end of the study window (or at the end of the observation period, if that happened first). If they had the outcome prior to TAR, that will also exclude them.

So, I’d define your cohorts to just be when people are at risk, and use the IR study window option to restrict it to a specific time window.
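The left-filter / right-censor behaviour described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the SQL Atlas actually runs; the function name is hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

def apply_study_window(
    start: date, end: date, window_start: date, window_end: date
) -> Optional[Tuple[date, date]]:
    """Left-filter on the start date, right-censor on the end date."""
    # Left-filter: drop anyone whose start falls outside the window.
    if not (window_start <= start <= window_end):
        return None
    # Right-censor: follow-up cannot extend past the end of the window.
    return start, min(end, window_end)

window = (date(2015, 1, 1), date(2018, 1, 1))

# Starts before the window: dropped.
early = apply_study_window(date(2014, 6, 1), date(2019, 5, 1), *window)
# Starts inside the window, ends after it: end is censored at the window end.
censored = apply_study_window(date(2016, 3, 1), date(2019, 5, 1), *window)
# Starts and ends inside the window: unchanged.
unchanged = apply_study_window(date(2017, 1, 1), date(2017, 6, 1), *window)
```

Note that the first person is dropped even though they are still under observation during the window, which is the concern raised in the original question about people who are 'active' on the study start date.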

I was recommending to put the study timeframe in the IR tab, but I was curious about this question: I wouldn’t put any study window constraints on your outcome cohort (numerator). The study window restricts your target population, which is your denominator, and we look within the denominator population for numerator cases. You raise a good point: if you did restrict the study window in your numerator but not your denominator, that would be a bit of a problem. You’d only allow cases found within some time window, but your denominator may span time outside that study window, thus making your denominator ‘immortal’ for the period of time in which you are not allowing numerator cases to be identified. So, I’ll just reiterate: don’t put study window restrictions in your outcome cohort…and you don’t even need to do it in your denominator cohort definition when you can specify that in the IR analysis. The only reason I can think of that you might want to put it in your Target cohort definition is that you also want to use that cohort in things like characterization, where we don’t have the notion of a ‘study window’ (however, we do have the notion of a subgroup analysis, and you technically could make a subgroup limited to people whose cohort_start falls within a study window date range, but I digress).
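To make the direction of the bias concrete, here is a small worked example in Python with made-up numbers (none of these figures come from real data). It assumes a constant 10 cases per year among 1,000 people and, for simplicity, ignores that cases stop contributing person-time after their event.

```python
def rate_per_1000_py(cases: float, person_years: float) -> float:
    """Incidence rate expressed per 1,000 person-years."""
    return 1000 * cases / person_years

persons = 1000
cases_per_year = 10      # hypothetical constant number of cases per year
denominator_years = 4    # denominator follow-up spans 4 years
numerator_years = 2      # numerator only allows cases in 2 of those years

cases_counted = cases_per_year * numerator_years

# Numerator restricted, denominator not: 2 years of 'immortal' person-time.
mismatched = rate_per_1000_py(cases_counted, persons * denominator_years)

# Both restricted to the same 2-year window.
matched = rate_per_1000_py(cases_counted, persons * numerator_years)

print(mismatched, matched)
```

Under these assumptions the mismatched rate comes out at half the matched one, so restricting only the numerator deflates the incidence rate.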

I wouldn’t say that it’s not a common practice to have exclusion criteria in the denominator: maybe you are looking at an indicated population or some other targeted subpopulation that you have a specific clinical question about. However, I think you’re driving towards the idea of a ‘background rate’, where you want to be as inclusive as possible when identifying your target population, so I think I understand the sentiment that you’d normally not restrict the denominator. But in some cases where you’re looking for gender-specific outcomes, you may want to restrict to only males or females in the analysis. So, that’s fair game, and yes, you’d put that as an inclusion criterion (it’s a Demographic criterion in the case of gender).

In addition, you may want to identify moments where a person exits the cohort due to some clinical event. You can use ‘cohort censor events’ for this purpose…

To put this all together:

1) Define your entry events based on OP to define a person’s entry into the cohort.
2) Add inclusion criteria to restrict to age, gender, or some other criteria for your target population.
3) You probably want the person to stay in the cohort for as long as possible (until end of observation), so you’ll use the default cohort exit of ‘until end of observation period’.
4) However, you may identify events that would kick a person out of the cohort early. So, put those in cohort censor events.
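The entry and exit logic in this recipe can be sketched as follows. This is a simplified Python illustration, not Atlas's implementation: the function name and dates are hypothetical, and ignoring censoring events that fall outside the observation period is an assumption of the sketch.

```python
from datetime import date
from typing import List, Tuple

def cohort_era(op_start: date, op_end: date, censor_dates: List[date]) -> Tuple[date, date]:
    """Return (entry, exit) for one observation period.

    Entry is the observation period start. Exit is the earliest of the
    observation period end and any censoring event inside the period.
    """
    # Assumption: censoring events outside the period are ignored.
    in_period = [d for d in censor_dates if op_start <= d <= op_end]
    return op_start, min([op_end] + in_period)

# Hypothetical person: observed 2015-2020, hysterectomy on 2018-03-15.
entry, exit_ = cohort_era(date(2015, 1, 1), date(2020, 12, 31), [date(2018, 3, 15)])
person_days = (exit_ - entry).days

# Hypothetical person with no censoring event: stays until end of observation.
entry2, exit2 = cohort_era(date(2015, 1, 1), date(2020, 12, 31), [])
```

The first person contributes time at risk only up to the censoring event, while the second stays in the denominator until the end of their observation period.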

Bake on 375 for 45 minutes, and let rest for 15. Serve with your choice of ice cream.

-Chris