Prevalence by index year vs incidence

Christian_Reich · December 6, 2021, 6:36am

You are trying to calculate a period prevalence, i.e. the total time people suffered from the disease divided by the time the at-risk population was observed in that period (one year). You need to derive the numerator and the denominator from the data. Let’s dissect:

Numerator. You need to know all cases (patients falling ill with the disease) and how long each case lasted until the condition resolved. Both depend on what kind of condition you are talking about:

(i) acute onsets (e.g. a trauma),
(ii) diseases that come and go (the flu),
(iii) chronic conditions that stay with you once you have them (diabetes).

For type (i) cases, prevalence makes no sense, because those are state changes without a defined duration; incidence rates are all you need. For type (iii) it is trivial, because it’s just the number of patients with the disease (earliest event any time before the end of your index year) divided by the size of the population. So the only tricky situation is type (ii) diseases. The nomenclature is often very ambiguous. For example, the act of breaking a leg is a type (i) onset, but having a broken leg that takes 4 weeks in a cast to heal is type (ii). “Asthma” is used for the type (i) attack (the patient can’t breathe and gets rushed to the ER), for status asthmaticus (an ongoing inability to breathe for days), which would be considered a type (ii) case, and for the type (iii) susceptibility to asthma attacks in general.
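For type (iii) conditions the calculation reduces to a count. The Python sketch below assumes you already have each person’s earliest recorded event date and the set of persons making up your index-year population; the function name `chronic_prevalence` and the input shapes are illustrative, not part of Atlas or any OHDSI tool.

```python
from datetime import date

def chronic_prevalence(first_event: dict[str, date], population: set[str], year_end: date) -> float:
    """Share of the population whose earliest recorded event is on or before year_end.

    first_event: person_id -> date of the earliest record of the condition (hypothetical input)
    population: person_ids of the at-risk population in the index year (hypothetical input)
    """
    # Once a chronic condition is recorded it is assumed to persist, so any
    # first event before the end of the index year makes the person a case.
    cases = sum(1 for pid in population
                if pid in first_event and first_event[pid] <= year_end)
    return cases / len(population)
```

For example, with four persons of whom one had a first event in 2019, one in mid-2021 and one only in 2022, the 2021 prevalence is 2 out of 4.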

For type (ii) diseases, to calculate the prevalence you need to sum up the duration of each case within your index year and divide that by the size of your population times the time at risk (the Observation Period in that year), i.e. by the total person-time at risk. Claims data don’t make that easy, because they don’t say when the disease starts or ends. Instead, you get a record each time there is money to be claimed. But such a record does not tell you whether it is a brand-new case or still belongs to the previous record (disease still ongoing). Conversely, the absence of a record does not necessarily mean the disease is no longer present. You therefore have to do some work:
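The type (ii) ratio can be sketched in Python as follows. This assumes the disease episodes already have start and end dates (deriving them from claims is the hard part), that episodes of the same person do not overlap, and that each person has a single Observation Period; all names are hypothetical.

```python
from datetime import date

def overlap_days(start: date, end: date, win_start: date, win_end: date) -> int:
    """Days of the inclusive interval [start, end] that fall inside [win_start, win_end]."""
    lo = max(start, win_start)
    hi = min(end, win_end)
    return max((hi - lo).days + 1, 0)

def period_prevalence(episodes: list[tuple[str, date, date]],
                      observation: dict[str, tuple[date, date]],
                      year_start: date, year_end: date) -> float:
    """Days with disease in the index year divided by days at risk in the index year.

    episodes: (person_id, start, end), assumed non-overlapping per person
    observation: person_id -> (obs_start, obs_end), one Observation Period per person (assumption)
    """
    # Denominator: each person's observed days within the index year, summed.
    time_at_risk = sum(overlap_days(s, e, year_start, year_end)
                       for s, e in observation.values())
    # Numerator: episode days inside both the index year and the person's Observation Period.
    diseased_time = 0
    for pid, start, end in episodes:
        if pid not in observation:
            continue
        obs_start, obs_end = observation[pid]
        diseased_time += overlap_days(max(start, obs_start), min(end, obs_end),
                                      year_start, year_end)
    return diseased_time / time_at_risk
```

An episode that starts in December of the previous year only contributes its days inside the index year, and days before a person’s Observation Period starts are dropped from both numerator and denominator.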

The simplest solution is to consider each record a new case (the index criterion for your cohort) and assume a standard duration. This only works if the records are rare and the duration can reasonably be assumed. If that’s not the case, you need one of the following approaches:
You create a heuristic for the cohort start date (e.g. a diagnosis in close temporal proximity to a diagnostic procedure) or end date (e.g. a curative procedure).
You start at the first record and assume that the disease is over after a certain amount of time without any record. Atlas lets you chain records until there is a large enough gap.
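Both the standard-duration approach and the record-chaining approach can be sketched with one hypothetical helper, `build_episodes`, working on one person’s record dates. It is a plain-Python illustration with assumed parameters (`persistence_days`, `gap_days`), not how Atlas implements its era logic. Unlike treating every record strictly as a separate case, it merges records whose assumed durations overlap, so the same days are not counted twice.

```python
from datetime import date, timedelta

def build_episodes(record_dates: list[date], persistence_days: int,
                   gap_days: int) -> list[tuple[date, date]]:
    """Turn one person's record dates into (start, end) disease episodes.

    persistence_days: assumed disease duration implied by each record (0 = record day only)
    gap_days: records starting within this many days after the current episode's end
              are chained into that episode
    """
    episodes: list[tuple[date, date]] = []
    for d in sorted(record_dates):
        end = d + timedelta(days=persistence_days)
        if episodes and d <= episodes[-1][1] + timedelta(days=gap_days):
            # Gap too small: treat the record as the same, still ongoing case.
            start, prev_end = episodes[-1]
            episodes[-1] = (start, max(prev_end, end))
        else:
            # Gap large enough (or first record): open a new case.
            episodes.append((d, end))
    return episodes
```

With `gap_days=0` and a non-zero `persistence_days` you get the standard-duration approach; with `persistence_days=0` and a non-zero `gap_days` you get pure chaining until a large enough gap. For records on 1 Jan, 20 Jan and 15 Mar with a 30-day gap, the first two form one episode and the March record starts a new one.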

Denominator. If the population is defined by you (e.g. if you want to calculate the prevalence of complications in your diabetes patients), it is straightforward. If you want to use all patients in the database, you can use the Observation Period as a cohort definition. But if you need the true prevalence in the general population, you need to “project”, or extrapolate, your cases. Atlas does not have that functionality today. To do that, you need some mechanism to estimate your sampling rate. Often, that is done using the providers, whose total number is known at the national level: the fraction of providers that contribute data to your database serves as the sampling rate. A better estimate can be achieved by stratifying the extrapolations by provider specialty, whose national counts can also be obtained independently.
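A minimal sketch of the stratified projection, assuming you have case counts per provider specialty plus the number of providers of each specialty in your database and nationally. The function name and inputs are hypothetical, and it assumes each case is attributed to exactly one specialty stratum; dividing the projected cases by the national population would then give the prevalence estimate.

```python
def project_cases(cases_by_specialty: dict[str, int],
                  db_providers: dict[str, int],
                  national_providers: dict[str, int]) -> float:
    """Extrapolate observed case counts to the national level, one specialty at a time.

    Assumption: within a stratum, the sampling rate equals the fraction of that
    specialty's national providers who contribute data to the database.
    """
    projected = 0.0
    for specialty, cases in cases_by_specialty.items():
        # cases / sampling_rate, with sampling_rate = db / national
        projected += cases * national_providers[specialty] / db_providers[specialty]
    return projected
```

For instance, 100 cases seen by 50 of 1,000 general practitioners and 30 cases seen by 10 of 40 cardiologists project to 2,000 + 120 = 2,120 cases. A single unstratified rate would give a different answer whenever specialties are sampled unevenly.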

You have to be careful with that, because prevalent cases may have started in the year before, among other artifacts. The first-time criterion does not make sense for prevalence calculations; it is meant for first-occurrence incidence rates.
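To see why a first-occurrence criterion undercounts prevalence, here is a small Python comparison on hypothetical `(person_id, record_date)` pairs. A person whose disease started before the index year and is still recorded in it counts as prevalent but not as a first occurrence in that year. Treating “any record in the year” as prevalent is itself only a crude proxy, used here for illustration.

```python
from datetime import date

def prevalent_persons(records: list[tuple[str, date]],
                      year_start: date, year_end: date) -> set[str]:
    """Persons with at least one record inside the index year (crude prevalence proxy)."""
    return {pid for pid, d in records if year_start <= d <= year_end}

def first_occurrence_persons(records: list[tuple[str, date]],
                             year_start: date, year_end: date) -> set[str]:
    """Persons whose earliest-ever record falls inside the index year (incidence-style)."""
    first: dict[str, date] = {}
    for pid, d in records:
        if pid not in first or d < first[pid]:
            first[pid] = d
    return {pid for pid, d in first.items() if year_start <= d <= year_end}
```

A person with records in November 2020 and February 2021 is in the 2021 prevalent set but not in the 2021 first-occurrence set.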

You’d only worry about that in the denominator for type (iii) cases like diabetes, where patients freshly added to the database have not had enough prior Observation Period for the chronic disease to be captured. As a result, such a patient would show up as a false negative in the denominator rather than in the numerator. I would not worry too much about that misclassification: chronic diseases bother patients and tend to generate records.
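If you want to gauge how large that misclassification risk is in your own data, one option is to measure how many denominator persons have little Observation Period before the index year. The sketch below assumes a single observation start date per person; the function name is hypothetical and the 365-day threshold in the example is an arbitrary illustrative choice.

```python
from datetime import date

def short_lookback_share(obs_start: dict[str, date], year_start: date,
                         min_lookback_days: int) -> float:
    """Fraction of denominator persons observed for fewer than min_lookback_days
    before the index year starts (persons entering during the year count as short)."""
    short = sum(1 for s in obs_start.values()
                if (year_start - s).days < min_lookback_days)
    return short / len(obs_start)
```

With observation starts in 2018, October 2020 and March 2021 and a 365-day threshold, two of the three persons have short lookback for the 2021 index year.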