
Phenotype Phebruary Day 10 - Systemic Lupus Erythematosus

In this edition of Phenotype Phebruary, I’d like to discuss the work @Jill_Hardin and I did to develop phenotype algorithms in the immunology space for systemic lupus erythematosus.

Systemic lupus erythematosus (SLE) is a chronic autoimmune disease of unknown origin. Clinical manifestations include fatigue, arthropathy, and involvement of nearly all organ systems, particularly cardiac and renal. (Jump et al, Greco et al, Miner et al, Danila et al) A review by Stojan and Petri of multi-country studies estimated the incidence rate of SLE at 1 to 9 cases per 100,000 person-years (PY).

As @Patrick_Ryan has provided an excellent review of the details of the phenotype algorithm development process, I’ll build on that to demonstrate how we used the process for our cohort definitions. We first conducted a literature search for phenotype algorithms for SLE. From those resources we determined the codes used in prior studies. We used those codes as a starting point and entered them into the wonderful PHOEBE tool developed by @aostropolets. The final concept set was:

We then began building our cohort definitions. We were concerned about possible index date misclassification as prior research had indicated that there may be a long period between first symptoms and first diagnosis. We used the spectacular Cohort Diagnostics tool (thank you @Gowtham_Rao!) to examine the conditions and drugs in the time prior to an initial diagnosis of SLE and found in the IBM Commercial Claims and Encounters dataset:

We sorted through some of the big hitters, proportion-wise, across the different data sets and picked out ones that were likely candidates for SLE signs and symptoms and came up with:

We did the same for drug exposures and developed:

We developed cohort definitions with index date correction using a 90-day window prior to the first diagnosis. In this definition, subjects are included in the cohort if they have any of the signs, symptoms, or treatments, as long as it is followed by a diagnosis code for SLE within 0-90 days after the sign, symptom, or treatment. With this approach, the corrected index date is the date of occurrence of the sign, symptom, or treatment or of an SLE code, whichever came first. Using this definition, we saw a reduction in the possible signs and symptoms in the 30 days prior to this new index date:
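To make the index-date-correction rule concrete, here is a minimal sketch in Python of the logic described above. This is an illustration only, not the actual ATLAS/Circe implementation; the function name and event-list inputs are assumptions for the example.

```python
from datetime import date, timedelta

def corrected_index_date(sle_dx_dates, marker_dates, window_days=90):
    """Return the corrected index date for a subject.

    A marker (sign, symptom, or treatment) qualifies only if an SLE
    diagnosis code follows it within 0-window_days days. The corrected
    index is the earliest of the first SLE diagnosis and any qualifying
    marker, whichever came first.
    """
    if not sle_dx_dates:
        return None  # no SLE diagnosis code -> subject not in the cohort
    first_dx = min(sle_dx_dates)
    qualifying_markers = [
        m for m in marker_dates
        if any(timedelta(0) <= dx - m <= timedelta(days=window_days)
               for dx in sle_dx_dates)
    ]
    return min([first_dx] + qualifying_markers)

# Example: arthralgia code 47 days before the first SLE diagnosis
# qualifies and shifts the index date earlier; a marker from the
# prior year is outside the 90-day window and is ignored.
new_index = corrected_index_date(
    sle_dx_dates=[date(2020, 6, 1)],
    marker_dates=[date(2020, 4, 15), date(2019, 1, 1)],
)
```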

[NOTE: here is the link to the Cohort Diagnostics shiny] We picked 90 days as a reasonable new index point, though symptoms may occur for a longer time prior to the actual diagnosis.

In our first set of definitions, we ran PheValuator and noticed a big drop in sensitivity when we did not include a 365-day washout period prior to index in the analysis:

This raises the question of “what is the sensitivity estimate we want to use?” In previous posts in the Phenotype Phebruary series, I’ve shown results from PheValuator analyses where I did include the 365-day washout period and found high sensitivities. But I pose the question to the group: is this the way we want to look at sensitivity? To me, sensitivity measures how many subjects you found compared to how many you missed. If you impose a prior observation period of 365 days, you are surely missing subjects, as the data above show. If we impose a washout period in the analysis, that measure of sensitivity should carry the caveat that it is the sensitivity of the cohort definition among those with at least that length of washout period prior to index. As they used to say on one of the TV talk shows – Discuss!
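The distinction being debated can be shown with a small worked example. The counts below are hypothetical, invented purely to illustrate how the denominator choice changes the sensitivity estimate when a washout requirement removes true cases from eligibility:

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity = TP / (TP + FN): cases found vs. cases missed."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical scenario: 1000 true SLE cases exist in the data.
# Without a washout requirement, the definition captures 850 of them.
sens_no_washout = sensitivity(850, 150)

# Requiring 365 days of prior observation drops 300 true cases from
# eligibility entirely; among the remaining 700, the definition
# captures 630. Measured against the eligible subset only, the
# sensitivity looks high...
sens_vs_eligible = sensitivity(630, 70)

# ...but measured against ALL true cases in the data, the same
# washout-restricted cohort misses far more.
sens_vs_all = sensitivity(630, 370)
```

The caveat in the post amounts to saying which denominator `sens_vs_eligible` uses: it is sensitivity among subjects with at least 365 days of prior observation, not sensitivity among all true cases.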

We next wanted to see if we needed to improve the specificity of the algorithm. For a single condition code cohort definition, we saw in Cohort Diagnostics:

That seemed like a large drop-off in the proportion of those with an SLE code in the time after index, so we developed a more specific definition requiring a second diagnosis code 31-365 days after index and found:

This showed a nice increase in the proportion of those with an SLE code in the time after index. The measure we examined was the 1-30 day post-index window, as the 31-365 day measure included the codes required by cohort C2.
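The second-diagnosis criterion described above is a simple temporal filter. A minimal sketch, again assuming plain event lists rather than the actual ATLAS implementation (the function name and defaults are illustrative):

```python
from datetime import date, timedelta

def has_second_dx(index_date, dx_dates, lo_days=31, hi_days=365):
    """More specific criterion: require a second SLE diagnosis code
    falling 31-365 days after the index diagnosis."""
    return any(
        timedelta(days=lo_days) <= d - index_date <= timedelta(days=hi_days)
        for d in dx_dates
    )

# A second code 60 days after index satisfies the criterion; a code
# only 14 days after index (or the index code itself) does not.
kept = has_second_dx(date(2020, 1, 1), [date(2020, 1, 1), date(2020, 3, 1)])
dropped = has_second_dx(date(2020, 1, 1), [date(2020, 1, 1), date(2020, 1, 15)])
```

The 31-day lower bound keeps codes from the same initial diagnostic workup from counting as confirmation, which is why the post evaluates the definition using the 1-30 day window instead.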

We ran PheValuator a second time and found:

We see the large drop in sensitivity when we compare the incident cohorts analyzed with a 0-day prior washout to those with a 365-day prior washout (gray shaded lines). We also see a large increase in positive predictive value (PPV) when we compare a single-code algorithm to an algorithm requiring a second code. This, though, came at a cost in sensitivity.
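The sensitivity/PPV tradeoff follows directly from the confusion-matrix definitions: tightening a definition converts false positives into true negatives but also converts some true positives into false negatives. A small sketch with hypothetical counts (not the actual PheValuator output) makes the direction of the tradeoff explicit:

```python
def operating_characteristics(tp, fp, fn, tn):
    """Standard operating characteristics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # cases found among all true cases
        "specificity": tn / (tn + fp),   # non-cases correctly excluded
        "ppv": tp / (tp + fp),           # identified subjects truly with disease
    }

# Hypothetical counts for a population of 100,000 with 1,000 true cases.
single_code = operating_characteristics(tp=900, fp=300, fn=100, tn=98700)
second_code = operating_characteristics(tp=700, fp=70, fn=300, tn=98930)
```

With these illustrative numbers, requiring the second code raises PPV (from 0.75 to about 0.91) while sensitivity falls (from 0.90 to 0.70) — the same pattern reported above.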

In our final analysis, we concluded that we should make four algorithms available for use with different “fit for function” purposes. If your research requires a high sensitivity algorithm, possibly looking at SLE as a safety outcome where you don’t want to miss cases, then you might consider cohort definition “Systemic lupus erythematosus prevalent and correction for index date”. If you want to ensure that you have prevalent cases of SLE with a high probability of truly having the condition, for instance in drug comparison studies where you want only those subjects with SLE prior to drug exposure, you may want to use “Systemic lupus erythematosus prevalent with 2nd dx and correction for index date”. If your research requires incident cases of SLE, then you may want to use either “Systemic lupus erythematosus incident and correction for index date” if you want to maximize sensitivity or “Systemic lupus erythematosus incident with 2nd dx and correction for index date” if maximizing specificity is your requirement.


@jswerdel this is spectacular, thank you for starting this discussion on Lupus.

First, I want to reinforce an important methodological lesson you highlighted nicely here: when we create a cohort definition that is intended to identify the set of persons who satisfy one or more criteria for a duration of time, we need to be concerned with various dimensions of measurement error. The first is measurement error in disease classification, which requires a complete understanding of sensitivity (which patients actually have the disease that we missed?) and specificity (which patients without the disease are correctly classified as such?). We see lots of papers perform some sort of chart record verification to estimate positive predictive value (which patients identified actually have the disease?), but it’s hard to know what to do with PPV alone. This is why I’m so excited by Joel’s innovation with PheValuator, because it provides an estimate of all the relevant operating characteristics, which makes it possible to formally integrate measurement error into our analyses.

The other type of measurement error that I see much less discussed in the literature (post references if it has been discussed and I’ve missed it) is index date misclassification (did the person enter the cohort on the right date?). And here, you’ve demonstrated both how we can DETECT index date misclassification using CohortDiagnostics (by looking at prior conditions or treatments that are likely indicators of condition start preceding diagnosis) and how to CORRECT for index date misclassification using ATLAS (by creating an entry event for any of the potential disease markers, and then imposing an inclusion criterion requiring the diagnosis code on or within some time after the entry event). It’s interesting to me to think how much an index date misclassification correction impacts the cohort, both in terms of how many patients actually saw their cohort start date shift, and also by the duration of those shifts. Depending on the context for use of the phenotype, the impact of the error can vary, but it seems to me to be a bigger problem than generally appreciated (given that most phenotypes I read in the literature don’t discuss making this kind of correction).

Second, I’d really like to hear from the community about @jswerdel 's assertion of having many different alternative definitions depending on the use context. I agree that after we have empirical operating characteristics, we can make choices about the measurement errors that we’re willing to consider. But I wonder about the competing tradeoff between the consistency of having one definition that is applied and understood vs. the variance introduced by changing the phenotype at the same time as the research question.

I think sometimes you need a different definition for different questions. If the condition is my outcome and I really need to know it’s incident, that is a different phenotype from one where I simply want to exclude you and don’t need to know the exact date of the onset of your condition. I think we are challenged in defining the index date for many of our conditions of interest. Google “george costanza lupus” for the meme. I think we will need to avail ourselves of multiple cohort definitions based on observational need.


Hi @jswerdel! So my colleague @CarolynB and I have a few questions on the design (keeping in mind we’re definitely not SLE experts :slight_smile: )

  1. Can you elaborate on how you chose the 90 day window for looking ahead of the potential SLE symptoms/signs/treatments?
  2. We noticed that the SLE concept sets include drug-induced SLE. Drug-induced SLE seems to be different from chronic SLE, being drug-induced in provenance and much more temporary in duration, though with a lot of symptom overlap. It’s not a large proportion of the SLE cohort, but it might be good to filter these cases out, or simply to have a more specific set of SLE cohorts for non-drug-induced disease.
  3. Are there labs to consider here as potential entry events? Like the ANA test, for instance. I know labs can be tricky, but perhaps sensitivity can be increased in EHR sources.



Hi @Ajit_Londhe and @CarolynB - Thanks for the thought-provoking questions. That is the great thing about Phenotype Phebruary - it allows many more eyes on the problem to find things that could be improved. During development we debated a number of the points you brought up. We settled on 90 days to account for possible prior symptoms/treatment as a conservative estimate. We used Cohort Diagnostics and saw a significant drop in the proportion of subjects who had these events in the 30 days prior to index after using the 90-day correction. We probably could have gone longer but were concerned about adding more index date misclassification. Some of the most prevalent signs and symptoms are very common, so we wanted a middle ground where the signs/symptoms/treatment fell within a close time frame to the actual diagnosis. That being said, SLE can be misdiagnosed for a long period of time, so extending the window to, say, 365 days would not be unreasonable.

While drug-induced SLE does appear to be very different from typical SLE, we wanted to develop an algorithm for all SLE. An offshoot of this could be one that excludes drug-induced SLE. In this case, the low prevalence of drug-induced SLE (< 1%) along with possible misclassification of drug-induced SLE with non-drug-induced SLE makes these exclusion decisions tricky.

Great point about the labs. This is definitely something to consider. There does seem to be a high proportion of ANA in the time prior to index. These should likely be included in the next version. Thanks for bringing that up!