@jswerdel this is spectacular, thank you for starting this discussion on Lupus.
First, I want to reinforce an important methodological lesson you highlighted nicely here: when we create a cohort definition that is intended to identify the set of persons who satisfy one or more criteria for a duration of time, we need to be concerned with various dimensions of measurement error. The first is measurement error in disease classification, which requires a complete understanding of sensitivity (which patients who actually have the disease did we miss?) and specificity (which patients without the disease are correctly classified as such?). We see lots of papers perform some sort of chart record verification to estimate positive predictive value (which of the identified patients actually have the disease?), but it's hard to know what to do with PPV alone. This is why I'm so excited by Joel's innovation with PheValuator, because it provides an estimate of all the relevant operating characteristics, which makes it possible to formally integrate measurement error into our analyses.
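As an aside, here's a toy illustration of why PPV alone is hard to act on: PPV depends not only on an algorithm's sensitivity and specificity but also on the prevalence of the disease in the source population, so identical chart-review PPVs across databases don't imply identical classification error. A minimal sketch (this is not PheValuator; all numbers are made up for illustration):

```python
# Why PPV alone is hard to interpret: PPV is a function of sensitivity,
# specificity, AND disease prevalence, so the same algorithm can yield
# very different PPVs across databases. Numbers are hypothetical.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.75, 0.99  # hypothetical operating characteristics
for prev in (0.001, 0.01, 0.05):
    print(f"prevalence={prev:.3f}  PPV={ppv(sens, spec, prev):.2f}")
# prevalence=0.001  PPV=0.07
# prevalence=0.010  PPV=0.43
# prevalence=0.050  PPV=0.80
```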
The other type of measurement error that I see much less discussed in the literature (please post references if it has been discussed and I've missed it) is index date misclassification (did the person enter the cohort on the right date?). And here, you've demonstrated both how we can DETECT index date misclassification using CohortDiagnostics (by looking at prior conditions or treatments that are likely indicators of the condition starting before diagnosis) and how to CORRECT for it using ATLAS (by creating an entry event for any of the potential disease markers, and then imposing an inclusion criterion requiring the diagnosis code on or within some window after the entry event). It's interesting to me to think about how much an index date misclassification correction impacts the cohort, both in terms of how many patients actually saw their cohort start date shift and the duration of those shifts (see the sketch below). Depending on the phenotype's context of use, the impact of the error can vary, but it seems to me to be a bigger problem than generally appreciated (given that most phenotypes I read in the literature don't discuss making this kind of correction).
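To make that concrete, here's a minimal sketch of the kind of summary I'd want to see: join each person's cohort start date under the original and corrected definitions, then report the share of patients whose index date moved and the distribution of shift durations. The data frames and values are fabricated for illustration (not ATLAS output), though the `person_id` / `cohort_start_date` columns follow OMOP CDM conventions:

```python
# Quantifying the impact of an index date correction: join the original
# and corrected cohorts on person_id, then summarize how many start
# dates moved earlier and by how much. All data here are fabricated.
import pandas as pd

original = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "cohort_start_date": pd.to_datetime(
        ["2015-03-01", "2016-07-15", "2017-01-10", "2018-05-20"]),
})
corrected = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "cohort_start_date": pd.to_datetime(
        ["2014-11-02", "2016-07-15", "2016-09-30", "2018-05-20"]),
})

merged = original.merge(corrected, on="person_id",
                        suffixes=("_orig", "_corr"))
merged["shift_days"] = (merged["cohort_start_date_orig"]
                        - merged["cohort_start_date_corr"]).dt.days

shifted = merged[merged["shift_days"] > 0]
print(f"{len(shifted)} of {len(merged)} patients shifted "
      f"({len(shifted) / len(merged):.0%})")
print(shifted["shift_days"].describe())  # distribution of shift durations
```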
Second, I'd really like to hear from the community about @jswerdel's assertion that we should have many different alternative definitions depending on the use context. I agree that once we have empirical operating characteristics, we can make choices about the measurement errors that we're willing to tolerate. But I wonder about the competing tradeoff: the consistency of having one definition that is consistently applied and widely understood vs. the variance that is introduced by changing the phenotype at the same time as changing the research question.