Concern about the construction of the positive controls in the empirical CI calibration paper

In population-level estimation, systematic error can occur for roughly two reasons:

  1. Differences between the exposed and non-exposed (or target and comparator) that are not due to the exposure. For example, the people who get the exposure might already be sicker than those who do not, and for that reason alone have the outcome more often. These differences can also include differences in the likelihood that a true outcome is recorded in the data (detection bias).

  2. Non-differential error in the detection of the outcome. For example, if the positive predictive value is 50% everywhere, then half of the outcomes we count are not really the outcome, and our estimate may be biased towards the null (see the sketch just below this list).
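
To make the dilution in that second example concrete, here is a minimal numeric sketch. The numbers are my own illustrative ones, and I parameterize the error as a constant false-positive rate in both groups, which here works out to a PPV of about 50% in the unexposed group:

```python
# Illustrative only: non-differential false positives pulling the observed RR toward the null.
# All numbers are assumptions for the sketch, not taken from the paper.
n = 10_000                  # persons per group, equal follow-up assumed
true_risk_exposed = 0.02    # true outcome risk in the exposed
true_risk_unexposed = 0.01  # true outcome risk in the unexposed, so true RR = 2.0
false_positive_risk = 0.01  # recorded outcomes that are not real, same in both groups

true_rr = true_risk_exposed / true_risk_unexposed                                 # 2.0
obs_exposed = n * (true_risk_exposed + false_positive_risk)                       # 300 observed
obs_unexposed = n * (true_risk_unexposed + false_positive_risk)                   # 200 observed
print(f"True RR: {true_rr:.2f}, observed RR: {obs_exposed / obs_unexposed:.2f}")  # 2.00 vs 1.50
```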

Negative controls are useful because they can help detect the first type of error, including detection bias. For example, if in the exposed group the sensitivity is 100%, but in the unexposed group it is 50%, the negative controls will show a relative risk of 2 even if there’s no true effect.
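
In numbers (again my own illustrative counts, assuming equal group sizes and a negative control, so the true RR is 1):

```python
# Illustrative only: differential sensitivity on a negative control creates a spurious RR of 2.
n = 10_000            # persons per group
true_risk = 0.01      # same true outcome risk in both groups, so true RR = 1
sens_exposed = 1.0    # every true outcome in the exposed group is recorded
sens_unexposed = 0.5  # only half of the true outcomes in the unexposed group are recorded

obs_exposed = n * true_risk * sens_exposed      # 100 observed outcomes
obs_unexposed = n * true_risk * sens_unexposed  # 50 observed outcomes
print(f"Observed RR: {obs_exposed / obs_unexposed:.1f}")  # 2.0, purely from detection bias
```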

Negative controls do not tell us what happens when the null is not true, that is, when the true relative risk is greater or smaller than 1. This is why we introduced synthetic positive controls, in which this first type of error is preserved as much as possible. We can then evaluate a method on these positive controls and, for example, see whether there is immortal time bias, shrinkage towards the null, etc.

@rosa.gini’s confusion seems to stem from the fact that positive controls do not address the second type of error. To quantify this type of error, we desperately need the work headed by others as discussed here, because neither negative controls nor synthetic positive controls help with it.

However, her math is misleading. The positive control synthesis does not require the assumption that sensitivity is 100%, just that it is the same for real and synthetic outcomes. I will try to explain with an example:

Imagine we know everything: there are 10 true outcomes in the exposed group, 10 true outcomes in the unexposed group, and the sensitivity is 80%, so we observe only 8 outcomes in each group. If we want to double the risk in the exposed group, we add 10 more ‘true’ outcomes, and because the sensitivity is 80% we observe only 8 additional outcomes, so 16 in total in the exposed group. The true RR is 20/10 = 2. The observed RR is 16/8 = 2.

Now imagine we don’t know everything. All we see is 8 outcomes in each group. As our method prescribes, we double the number of observed outcomes in the exposed group, so we add 8 to get 16. The observed RR is 16/8 = 2. We don’t know the true RR, but as explained above, if we did know everything we’d know it is 2.
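
The same arithmetic as a small sketch (illustrative only; this is not the actual synthesis code, just the counts and sensitivity from the example above):

```python
# Illustrative only: injecting synthetic outcomes when the sensitivity is unknown
# but assumed equal for real and synthetic outcomes.
target_rr = 2.0
true_exposed, true_unexposed = 10, 10  # true outcome counts (unknown to us in practice)
sensitivity = 0.8                      # same in both groups (also unknown to us)

obs_exposed = true_exposed * sensitivity      # 8 observed outcomes
obs_unexposed = true_unexposed * sensitivity  # 8 observed outcomes

# The method only sees the observed counts and multiplies the observed outcomes
# in the exposed group by the target RR.
injected = obs_exposed * (target_rr - 1)         # 8 synthetic observed outcomes
print((obs_exposed + injected) / obs_unexposed)  # observed RR after injection: 2.0

# If we did know everything: the 8 observed synthetic outcomes correspond to
# 8 / 0.8 = 10 'true' outcomes, so the true RR is (10 + 10) / 10 = 2.0 as well.
print((true_exposed + injected / sensitivity) / true_unexposed)  # 2.0
```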

For those who care about the nitty-gritty details: Rosa’s math assumes it matters that some of the people we inject outcomes into may already have the outcome in real life, even though we didn’t detect it because our sensitivity is not 100%. In that sense we are not really adding outcomes, and are not increasing the true RR as much as we think. The probability of this happening is small: most outcomes we study have a prevalence of less than 1%, so the effect, if it really were a problem, would be less than 1%.

But it is not a problem at all: we are trying to estimate the first type of error, and the collision of hypothetical true-but-unobserved events is not relevant to that problem. As a thought experiment, if we simply allow people to have two outcomes, one unobserved (due to the background rate) and one observed (synthetic, simulated to be due to the exposure), the math works out, and Rosa’s argument falls apart.

Finally, as is tradition in OHDSI, empirical evidence should have the last word: the fact that on many occasions we have found study designs to be virtually unbiased for both negative and synthetic positive controls (e.g. the Graham replication in our paper) shows that at least in those studies this problem did not exist.
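
P.S. For anyone who wants to check the prevalence argument numerically, here is a small simulation sketch (illustrative only, with assumed prevalence and sensitivity; not the actual OHDSI code). It shows that the fraction of injected outcomes that ‘collide’ with a true-but-undetected outcome is roughly prevalence × (1 − sensitivity), so well under 1% for the outcomes we typically study:

```python
# Illustrative only: how often does an injected synthetic outcome land on a person
# who already has a true but undetected outcome? Assumed numbers, not from the paper.
import random

random.seed(1)
n = 1_000_000      # persons in the exposed group
prevalence = 0.01  # true outcome risk
sensitivity = 0.8  # probability a true outcome is recorded
target_rr = 2.0

has_true = [random.random() < prevalence for _ in range(n)]
has_observed = [t and random.random() < sensitivity for t in has_true]

# Inject synthetic observed outcomes into persons without an observed outcome,
# enough to multiply the observed count by the target RR.
n_observed = sum(has_observed)
candidates = [i for i in range(n) if not has_observed[i]]
injected = random.sample(candidates, int(n_observed * (target_rr - 1)))

# A 'collision' is an injected person who already had a true, undetected outcome.
collisions = sum(has_true[i] for i in injected)
print(f"Collisions: {collisions}/{len(injected)} = {collisions / len(injected):.2%}; "
      f"prevalence * (1 - sensitivity) = {prevalence * (1 - sensitivity):.2%}")
```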