
Calibrating confidence intervals: an idea for injecting signals

Team:

@schuemie’s approach to empirical calibration of p-values, based on estimation of the error distribution, is a major step forward for the field and has already proven pivotal in helping us understand the reliability of observational study estimates.

At the OHDSI collaborator face-to-face, we discussed the need to develop an analogous calibration approach for confidence intervals. @schuemie, @David_Madigan, @msuchard and I had developed a feeble approach presented a few years ago at the OMOP Symposium…the math was solid, but the fundamental challenge was that it required knowing the true effect size for positive controls. During a breakout, we proposed using negative controls in real data and then injecting signals into the real data as a means of having a known true effect size…this is definitely an encouraging way forward to explore as a research direction.

The biggest issue we had identified with this approach was the difficulty in injecting signals in a way that would follow the same basic confounding/covariate structure as the ‘real’ outcomes.

Last night, I had a thought about how to overcome this issue, so I thought I’d throw it on the forum for discussion:

What if we use CYCLOPS to fit a predictive model to estimate, amongst the exposed patients, which patients are likely to have the event? Then, from that model, we’d have a predicted probability for each exposed patient. Moreover, we’d know the distribution of predicted probabilities for those patients who were both exposed and had the outcome. Using this, we could assign ‘injected outcomes’ by sampling probability values from the probability distribution of the exposed cases, then finding a ‘probability-matched’ exposed control, and injecting an outcome record during their period of exposure.
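
If it helps, here’s a rough sketch in plain R of what that probability-matched injection could look like. Everything here is illustrative: the made-up covariates, the simulated cohort, and a simple glm() stand in for the large-scale regularized model we’d actually fit with CYCLOPS.

```r
set.seed(123)
# Hypothetical exposed cohort: a couple of covariates, an outcome flag, and a
# person_id (stand-ins for the large-scale covariates a real Cyclops model would use)
exposed <- data.frame(person_id = 1:5000,
                      age  = rnorm(5000, mean = 60, sd = 12),
                      male = rbinom(5000, 1, 0.5))
exposed$outcome <- rbinom(5000, 1, plogis(-5 + 0.03 * exposed$age + 0.3 * exposed$male))

# Fit the outcome model among exposed patients (plain glm() here, not the
# regularized Cyclops fit) and score every exposed patient
fit <- glm(outcome ~ age + male, family = binomial, data = exposed)
exposed$p_hat <- predict(fit, type = "response")

cases    <- exposed[exposed$outcome == 1, ]
controls <- exposed[exposed$outcome == 0, ]

# Sample target probabilities from the exposed cases, then greedily find the
# closest 'probability-matched' exposed control for each and inject an outcome there
n_inject  <- 50
target_p  <- sample(cases$p_hat, n_inject, replace = TRUE)
available <- rep(TRUE, nrow(controls))
injected_ids <- integer(n_inject)
for (i in seq_len(n_inject)) {
  d <- abs(controls$p_hat - target_p[i])
  d[!available] <- Inf
  j <- which.min(d)
  injected_ids[i] <- controls$person_id[j]
  available[j] <- FALSE
}
# injected_ids: exposed controls who would receive a synthetic outcome record
# placed within their period of exposure
```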

If I’m thinking about this correctly (and I may not be), our current framework with CYCLOPS within the CohortMethod package that’s under development uses a large-scale covariate framework. We’d like to fit this predictive model anyway in the context of learning the relationship between the exposure and outcome. It seems this could serve as another use case for the predictive model, one that would enable auto-calibration. Essentially, we could create different outcome cohorts, one per negative control at each level of injected signal size. We use the assumption that the negative control has true RR=1; then, to inject an RR=2, we’d take the rate of events during the negative control time-at-risk and double it, injecting outcomes for the requisite matched set, then repeat the procedure for other fixed values: RR=1, 1.5, 2, 4, 10…(we still need research to figure out what the correct quantity is for the linear fitting of the error distribution to calibrate the confidence interval, but prior work with these five values worked pretty well).
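
Just to make the ‘double the rate’ arithmetic concrete, a toy calculation (the 50 observed events are made up):

```r
# Hypothetical negative control: 50 events observed during the exposed time-at-risk,
# assumed true RR = 1. To reach a target RR we add (RR - 1) * observed events.
n_observed <- 50
rr_grid    <- c(1, 1.5, 2, 4, 10)
n_inject   <- n_observed * (rr_grid - 1)
data.frame(target_rr = rr_grid, events_to_inject = n_inject)
#   target_rr events_to_inject
# 1       1.0                0
# 2       1.5               25
# 3       2.0               50
# 4       4.0              150
# 5      10.0              450
```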

Does this make any sense or am I off my rocker? If the idea has got legs, maybe we can figure out who’d like to work on a trial experiment. Probably a good topic for @schuemie and @msuchard’s Methods workgroup, whenever they next meet.

Cheers,

Patrick

Very interesting! What’s the rationale for matching? Why not just take a random sample of exposed controls and give them the event with probability specified by the model?

It’s not obvious to me how to inject an RR of, say, 2. For a negative control, all the exposed events are due to random chance, confounding, measurement error, etc. Indeed, because of these things the estimated RR may already be 2! Is it clear that simulating from a doubled rate is the right thing to do?

At the end of the day there still has to be a leap of faith that the kinds of things that corrupt positive control estimates are similar to the things that corrupt negative control estimates.

Definitely want folks to challenge this if my logic doesn’t make sense here, but this is my thinking:

  1. Why matching vs. random sampling weighted by predicted probability? A fair question; my only thinking was that we want to inject outcomes that are similar to those already in the data within the exposed population. To your point, since those events are not causal, maybe that’s the wrong sampling frame, and perhaps just using the predicted probability itself would be cleaner. Neither approach would ensure true confounded relationships (with the confounder being associated with both the exposure and the outcome), but at least it would present a distribution of covariates associated with the outcome, and by selecting from the exposed population, you know what covariates exist in exposed patients (though not necessarily differentially from unexposed).

  2. How to inject RR = 2. So, as you say, for a negative control, all exposed events that are currently in the data are assumed to be not causal. The assumed true RR = 1. That says nothing about the estimated RR that a method would produce, because that depends totally on the method’s use of the data. In particular, it depends on how the method produces its relative estimate, which is largely defined by what ‘comparator’ it’s using to infer a counterfactual. A cohort method might use a different active drug, a self-controlled cohort design might use a defined time window prior to exposure, a self-controlled case series might use all unexposed time in the patient’s history…but all of these are simply instruments to try to ascertain an ‘expected’ rate which can be compared with the ‘observed’ rate found during exposure. Under the premise that a negative control has a true RR = 1, the ‘true’ observed rate = expected rate (but this does not provide insights into whether a method’s estimate of the expected rate is biased). If the ‘true’ observed rate (events/time-at-risk) = ‘true’ expected rate, then it should hold that injecting a signal of RR = 2 amounts to multiplying the ‘true’ observed rate by the RR, which can be achieved by randomly adding (RR-1)*events to the patients with time-at-risk. Then, the new ‘true’ observed rate = (original events + injected events) / time-at-risk.

If we want to inject RR=1.5, 2, 4, 10, etc., I think it’d be valuable to
inject a different random sample of ‘injected events’ at each RR
threshold. So, for example, the set of injected events in RR=2 wouldn’t
necessarily be subsumed in RR=4.

Another nice feature of this approach: in addition to estimating a
calibrated confidence interval, having the effect estimates at the assumed
null and all injected signal levels would then allow estimation of
predictive accuracy (using AUC) by putting all effect estimates at all
thresholds in a rank-ordered list. We’d also be able to get an estimate of
coverage probability and mean squared error at values other than RR=1.
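
To illustrate, here is a rough sketch (plain R, with simulated estimates standing in for real method output) of how those operating characteristics could be computed once we have one estimate per negative control per injected RR:

```r
set.seed(456)
# Synthetic example: one estimate per negative control per true (injected) RR,
# with a point estimate and 95% CI on the relative-risk scale (all values made up)
true_rr <- rep(c(1, 1.5, 2, 4, 10), each = 30)
log_hat <- log(true_rr) + rnorm(length(true_rr), 0, 0.25)
se      <- rep(0.25, length(true_rr))
estimates <- data.frame(true_rr = true_rr,
                        rr_hat  = exp(log_hat),
                        ci_lb   = exp(log_hat - 1.96 * se),
                        ci_ub   = exp(log_hat + 1.96 * se))

# Coverage probability: fraction of CIs containing the true RR, per true RR level
coverage <- with(estimates, tapply(ci_lb <= true_rr & true_rr <= ci_ub, true_rr, mean))

# Mean squared error on the log scale, per true RR level
mse <- with(estimates, tapply((log(rr_hat) - log(true_rr))^2, true_rr, mean))

# Predictive accuracy: AUC for separating injected positives (true RR > 1) from the
# assumed nulls (true RR = 1), computed from the rank-ordered estimates (Mann-Whitney)
is_pos <- estimates$true_rr > 1
r      <- rank(estimates$rr_hat)
auc    <- (sum(r[is_pos]) - sum(is_pos) * (sum(is_pos) + 1) / 2) /
          (sum(is_pos) * sum(!is_pos))
```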

  3. All of our work in empirical calibration rests on an exchangeability assumption. The challenge we face is that this assumption is inherently untestable, because it requires knowing the true covariate causal structure both for the effect we’re estimating and for all the negative controls…and if we knew that, we wouldn’t be producing estimates of the causal effect in the first place! Since the exposure-outcome relationship we are interested in is unknown, the best we can do is try to ascertain the reliability of the methods using other exposure-outcome relationships which may be drawn from an adequately similar collection of potential covariate structures. It seems that the use of negative control exposures (holding the outcome constant, as we did in prior OMOP experiments) or negative control outcomes (holding the exposure constant) ensures that you have at least half of the equation the same, but it certainly doesn’t guarantee exchangeability.

A crazy idea (maybe): what if we used the predictive model for the unknown
outcome of interest as a means of injecting signals for the negative
control outcomes? That way, you’ve held the exposure constant and you’ve
assigned ‘causal effects’ based on the probability of having the outcome.
So, as a concrete example, if I’m MiniSentinel studying dabigatran vs.
warfarin for AMI, I fit a model for p(AMI) to figure out what predicts AMI
amongst patients exposed to dabigatran. Then, I have a collection of
negative control outcomes - conditions known not to be associated with
anticoagulant use - conditions like anxiety, insomnia, tinnitus, gout, foot
fracture, influenza, etc. To inject events for these negative control
outcomes, I use the probabilities that we got from the p(AMI) model to
sample patients. It’s a bit goofy, but it does seem to give us the same
covariates across exposure/outcomes to help support our exchangeability
assumption. Is this something worth trying? Anyone want to take a crack
at it?
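
The core sampling step I have in mind could be as simple as something like this (all numbers and objects are made up for illustration):

```r
set.seed(789)
# Hypothetical predicted probabilities from the p(AMI | exposed to dabigatran) model,
# one per exposed patient (person IDs and probabilities are made up)
exposed_ids <- 1:1000
p_ami <- runif(1000, min = 0.001, max = 0.05)

# Inject, say, 25 negative-control events (e.g. RR = 1.5 on top of 50 observed events),
# choosing recipients with probability proportional to their predicted p(AMI)
n_inject <- 25
injected <- sample(exposed_ids, size = n_inject, replace = FALSE, prob = p_ami)
# Each sampled patient would get a synthetic occurrence of the negative-control
# outcome (e.g. tinnitus) dated within their dabigatran time-at-risk.
```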

Patrick:

Is the following understanding of your idea correct: Instead of explicitly knowing the confounding structure in positive drug-outcome cases, we use the patient-level prediction to simulate it, assuming that the predictor, somehow, incorporated all of that confounding structure into its model?

If that’s the case, I still don’t understand how you would tweak the predictor in such a way that it would “use” a pre-defined RR different from the true RR. Because it will just have the RR+confounding “built in” that it learned from the data, and it won’t tell you what it is, or how both components relate to each other.

But I think this is going in a good direction, though.

C

What I’m proposing does not involve any positive controls. After all our work, I’ve become deeply skeptical that we can ever find any ‘known’ drug-outcome associations with a ‘known’ effect size, and I’m certainly uncomfortable with any approach that requires construction of an adequate sample of such effects. At best, I believe LAERTES may give us evidence to define a binary classification of effects, but we require precision in the true effect size if we’re going to try to calibrate point estimates and associated confidence intervals.

Here’s my latest proposal:

  1. Define the exposure-outcome relationship of interest to study
    (RRunknown). The time-at-risk is required as part of this definition
    (e.g. first event O which occurs within 90d after first exposure E)
  2. Identify a set of negative control outcomes {NCi, i = 1…n} (conditions known
    not to be associated with the exposure of interest E, such that we can
    comfortably assume that the true RR = 1 for all E-NCi relationships)
  3. Fit a predictive model for p(O|E) - what is the probability of having
    outcome O during the time-at-risk, given that the patient is exposed to E?
  4. Apply the model to all patients exposed to E to assign a predicted probability of having O to every one of them
  5. For each NCi,
    5.a. Calculate the background rate of event NCi (# of persons with NCi during time-at-risk / total time-at-risk)
    5.b. Create cohort of persons with outcome NCi
    5.c. For each injected signal threshold, RRk=1.25, 1.5, 2, 4, 10:
    5.c.i. Calculate the number of injected events required, IE = # of persons with NCi during time-at-risk * (RRk-1)
    5.c.ii. Randomly sample IE patients from the exposed population, weighted by p(O|E) (see the sketch after this list)
    5.c.iii. Create a new outcome cohort for NCi at threshold RRk
  6. Apply method(s) to:
    6.a. Unknown effect E->O
    6.b. Null effects E->NCi
    6.c. Injected effects E->NCi@RRk
  7. Estimate operating characteristics (predictive accuracy, coverage probability, mean squared error, error distribution) using estimates from the null effects and injected effects
  8. Perform empirical calibration on p-values for unknown effect by using the estimates from the null effects
  9. Perform empirical calibration on confidence intervals for unknown effect using estimates from null effects and injected effects
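
To make step 5 concrete, here’s one way the injection loop could be sketched in plain R; all object and column names are placeholders with synthetic data, not actual CohortMethod code:

```r
set.seed(42)
# Placeholder inputs (synthetic, for illustration only):
#   exposed   - patients exposed to E, with person_id and p_outcome = predicted p(O|E) from step 4
#   nc_events - observed negative-control events, with person_id and nc_id
exposed   <- data.frame(person_id = 1:2000, p_outcome = runif(2000, 0.001, 0.05))
nc_events <- data.frame(person_id = sample(exposed$person_id, 120),
                        nc_id     = sample(c("anxiety", "gout", "tinnitus"), 120, replace = TRUE))

rr_grid <- c(1.25, 1.5, 2, 4, 10)
injected_cohorts <- list()

for (nc in unique(nc_events$nc_id)) {
  observed   <- nc_events[nc_events$nc_id == nc, ]              # 5.a / 5.b: background events
  candidates <- exposed[!exposed$person_id %in% observed$person_id, ]
  for (rr in rr_grid) {                                         # 5.c: one cohort per RR level
    n_inject <- round(nrow(observed) * (rr - 1))                # 5.c.i
    new_ids  <- sample(candidates$person_id, size = n_inject,
                       prob = candidates$p_outcome)             # 5.c.ii: weighted by p(O|E)
    injected_cohorts[[paste(nc, rr, sep = "_")]] <-             # 5.c.iii: original + injected
      c(observed$person_id, new_ids)
  }
}
```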

This proposal attempts to address the potential concern about the exchangeability assumption because 1) the exposure is identical, and 2) the outcomes are injected following the same covariate distribution as the unknown effect. That means the confounding structure for the unknown effect should be approximately represented in the injected signals (this does not impact the empirical null distribution estimated by the negative controls). It still requires the assumption that the negative control outcomes are similarly observable to the unknown effect, and it’s probably advantageous for the negative controls to cover a range of outcome prevalences that subsumes the expected rate of the unknown event.

Great idea! (Ehhh, we actually already discussed injecting events on top of negative controls based on a fitted outcome model in New York, but blame me for not writing the minutes :wink: )

However, using the outcome model from the drug-outcome pair of interest to inject signals for a negative control does not make much sense to me. The background events of the negative control come from one model (e.g. if the negative control is MI, then a risk factor is DM2), and you’re then mixing that with a different outcome model for the inserted outcomes (e.g. if your outcome of interest is UGIB, then DM2 is not a risk factor (just making this up)). The result is an average between the two outcome models.

Also note that there is a bit of a self-fulfilling prophecy here: if we do use the exact SQL code from the CohortMethod package, then I predict that CohortMethod will be very good at this task, since the model it uses is the same model used to generate the data. But that is only true for the injected outcomes, so maybe that’s fine (and certainly better than unweighted insertion of events).

Building a method evaluation package including this signal injection stuff is on my to-do list, but I’m not sure when I’ll get around to it. If anyone else wants to try, that would be great.
