Team:
@schuemie’s approach to empirical calibration of p-values, based on estimation of the error distribution, is a major step forward for the field and has already proven pivotal in helping us understand the reliability of observational study estimates.
At the OHDSI collaborator face-to-face, we discussed the need to develop an analogous calibration approach for confidence intervals. @schuemie, @David_Madigan, @msuchard and I had developed a feeble approach, presented a few years ago at the OMOP Symposium…the math was solid, but the fundamental challenge was that it required knowing the true effect size for positive controls. During a breakout, we proposed using negative controls in real data and then injecting signals into that real data as a means of having a known true effect size…this is definitely an encouraging way forward to explore as a research direction.
The biggest issue we had identified with this approach was the difficulty in injecting signals in a way that would follow the same basic confounding/covariate structure as the ‘real’ outcomes.
Last night, I had a thought about how to overcome this issue, so I figured I’d throw it on the forum for discussion:
What if we use CYCLOPS to fit a predictive model to estimate, amongst the exposed patients, which patients are likely to have the event? Then, from that model, we’d have a predicted probability for each exposed patient. Moreover, we’d know the distribution of predicted probabilities for those patients who were both exposed and had the outcome. Using this, we can assign ‘injected outcomes’ by sampling probability values from the probability distribution of the exposed cases, then finding a ‘probability-matched’ exposed control, and injecting an outcome record during their period of exposure.
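To make the matching step concrete, here’s a rough, self-contained R sketch of the idea on simulated data. It uses plain `glm()` as a stand-in for the CYCLOPS large-scale regression, and all the data, covariates, and variable names are illustrative assumptions, not anything that exists in CohortMethod today:

```r
# Hypothetical sketch of probability-matched outcome injection.
# glm() stands in for a CYCLOPS large-scale regularized regression;
# the toy data and variable names are illustrative only.

set.seed(123)

# Toy exposed cohort: a few covariates and an observed (negative control) outcome
n <- 10000
exposed <- data.frame(
  age      = rnorm(n, 60, 10),
  male     = rbinom(n, 1, 0.5),
  diabetes = rbinom(n, 1, 0.2)
)
trueLogit <- -4 + 0.02 * exposed$age + 0.3 * exposed$diabetes
exposed$outcome <- rbinom(n, 1, plogis(trueLogit))

# 1. Fit a predictive model for the outcome among the exposed
fit <- glm(outcome ~ age + male + diabetes, data = exposed, family = binomial())
exposed$p <- predict(fit, type = "response")

# 2. Distribution of predicted probabilities among the exposed cases
caseProbs <- exposed$p[exposed$outcome == 1]

# 3. Decide how many outcomes to inject (e.g. doubling the observed rate, RR = 2)
nInject <- sum(exposed$outcome == 1)

# 4. Sample target probabilities from the case distribution and, for each,
#    pick the closest-probability exposed non-case as the 'probability-matched' control
targets   <- sample(caseProbs, nInject, replace = TRUE)
available <- which(exposed$outcome == 0)
exposed$injected <- 0
for (t in targets) {
  idx <- available[which.min(abs(exposed$p[available] - t))]
  exposed$injected[idx] <- 1
  available <- setdiff(available, idx)  # at most one injected outcome per person
}

# The injected outcomes should mirror the predicted-risk profile of the real cases
summary(exposed$p[exposed$outcome == 1])
summary(exposed$p[exposed$injected == 1])
```

The point of the matching on predicted probability is that the injected outcomes land on patients whose covariate profiles resemble the real cases, which is exactly the confounding structure we were struggling to preserve before.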
If I’m thinking about this correctly (and I may not be), our current framework with CYCLOPS within the CohortMethod package that’s under development uses a large-scale covariate framework. We’d want to fit this predictive model anyway in the context of learning the relationship between the exposure and outcome, so this could serve as another use case for the predictive model, one that would enable auto-calibration. Essentially, we could create different outcome cohorts, one per negative control at each level of injected signal size. We use the assumption that the negative control has true RR=1; then, to inject an RR=2, we’d take the rate of events during the negative control time-at-risk, double it, and inject outcomes for the requisite matched set. Then repeat the procedure for other fixed values: RR=1, 1.5, 2, 4, 10…(we still need research to figure out what is the correct quantity to do the linear fitting of the error distribution to calibrate the confidence interval, but prior work with these 5 values worked pretty well). A small sketch of the injection counts is below.
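As a toy illustration (again, not the actual CohortMethod implementation), the number of outcomes to inject for each target RR could be derived like this, assuming the observed negative-control rate corresponds to RR=1; the observed count here is an arbitrary placeholder:

```r
# Hypothetical sketch: translating target relative risks into injection counts.
# Assumes the negative control has true RR = 1, so reaching a target RR means
# adding (RR - 1) times the observed outcome count via probability-matched injection.

nObserved <- 137                 # illustrative: outcomes during negative-control time-at-risk
targetRRs <- c(1, 1.5, 2, 4, 10)

nToInject <- round((targetRRs - 1) * nObserved)
names(nToInject) <- paste0("RR=", targetRRs)
nToInject  # e.g. 0, 68, 137, 411, and 1233 extra outcomes to inject

# Each element would drive one run of the probability-matched injection above,
# yielding one synthetic outcome cohort per negative control per injected signal size.
```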
Does this make any sense or am I off my rocker? If the idea has got legs, maybe we can figure out who’d like to work on a trial experiment. Probably a good topic for @schuemie and @msuchard’s Methods workgroup, whenever they next meet.
Cheers,
Patrick