Moving a discussion with @aschuler from e-mail to the forums. The main question is whether positive controls should be synthesized by adding simulated outcomes on top of negative controls (only during exposure to the drug of interest), or by simulating all outcomes.
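To make the two options concrete, here is a toy sketch in Python (all the rates, the effect size, and the simple Poisson outcome model are made up for illustration; this is not anyone's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical negative-control data: one row per exposed subject, with the outcome
# count observed during exposure and the exposed person-time in years.
n = 10_000
person_time = rng.exponential(scale=0.5, size=n)            # exposed time per subject
baseline_rate = 0.05                                        # outcomes per person-year
observed_counts = rng.poisson(baseline_rate * person_time)  # outcomes actually observed

# Option 1: signal injection. Keep the observed outcomes (they double as the
# no-effect counterfactual) and add simulated outcomes during exposure at a rate
# chosen to reach a target rate ratio.
target_rate_ratio = 2.0
extra_rate = (target_rate_ratio - 1.0) * baseline_rate
injected_counts = observed_counts + rng.poisson(extra_rate * person_time)

# Option 2: full simulation. Discard the observed outcomes and simulate all outcomes
# from an (assumed) outcome model at the target rate.
simulated_counts = rng.poisson(target_rate_ratio * baseline_rate * person_time)

print("observed rate: ", observed_counts.sum() / person_time.sum())
print("injected rate: ", injected_counts.sum() / person_time.sum())
print("simulated rate:", simulated_counts.sum() / person_time.sum())
```

The point is only the structural difference: injection keeps the observed outcomes as the no-effect counterfactual and adds to them, while full simulation replaces them entirely.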
Alejandro argued:
For one, even the “true negatives” are not perfectly “true”. To the extent that is the case, the “true” counterfactuals (i.e. the observed outcomes, duplicated) are not really the true counterfactuals. However, the “truer” the negative, the truer the counterfactuals and the closer you get to the desired effect. If you can make a strong argument for your true negatives, or at least for some of them, this shouldn’t be a huge issue, although it’s important to acknowledge it.
The second problem is that you are using negative data (data for exposure-outcome pairs where there is no effect) as a stand-in for positive data. There is no rule that the confounding structure (observed or not) present in negative data is the same as that present in positive data, so using these signal-injected datasets to find methods that will work for positive associations is not foolproof. This is also probably not a killer downside; in aggregate, I think there are enough negative datasets that share confounding structure with positive datasets that it should all even out.
The final issue is related to the problem with my method, which is that it cannot perfectly capture the structure of the unobserved confounding. Although you are only changing some of the outcomes, doing that still disrupts the unobserved confounding structure, just like my method does. The bigger the desired effect size, the more outcomes you will change and the more the unobserved confounding structure will be disrupted. My method has exactly the same problem, which we discussed yesterday, but I think both methods address it: yours by leaving as many outcomes untouched as possible, and mine by applying the minimum possible perturbation to the outcomes in aggregate so as to produce the desired effect size. It’s really the same idea: the less you move the data, the less you perturb the unobserved confounding. With binary outcomes, the two methods should largely produce the same result! And with either approach, artificially removing features to simulate unobserved confounding would do even more to convince us that the evaluations are legitimate: to invalidate this approach, a critic would have to argue that there is some unobserved variable whose confounding relationship is totally different from the confounding relationships present in any of the observed variables (and that’s assuming the total annihilation of the unobserved confounding structure, which we know neither method causes).
The first two (ideally minor) issues do not apply to my method, and I think I’ve adequately laid out the case that both methods are equally susceptible to the third issue, and that both do their best to address it. That leaves my method at 2.75/3 issues addressed, and the signal injection at 0.75/3?
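To make the minimal-perturbation point concrete, here is a rough sketch of how I picture the binary-outcome case, where the two approaches should coincide: flip the fewest outcomes needed to hit the target effect size and leave everything else untouched. (The numbers and the simple risk-ratio target are hypothetical, not either of our actual implementations.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary-outcome negative control: an exposure flag and the observed outcome.
n = 10_000
exposed = rng.binomial(1, 0.3, size=n).astype(bool)
outcome = rng.binomial(1, 0.05, size=n).astype(bool)   # ~5% background risk, no true effect

target_risk_ratio = 1.5

# Minimal perturbation: flip the smallest number of exposed non-cases to cases
# that raises the exposed risk to target_risk_ratio times its current value.
exposed_idx = np.flatnonzero(exposed)
current_cases = outcome[exposed_idx].sum()
needed_cases = int(np.ceil(target_risk_ratio * current_cases))
n_to_flip = needed_cases - current_cases

candidates = exposed_idx[~outcome[exposed_idx]]        # exposed subjects without the outcome
flip = rng.choice(candidates, size=n_to_flip, replace=False)
perturbed = outcome.copy()
perturbed[flip] = True                                 # every other outcome is left untouched


def risk(y, e):
    return y[e].mean()


print("risk ratio before:", risk(outcome, exposed) / risk(outcome, ~exposed))
print("risk ratio after: ", risk(perturbed, exposed) / risk(perturbed, ~exposed))
```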
And later:
Ah, and I also just realized that even if the true negative datasets really do have a zero average treatment effect, that doesn’t mean there isn’t heterogeneity present! If there is heterogeneity (even minor), then again you do not really know the patient-level counterfactuals, and thus signal injection will not give you the treatment effect you think you’re getting.
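A quick way to see this last point: simulate a “negative control” whose average effect is zero only because harm and benefit cancel across subgroups. The observed outcomes then match the true counterfactuals on average, but not patient by patient, which is exactly what signal injection implicitly assumes. A toy illustration (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical heterogeneous "negative control": the average treatment effect is zero,
# but the drug raises risk in one subgroup and lowers it in the other.
n = 100_000
subgroup = rng.binomial(1, 0.5, size=n)                # 0 = harmed, 1 = protected
baseline_risk = 0.10
effect = np.where(subgroup == 0, 0.05, -0.05)          # cancels out on average

risk_untreated = np.full(n, baseline_risk)
risk_treated = baseline_risk + effect

y_treated = rng.binomial(1, risk_treated)              # what we actually observe (everyone exposed)
y_untreated = rng.binomial(1, risk_untreated)          # the true, unobservable counterfactual

print("true ATE:              ", risk_treated.mean() - risk_untreated.mean())
print("observed risk, exposed:", y_treated.mean())
print("counterfactual risk:   ", y_untreated.mean())
for g in (0, 1):
    mask = subgroup == g
    print(f"subgroup {g}: observed {y_treated[mask].mean():.3f}, "
          f"counterfactual {y_untreated[mask].mean():.3f}")
# Signal injection treats y_treated as if it were y_untreated. The population averages
# match here, but the patient-level counterfactuals do not, so an injected "known"
# effect is layered on top of the wrong individual baselines.
```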