
Combining evidence from multiple analyses estimating causal effects

Hi all! I’m interested in solving the following problem, and would like to get everyone’s input:

Suppose we want to estimate the causal effect of an exposure on an outcome. We run different analyses to answer this question, for example using a comparative cohort design where the comparator is assumed to have no effect, a self-controlled case series, with different types of adjustment strategies (PS matching, stratification), etc. We are also probably uncertain about when the exposure may cause the outcome. It could be only within the first 7 days after exposure, or only after a long time. We may try to approximate this by using various time-at-risk (TAR) windows (e.g. 1-7 days after exposure, 1-21 days, etc). If we run all these different methods, using different TARs, we will get many estimates for the same effect. How do we interpret this evidence?

I think it makes sense to separate out the different TARs, since these seem to be asking slightly different questions. So for now, I’d like to focus on different analyses using the same TAR. Should we pick the ‘best’ analysis, and disregard the rest? Or could we combine the estimates and get a more reliable or precise estimator?

A promising approach might be Bayesian Model Averaging, where we average across models, weighted by how well they explain the data. However, it is not obvious to me how to implement this. Packages like BMA assume all models are highly related, simply varying in the choice of predictors. In our analyses the models differ much more, with different approaches to approximating the counterfactual, for example by selecting different subjects (comparative cohort designs), or by selecting different time periods of the same subject (self-controlled designs).
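To make the weighting idea concrete, here is a minimal sketch of the standard BMA arithmetic, not an answer to the implementation question above. It assumes each analysis yields a log effect estimate, its variance, and a log marginal likelihood, and that those marginal likelihoods are comparable across analyses. That last assumption is exactly what is in doubt when designs use different subjects or time periods. The function name `bma_combine` and its arguments are hypothetical.

```python
import math


def bma_combine(log_estimates, variances, log_evidences, prior_probs=None):
    """Combine per-model estimates with Bayesian Model Averaging weights.

    Assumes (hypothetically) that log_evidences are log marginal likelihoods
    computed on the same data, so that they are comparable across models.
    """
    k = len(log_estimates)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k  # uniform prior over models

    # Posterior model probability is proportional to evidence times prior.
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    shift = max(log_post)  # subtract the max for numerical stability
    unnorm = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]

    # Mean and variance of the resulting mixture of per-model posteriors.
    mean = sum(w * est for w, est in zip(weights, log_estimates))
    second_moment = sum(
        w * (var + est ** 2)
        for w, var, est in zip(weights, variances, log_estimates)
    )
    variance = second_moment - mean ** 2
    return weights, mean, variance
```

Note that the mixture variance includes the between-model spread, so disagreement between analyses widens the combined interval rather than being averaged away.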

Is anybody already working on this? I’ve created some simulation code here that could be used as a basis.

I am not working on this, but I am interested in the challenge of how to combine models where some use different analyses of the same data and some use the same analyses of different data, and how to account for the varying dependencies.

I think that for model averaging it is important not only that the same TAR is used, but also that the methods target the same estimand. For example, if one method estimates the average treatment effect in the treated and another estimates the average treatment effect in the population, then I would question the value of combining them (how would you interpret the result?).

But if they are alternative approaches to address the same causal question then it seems quite sensible to think about averaging (especially in the situation where all methods pass diagnostics and there is no obvious “best” approach).

There are applications of BMA out there that average over all sorts of things:

  • which individual data points are treated as outliers
  • different transformations of response and predictor variables in regressions
  • graphical models

There is no particular reason not to average over different causal estimates, although there might be some interpretational challenges.

Model averaging provides a way to combine different estimates computed on the same data. Meta analytic approaches can then combine those model averaged estimates across databases.
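As a rough illustration of the second step, here is a sketch of fixed-effect inverse-variance pooling of per-database estimates. It assumes each database contributes one model-averaged log estimate with a standard error; the function name `pool_fixed_effect` is hypothetical, and a random-effects model may well be the better choice when databases are heterogeneous.

```python
import math


def pool_fixed_effect(log_estimates, standard_errors):
    """Fixed-effect inverse-variance pooling of per-database estimates."""
    # Each database is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, log_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Example with made-up numbers for two databases.
pooled, pooled_se = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```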


I agree with this comment. I am not sure whether this question is about the model itself (that is, the study population and all variables are defined identically and only the statistical model changes) or whether broader heterogeneity might exist (e.g., exposure might be defined somewhat differently in different analyses). If the models address the same question, it would make sense to combine their results. One good example is multiple imputation, in which we run the same analysis several times, changing only the imputed values, and then average the results. I have always felt that the value of using different models to assess one question is that each model brings its own strengths, so the value lies in having several estimates instead of one. For example, an article looked at the association between use of antibiotics in pregnancy and asthma in childhood (https://doi.org/10.1136/bmj.g6979); the cohort design resulted in a positive association and the sibling design did not. The interpretation was that the sibling design was better at controlling within-family confounding. We would not want to combine the estimates from those two analyses and lose the explanatory value of actually having two results.
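For reference, the multiple imputation averaging mentioned in this post is usually done with Rubin's rules. A minimal sketch, assuming one estimate and one variance per imputed dataset (the function name `rubin_pool` is hypothetical):

```python
def rubin_pool(estimates, variances):
    """Pool estimates from m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m  # average of the per-imputation estimates
    within = sum(variances) / m  # average within-imputation variance
    # Between-imputation variance (sample variance of the estimates).
    between = sum((est - pooled) ** 2 for est in estimates) / (m - 1)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled, total_variance
```

The between-imputation term plays the same role as between-model spread in model averaging: disagreement across runs inflates the pooled variance.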