Hi all! I’m interested in solving the following problem, and would like to get everyone’s input:

Suppose we want to estimate the causal effect of an exposure on an outcome. We run different analyses to answer this question, for example a comparative cohort design where the comparator is assumed to have no effect, or a self-controlled case series, each with different adjustment strategies (PS matching, stratification), etc. We are also probably uncertain about when the exposure may cause the outcome: it could be only within the first 7 days after exposure, or only after a long time. We may try to capture this by using various time-at-risk (TAR) windows (e.g. 1-7 days after exposure, 1-21 days, etc.). If we run all these different methods, using different TARs, we will get many estimates of the same effect. How do we interpret this evidence?

I think it makes sense to separate out the different TARs, since these seem to be asking slightly different questions. So for now, I’d like to focus on **different analyses using the same TAR**. Should we pick the ‘best’ analysis and disregard the rest? Or could we combine the estimates to obtain a more reliable or precise estimator?

A promising approach might be Bayesian Model Averaging (BMA), where we average across models, weighted by how well each explains the data. However, it is not obvious to me how to implement this here. Packages like BMA assume all models are closely related, differing only in the choice of predictors. Our analyses differ much more fundamentally, with different approaches to approximating the counterfactual: selecting different subjects (comparative cohort designs), or selecting different time periods within the same subjects (self-controlled designs).
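To make the idea concrete, here is a minimal sketch of BIC-based model averaging over summary results. All names are mine, and the big assumption is the one at issue: that the BICs (and hence the approximate marginal likelihoods) of such structurally different designs are even comparable. I'm using Python for illustration, not the BMA package:

```python
import numpy as np

def bic_weights(log_liks, n_params, n_obs):
    """Approximate posterior model probabilities via BIC.

    Assumes equal prior model probabilities. BIC approximates
    -2 * log(marginal likelihood), so weights are proportional
    to exp(-BIC / 2).
    """
    log_liks = np.asarray(log_liks, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    bic = n_params * np.log(n_obs) - 2.0 * log_liks
    # Subtract the minimum BIC before exponentiating, for numerical stability.
    w = np.exp(-0.5 * (bic - bic.min()))
    return w / w.sum()

def average_estimates(estimates, std_errors, weights):
    """Model-averaged point estimate and standard error.

    Variance follows the law of total variance: within-model variance
    plus between-model disagreement, weighted by model probability.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    mean = np.sum(weights * estimates)
    var = np.sum(weights * (std_errors ** 2 + (estimates - mean) ** 2))
    return mean, np.sqrt(var)
```

Even this sketch immediately raises the problem: what is `n_obs` when one analysis uses a comparator cohort and another uses within-person time, and are the likelihoods defined over comparable data?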

Is anybody already working on this? I’ve created some simulation code here that could be used as a basis.
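As a starting point for discussion, a toy simulation (entirely my own construction, not the linked code) can show one pitfall of naively combining estimates: analyses run on the same database share data, so their errors are correlated, and averaging can gain far less precision than independence would suggest:

```python
import numpy as np

rng = np.random.default_rng(2023)
true_log_rr = 0.3  # hypothetical true log rate ratio

def simulate_once():
    # Two hypothetical analyses of the same data: both unbiased, different
    # precision, and correlated because they share the underlying sample.
    shared = rng.normal(0.0, 0.10)  # error component common to both analyses
    est_a = true_log_rr + shared + rng.normal(0.0, 0.05)
    est_b = true_log_rr + shared + rng.normal(0.0, 0.15)
    return est_a, est_b

pairs = np.array([simulate_once() for _ in range(5000)])
naive_average = pairs.mean(axis=1)
print("SD of analysis A:   ", pairs[:, 0].std())
print("SD of naive average:", naive_average.std())
print("correlation A vs B: ", np.corrcoef(pairs.T)[0, 1])
```

Under these (made-up) parameters the unweighted average is actually *noisier* than the better single analysis, which is one argument for weighting rather than simple pooling.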