We’d appreciate your thoughts –
Recently, there has been increased interest, particularly in the healthcare domain, in distributed methods that aim to improve the external performance of predictive models (or risk scores) using limited information shared by (a subset of) the external resources; examples include Duan et al. and Luo et al.
Taking a step back and focusing on the OHDSI network, we wonder which characteristics such methods need to have to be practically applicable.
Here’s our take:
- Information shared by external resources should be human-readable and interpretable (e.g., feature prevalence or distribution in subpopulations) to ensure privacy and transparency and to make regulatory approval easier
- Methods should support training with various ML algorithms
- Methods should require only a single (or a few) communication rounds
Do these make sense? Are they really necessary? Anything else?
Thanks.
@jennareps @schuemie @Patrick_Ryan @hripcsa @msuchard @ssaria