Continuing the discussion from [Patient Level Prediction Workgroup] Meetings:
I want to clarify my comment at the end of today’s call.
Abstract model representations might be useful for at least two purposes. Our conversation focused on one: dissemination - ensuring that a model runs in an equivalent way across sites.
Another purpose is to facilitate evidence generation about predictive modeling methods. We intend to compare modeling approaches to see which perform best for which outcomes etc. Beyond declaring one approach a winner, we will be in a better position to explain why it was superior if all the approaches we compare are represented in a way that makes all their potentially important features explicit.
CISNET developed an approach for this among groups working on simulation models for cancer prevention strategies. There, surfacing assumptions and modeling differences helped reduce variation in model predictions and clarify why differences were occurring. Though the approaches we’ll compare are different from the simulation models produced in CISNET, they will be complex enough that a clear mapping of how the approaches differ from one another could be helpful.
I don’t know whether these two purposes can or should be achieved with a single model representation. You don’t need to know everything about how a model does what it does to get it to run the same way across sites; you just need a reliable way of pointing to standard routines that implement the intended method. The explicitness needed to illuminate differences in method performance would therefore be unnecessarily burdensome when the purpose is just reliable dissemination. So maybe the second representation, if useful at all, is more of a study design consideration than an OMOP standard that should be imposed on all patient-level predictive models. I’m not sure.
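To make the contrast between the two representations concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the descriptor fields (`routine`, `regularization`, and so on) and their values are hypothetical and do not come from any OMOP or OHDSI standard. It only shows that a dissemination view can be a bare pointer to a routine, while a comparison view needs every potentially important feature spelled out.

```python
# Hypothetical descriptors of two modeling approaches.
# Field names and values are illustrative assumptions, not a standard.
lasso = {
    "routine": "logistic_regression",
    "regularization": "L1",
    "feature_selection": "embedded",
    "missing_data": "treat_as_absent",
}
gbm = {
    "routine": "gradient_boosting",
    "regularization": "shrinkage",
    "feature_selection": "embedded",
    "missing_data": "treat_as_absent",
}


def dissemination_view(descriptor):
    """Keep only what a site needs to run the model: a pointer to a standard routine."""
    return {"routine": descriptor["routine"]}


def explicit_differences(a, b):
    """Return the features on which two approaches differ as {feature: (a_value, b_value)}."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Calling `explicit_differences(lasso, gbm)` lists only the features where the two approaches diverge, which is the kind of mapping that could help explain a performance difference; `dissemination_view` throws that detail away on purpose.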
Hi Andrew,
I missed this message earlier; sorry for the late reply.
I think you raised a valid point. Regarding dissemination of the models, we could think of two different use cases: (1) sharing within the OHDSI community, so that a model built on one database can be applied to another; and (2) disseminating the final model outside the community so that it can be used in daily practice, as Hamed showed in his presentation. For use case 1, the approach where we share the model as an R object will probably work fine, since we can also share the exact same cohort definition and data extraction components. This is less applicable to use case 2, since there might not even be an OMOP CDM in place. For me the priority is use case 1, but I think we need to keep use case 2 in mind as well.
I would still be interested in seeing whether PMML can help in any way to give us a common language for model sharing, even though the more pragmatic route of sharing the R object of the model is of course easier to implement and does the job we want to achieve now.
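As a rough illustration of what a language-neutral representation buys for use case 2, here is a minimal Python sketch. It is not PMML: the JSON layout, the covariate names, and the coefficient values are all made up for illustration, and it assumes the final model is a plain logistic regression. The point is only that a model stored as named coefficients can be scored without R and without an OMOP CDM, as long as the site can compute the same covariates.

```python
import json
import math

# Hypothetical serialized model. The layout, covariate names, and values
# are assumptions for illustration; this is not PMML and not a fitted model.
model_json = json.dumps({
    "type": "logistic_regression",
    "intercept": -2.0,
    "coefficients": {"age_over_65": 0.8, "prior_mi": 1.2},
})


def predict_risk(serialized_model, covariates):
    """Score one patient from a serialized logistic model.

    `covariates` maps covariate name to value; covariates missing from
    the mapping are treated as 0.
    """
    model = json.loads(serialized_model)
    linear = model["intercept"]
    for name, beta in model["coefficients"].items():
        linear += beta * covariates.get(name, 0.0)
    # Logistic link turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-linear))
```

A call such as `predict_risk(model_json, {"age_over_65": 1, "prior_mi": 1})` returns a probability between 0 and 1. What this sketch leaves out is exactly the hard part for use case 2: the covariate definitions themselves, which in use case 1 come for free with the shared cohort definition and data extraction components.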
Thanks and talk to you soon,
Peter