Hi @Christian_Reich,
Thank you for the suggestions and feedback… especially for providing it free of self-promoting sentences.
With respect to Modality, we were alluding to differentiating between rule-based (heuristic) phenotypes versus computable (algorithmic) phenotypes.
Provenance is intended to capture how phenotype definitions evolve over time. A simple example is versioning. Supposing we have a phenotype with versions 1, 2, and 3, we would have the provenance capture that version 3 came from version 2, and version 2 came from version 1. This doesn’t have to be one-to-one though. By having each phenotype identify its ancestor(s), we could navigate a graph to show how a given phenotype developed.
This has implications when it comes to validation. With each change in version comes a potential change in the algorithm’s performance. Accordingly, we’ve decided to anchor the validation sets to whatever version the validation refers to. The validation sets do not “carry forward” to protect the user from automatically assuming, for instance, that if version 2 performed well, then version 3 must perform the same.
Now, it could be that that’s true if the change in version was minor, but what constitutes such a “minor change” is difficult if not impossible to establish in a general framework and would likely have to be considered on a case by case basis. However, all of the information would be available to the user to do so.