@schuemie, this is really remarkable. Tremendous work! You are amazing.
1) I actually like the dashboard. It wasn't immediately obvious to me
that all methods were in the table (because it was paginated), and it was
only after I re-sorted on AUC that I found the surprise that the
self-controlled cohort method is still achieving strong performance. (I do
see I could have clicked on which methods to show/hide, but that was out of
sight without scrolling down on my screen, so I missed it initially.)
2) Could we make the table contain ALL data, so that it would be possible
to show, for example, calibrated vs. uncalibrated performance next to
each other (if that's what the user wants to see)?
3) I support the notion that when we generate calibration plots, we don't
want to promote the behavior of people exploring the test cases that
achieve statistical significance, and we definitely don't want post hoc
rationalization governing choices of negative controls in studies of
unknown effects. I'm wrestling with how we want to expose the test cases
in this case. At a minimum, I would think we need to expose the list of
drug-outcome pairs used in each experiment. What's unclear is whether we
want to go further and expose which test cases didn't yield estimates, or
further still, the actual effect estimates, CIs, case counts, etc. My
immediate reaction is that showing everything will make it too easy to
encourage post hoc rationalization that may lead us astray in inappropriate
directions. But part of me thinks that post hoc rationalization may be
appropriate in this context of methods evaluation, if we see trends that
either highlight limitations of a given method or identify test cases that
are consistently challenging for all methods, to allow further drill-down.
4) Could we show the ROC plot in addition to the calibration plot?
Perhaps that would make it convenient to identify decision thresholds with
acceptable sensitivity/specificity?
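To make the ROC idea concrete, here is a minimal sketch of how such a panel could be computed. The labels and scores are entirely hypothetical (1 = positive control, 0 = negative control; the score could be, say, 1 minus the p-value each method produces for a test case) and do not come from the actual dashboard:

```python
# Hypothetical sketch of an ROC computation for a methods-evaluation panel.
# Each ROC point carries the score threshold that produced it, so a user
# could read off thresholds with acceptable sensitivity/specificity.

def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) triples, walking scores in descending order."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = []
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos, score))
    return points

def auc(points):
    """Trapezoidal area under the ROC points, starting from the (0, 0) corner."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for fpr, tpr, _ in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area

# Toy data: 4 positive and 4 negative controls with made-up scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
print(round(auc(roc_points(labels, scores)), 3))
```

Plotting the (fpr, tpr) pairs and annotating them with the thresholds would give exactly the decision-threshold view suggested above.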
5) The 'True effect size' dropdown didn't seem to function on my end, but I
like the idea of restricting on that.
6) Could we add 'number of estimates' to the table, so that when you are
looking at any performance measure, you can assess how precise that
measure may be?
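As a rough illustration of why the count matters (a sketch, not anything the dashboard computes): the Hanley & McNeil (1982) approximation shows how the standard error of an AUC shrinks as the number of positive and negative controls that actually yielded estimates grows.

```python
# Sketch: Hanley & McNeil (1982) approximation of the standard error of an AUC,
# as a function of the number of positive (n_pos) and negative (n_neg) controls
# that produced estimates. Numbers below are illustrative only.
import math

def auc_standard_error(auc, n_pos, n_neg):
    """Approximate SE of an AUC given the counts behind it."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc * auc)
                + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(variance)

# The same AUC of 0.80 is far less certain with 10 controls per class
# than with 100 per class:
print(round(auc_standard_error(0.80, 10, 10), 3))
print(round(auc_standard_error(0.80, 100, 100), 3))
```

So two methods showing the same AUC in the table could have very different reliability, which is exactly what an 'number of estimates' column would let readers judge.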
This is really great. Thank you for leading the community in this exciting
work!