I’m hereby announcing the formation of the Method Evaluation Task Force!
Background
When designing a study there are many study designs to choose from, and many additional choices to make, and it is often unclear how these choices will affect the accuracy of the estimate. (For example: if I match on propensity scores, will that lead to more or less bias than if I stratify? What about power?) The literature contains many papers evaluating one design choice at a time, but (to me) with unsatisfactory scientific rigor; often a method is evaluated on one or two exemplar studies, from which we cannot generalize, or by using simulations that have an unclear relationship with the real world. The OMOP Experiment was a first attempt at systematic empirical evaluation of method performance, from which we have learned many insights (mostly on how to better evaluate methods).
Task force objectives
Develop the methodology for evaluating methods (for estimating population-level effects).
Use the developed methodology to systematically evaluate a large set of study designs and design choices.
Next steps
Call for collaborators (this post)
Further refinement of the objectives (e.g. what gold standards to use, which methods to include in the evaluation)
Gather requirements for funding, and apply for funding if needed
Do the research
Call for collaborators
Let us know if you’re interested in actively participating in this research. Warning: this will be hard work with (probably) no pay. Either send me an e-mail, or respond to this post.
I would like to re-invest in OHDSI. I would like to join the task force, and once I understand how you propose to go about this evaluation, I will make a time commitment. I do think a conceptual paper mapping epidemiologic design constructs to database epidemiology is needed. For example, we need to standardize terminology, provide rationale, and give example operationalizations. One example I like to use is the baseline concept. In a clinical trial, a full clinical evaluation is done at enrollment or at study index. Why do we use a baseline concept, and what purpose does it serve in relation to a clinical trial?
Maybe you have already done this but I have not found a paper mapping these concepts to database epidemiology. Some concepts for consideration:
Study period
Washout
Baseline
Incident treatment or incident course of therapy vs. inception cohort
Incident disease can also be understood from a chronic disease epi or infectious disease epi perspective
Index date
Follow-up
Censoring
Etc.
Validation, and how information on measurement performance can be used to estimate "true" prevalence vs. apparent prevalence, and how we incorporate measurement performance into our analytic framework. There are so many issues in database epidemiology that have not been pulled together into one paper or series that could serve as a tutorial with justification for how to implement a specific study design.
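On the true vs. apparent prevalence point: when the sensitivity and specificity of a phenotype definition are known (e.g. from a validation study), the standard Rogan-Gladen correction gives an estimate of true prevalence. A minimal sketch in Python; the numeric values are made up purely for illustration:

```python
def true_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimator: correct the apparent (observed) prevalence
    for imperfect measurement. Valid when sensitivity + specificity > 1."""
    est = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    # Clamp to [0, 1], since sampling noise can push the raw estimate outside.
    return min(max(est, 0.0), 1.0)

# Hypothetical example: 12% of patients meet the phenotype definition,
# which a validation study found to have sensitivity 0.80, specificity 0.95.
print(true_prevalence(0.12, 0.80, 0.95))  # ≈ 0.0933
```

The same sensitivity/specificity estimates could in principle be propagated further into the analytic framework (e.g. bias-adjusting effect estimates), which is exactly the kind of thing such a tutorial paper would need to spell out.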
The task force will focus on evaluation of population-level effect estimation, so not on prevalence estimation. The focus is on methods that try to estimate the causal effect of an exposure on an outcome (although I realize the term ‘causal’ is a loaded term).
The approach we’ll use is empirical evaluation: using some gold-standard set of exposure-outcome pairs, where the true effect is known, to measure how well methods perform. Even though I agree that the terms you mention need explicit definitions, I would prefer the task force focus on the evaluation aspect, and simply provide its own definitions (and source code) of the designs under evaluation.
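To make the idea of empirical evaluation concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the pair names, the "true" effect sizes, and the estimates a method might produce. A real evaluation would use actual negative and positive controls and real estimation methods; the point is only to show the kind of metrics (bias, MSE, confidence-interval coverage) one would compute against a gold standard:

```python
import math

# Hypothetical gold standard: exposure-outcome pairs with known true
# effects (as log hazard ratios). A true effect of 0 is a negative control.
gold_standard = [
    {"pair": "drugA-outcome1", "true_log_hr": 0.0},
    {"pair": "drugB-outcome2", "true_log_hr": math.log(2.0)},
    {"pair": "drugC-outcome3", "true_log_hr": math.log(1.5)},
]

# Hypothetical estimates produced by one method under evaluation:
# a point estimate (log HR) and standard error for each pair, in order.
estimates = [
    {"log_hr": 0.10, "se": 0.12},
    {"log_hr": 0.55, "se": 0.15},
    {"log_hr": 0.60, "se": 0.20},
]

def evaluate(gold, ests, z=1.96):
    """Compare estimates to known true effects: bias, MSE, and the
    fraction of 95% confidence intervals that contain the truth."""
    errors = [e["log_hr"] - g["true_log_hr"] for g, e in zip(gold, ests)]
    bias = sum(errors) / len(errors)
    mse = sum(err ** 2 for err in errors) / len(errors)
    covered = sum(
        1
        for g, e in zip(gold, ests)
        if e["log_hr"] - z * e["se"]
        <= g["true_log_hr"]
        <= e["log_hr"] + z * e["se"]
    )
    return {"bias": bias, "mse": mse, "coverage": covered / len(gold)}

metrics = evaluate(gold_standard, estimates)
print(metrics)
```

Running the evaluation over many methods and design variants, on the same gold standard, is what would let us rank designs by operating characteristics rather than by anecdote.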