Yes, I realize many would consider this a well-executed study, and I agree that many published studies are worse. However, in OHDSI we have been developing best practices that this study does not adhere to. We’ve started formulating these best practices here, although they’re far from complete. Let me discuss them by comparing this paper to our own Keppra study:
OHDSI’s general principles are:
- Transparency: others should be able to reproduce your study in every detail using the information you provide.
- Prespecify what you’re going to estimate and how: this avoids hidden multiple testing (fishing expeditions, p-value hacking). Run your analysis only once.
- Validation of your analysis: you should have evidence that your analysis does what you say it does, for example by showing that the statistics produced have nominal operating characteristics (e.g. p-value calibration), showing that important assumptions are met (e.g. covariate balance), and using unit tests to validate pieces of code (a minimal sketch of p-value calibration follows this list).
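To make the p-value calibration point concrete, here is a minimal sketch using the OHDSI EmpiricalCalibration R package. The negative control estimates and the outcome estimate below are made-up placeholders, not numbers from either study:

```r
# Minimal sketch of empirical p-value calibration (placeholder data only).
library(EmpiricalCalibration)

# Hypothetical effect estimates (log hazard ratios) and standard errors
# for a set of negative control outcomes:
negControls <- data.frame(
  logRr   = rnorm(100, mean = 0.05, sd = 0.10),  # placeholder values
  seLogRr = runif(100, min = 0.05, max = 0.20)   # placeholder values
)

# Fit the empirical null distribution implied by the negative controls:
null <- fitNull(logRr = negControls$logRr, seLogRr = negControls$seLogRr)

# Calibrate the p-value of the outcome of interest (placeholder estimate):
calibrateP(null, logRr = log(1.5), seLogRr = 0.15)
```

If the calibrated p-value differs substantially from the nominal one, that is evidence of residual systematic error that the nominal statistics do not account for.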
Let’s start with transparency:
Hicks et al. do a good job of providing the READ codes used to find the outcome, and they mention the outcome is defined as ‘a diagnosis of incident lung cancer’, suggesting that any first occurrence of one of these codes was considered an outcome, but they could have been more explicit. No exact definition is given of the exposures, nor of how the covariates were defined. In contrast, our paper includes the full protocol and analysis source code, leaving no ambiguity.
Prespecify
It does not appear that Hicks et al.’s analysis was preregistered anywhere, which leaves us to wonder whether p-hacking took place and also makes it impossible to monitor publication bias. In contrast, our Keppra protocol was registered on the OHDSI Wiki, including the full specification of the analysis code.
Validation of your analysis
The assumption that using an active comparator guarantees no confounding is hard to defend, and nowhere do Hicks et al. check this assumption. Even though one could argue that negative controls might not be able to detect all forms of bias, that is no excuse for omitting them entirely, as Hicks et al. did. Our Keppra study included 100 negative control outcomes, showing negligible residual confounding. Because Hicks et al. did not use propensity scores, they were not able to check whether covariate balance was achieved. In contrast, in the Keppra study we observed balance on all >10,000 covariates (unfortunately not included in the paper).
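To illustrate what such a balance check looks like, here is a minimal, generic R sketch that computes standardized mean differences (SMDs) between two exposure cohorts. The data frame, column names, and the 0.1 rule-of-thumb threshold are illustrative assumptions; in our studies this is handled by CohortMethod’s computeCovariateBalance over the full set of covariates:

```r
# Minimal sketch: covariate balance via standardized mean differences.
# 'covariates' is a hypothetical data frame with one numeric column per
# covariate; 'treatment' is a 0/1 vector indicating the exposure cohort.
computeSmd <- function(x, treatment) {
  m1 <- mean(x[treatment == 1]); m0 <- mean(x[treatment == 0])
  v1 <- var(x[treatment == 1]);  v0 <- var(x[treatment == 0])
  (m1 - m0) / sqrt((v1 + v0) / 2)
}

checkBalance <- function(covariates, treatment) {
  smd <- sapply(covariates, computeSmd, treatment = treatment)
  data.frame(covariate = names(smd),
             smd       = smd,
             balanced  = abs(smd) < 0.1)  # common rule-of-thumb threshold
}
```

After propensity score matching or stratification, one would expect (nearly) all covariates to fall below that threshold.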
Another form of validation would be to replicate the study in other databases. Hicks et al. relied solely on CPRD; our paper already included seven databases.
The paper by Hicks et al. does not describe how their analysis process was validated, and although I’m sure they’ve done a good job, there’s no way for me to check. There is no way for any of us to see whether the result is real, or simply the consequence of an error made when copy-pasting values from one place to another, or of a programming error (again, negative controls would have been helpful). In contrast, our code is publicly available for review, and has many mechanisms in place to safeguard validity.
Finally, as argued at the OHDSI Symposium as well as in our recent paper, isolated, one-off studies such as this one tend to be hard to reproduce because of study bias, publication bias, and p-hacking, as also covered in my points above. Preferably, we would include lung cancer as one of the outcomes in our current LEGEND study on hypertension treatments.