Hi,

@hripcsa (looping you in as one of the paper’s authors)

I was wondering if we could get some advice on the p-value adjustment described in the paper “A statistical methodology for analyzing co-occurrence data from a large sample”.

Our research team has extracted co-occurrences of drug names and adverse event (AE) terms from over 600,000 discharge summaries to see whether certain drug names disproportionately co-occur with certain AE terms. Each drug-AE pair has its own 2×2 co-occurrence matrix, and each matrix has an associated χ2 statistic and p-value.

Co-occurrence matrix

| | Drug present | Drug absent |
|---|---|---|
| Adverse event present | D+ve, AE+ve | D-ve, AE+ve |
| Adverse event absent | D+ve, AE-ve | D-ve, AE-ve |
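To make the setup concrete, here is a minimal sketch of how a χ2 statistic and p-value can be computed for one such matrix. It assumes the plain Pearson χ2 with 1 degree of freedom and no continuity correction, which may differ from the exact variant used in the paper. The function name `chi2_2x2` and the counts are hypothetical and for illustration only.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a: drug present, AE present    b: drug absent, AE present
    c: drug present, AE absent     d: drug absent, AE absent
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("a margin is zero; chi-square is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts, not taken from our data.
chi2, p = chi2_2x2(a=120, b=880, c=4880, d=594120)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```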

To further identify meaningful drug-AE pairs, we’d like to apply the volume test adjustment, followed by the p-value plot, both described in the above paper.

However, we are not sure whether the volume test adjustment ε(χ2) is the same quantity as the p in the plot of (1 - p, Np), “where p is the adjusted p-value from the volume test and Np is the number of test statistics with a p-value greater than p”. If not, how can we derive the adjusted p-value after calculating ε(χ2)?
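For reference, this is a minimal sketch of how we read the plot definition quoted above: each p-value is paired with the count of p-values strictly greater than it. The function name `p_value_plot_points` and the input values are hypothetical, and the sketch takes the adjusted p-values as given, since how to obtain them is exactly what we are unsure about.

```python
from bisect import bisect_right


def p_value_plot_points(p_values):
    """Return (1 - p, N_p) pairs, where N_p counts p-values strictly greater than p."""
    ordered = sorted(p_values)
    total = len(ordered)
    # bisect_right gives the number of values <= p, so the rest are > p.
    return [(1 - p, total - bisect_right(ordered, p)) for p in ordered]


# Hypothetical adjusted p-values, for illustration only.
for x, n_p in p_value_plot_points([0.5, 0.25, 0.25, 0.75]):
    print(x, n_p)
```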

The volume adjustment approach requires dividing the χ2 statistic by the number of observations (n). In our case n is very large (>600,000), which leads to very small adjusted χ2 values and correspondingly very large p-values.
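The effect we are seeing can be sketched as follows. This assumes that the scaled statistic χ2 / n is referred back to a χ2 distribution with 1 degree of freedom, which is our reading and may not be what the paper intends. The statistic value of 50 and the helper `chi2_sf_1df` are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


n = 600_000        # approximate number of discharge summaries
chi2 = 50.0        # hypothetical unadjusted statistic for one drug-AE pair
scaled = chi2 / n  # our reading of the adjustment; may not match the paper

print("unadjusted p:", chi2_sf_1df(chi2))       # very small
print("p after dividing by n:", chi2_sf_1df(scaled))  # close to 1
```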

Another question we have is whether the volume test adjustment is suitable for the scenario we have just described, and whether we should use the conditional and fixed margin volume tests instead.

Thanks in advance

Hui Xing