Given five CDM databases and the results from each, how can I calculate the results for one large database made up of all five?

pandamiao · December 3, 2019, 8:32am

Dear all,
I am an OHDSI CDM user and researcher, and I have a problem. Suppose I have five CDM databases, each with 10,000 people. For various reasons, I can only calculate the parameters of interest within each database separately; I am not permitted to pool the 50,000 people from the five databases and calculate the parameters for the total. So my question is: if I want statistical results for all 50,000 people but can only compute separate results for the five databases, is that possible? For example, for an analysis of variance (ANOVA)?

Best regards
Thank you very much!

Christian_Reich · December 3, 2019, 12:07pm

Why can’t you pool the 50,000 into one database, @pandamiao?

SCYou · December 3, 2019, 11:05pm

@pandamiao Though I’m not sure I fully understand what you want, we’re developing something similar. Because the protocol has not been developed yet, I cannot give you the details. It will be announced at the OHDSI Korea symposium, and then I can tell you more. Still, it’s not related to ANOVA.

schuemie · December 4, 2019, 5:55am

For something as simple as an ANOVA, couldn’t you just compute the 2x2 table (aggregate statistics, so shareable) per database, and then sum them across databases, computing the ANOVA on the combined 2x2 table?
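
A minimal sketch of this summing approach for a one-way ANOVA on a continuous outcome. It assumes each database can share, per treatment group, the count, the sum, and the sum of squares of the outcome; the thread does not say which aggregates are shared, and the function names `summarize` and `pooled_anova` are hypothetical. Under that assumption, the F statistic matches the one computed on the pooled person-level data.

```python
from collections import defaultdict


def summarize(values_by_group):
    """Run inside one database: reduce person-level values to
    (count, sum, sum of squares) per group. Only these aggregates leave the site."""
    return {
        group: (len(values), sum(values), sum(x * x for x in values))
        for group, values in values_by_group.items()
    }


def pooled_anova(per_database_summaries):
    """Sum the per-database aggregates and return (F, df_between, df_within)."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for summary in per_database_summaries:
        for group, (n, s, ss) in summary.items():
            totals[group][0] += n
            totals[group][1] += s
            totals[group][2] += ss

    n_total = sum(n for n, _, _ in totals.values())
    grand_mean = sum(s for _, s, _ in totals.values()) / n_total

    # Between-group and within-group sums of squares from the aggregates alone.
    # Note: sum-of-squares minus squared-sum can lose precision when the mean
    # is large relative to the spread.
    ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in totals.values())
    ss_within = sum(ss - s * s / n for n, s, ss in totals.values())

    df_between = len(totals) - 1
    df_within = n_total - len(totals)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative (made-up) data for two databases and two drugs.
db1 = summarize({"A": [1, 2, 3], "B": [2, 3, 4]})
db2 = summarize({"A": [2, 3], "B": [5, 6]})
f_stat, df_between, df_within = pooled_anova([db1, db2])
```

If SciPy is available, a p-value can then be obtained with `scipy.stats.f.sf(f_stat, df_between, df_within)`.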

David_Madigan · December 4, 2019, 12:15pm

or, more generally, just do a simple inverse variance-weighted meta-analysis
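
A minimal sketch of a fixed-effect inverse variance-weighted meta-analysis, assuming each database reports one effect estimate (for example a mean difference or a log hazard ratio) and its standard error. The function name `fixed_effect_meta` and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(estimates, standard_errors):
    """Pool per-database estimates, weighting each by 1 / SE^2.

    Returns (pooled estimate, pooled standard error)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Illustrative (made-up) estimates from five databases.
pooled, pooled_se = fixed_effect_meta(
    [0.10, 0.25, 0.18, 0.05, 0.30],
    [0.08, 0.10, 0.09, 0.12, 0.11],
)
# Approximate 95% confidence interval under a normal approximation.
ci_95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
```

More precise databases (smaller standard errors) receive more weight, so the pooled estimate always lies between the smallest and largest per-database estimates.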

AsiyahFDA · December 4, 2019, 2:07pm

@David_Madigan For an “inverse variance-weighted meta-analysis”, you are assuming that there is no heterogeneity among the databases. @pandamiao, is that the case?

pandamiao · December 9, 2019, 2:02am

Yeah, it is a very complex problem, @Christian_Reich.

pandamiao · December 9, 2019, 2:04am

Thank you very much, @SCYou. I look forward to hearing more.

pandamiao · December 9, 2019, 2:17am

@schuemie Thank you very much. ANOVA is just an example, and there may be other questions. For example, suppose I want to compare the therapeutic effects of three drugs, but the variables in three databases do not satisfy the assumptions of ANOVA, such as following a normal distribution. In that situation, how can I get the same statistical results as from the pooled database? Thank you very much.

pandamiao · December 9, 2019, 2:28am

Thank you very much, @David_Madigan. As far as I know, meta-analysis is used for published papers, but I want to combine the statistical results from the five databases before publishing. Does meta-analysis work in this situation?

schuemie · December 9, 2019, 7:54am

There are currently two options for combining evidence when you cannot share person-level data between databases:

1. For simple analyses such as ANOVA, just compute counts per database and sum these for the overall model.
2. For more complicated analyses such as a Cox proportional hazards model, compute the estimate (and standard error) per database, and combine using some form of meta-analysis (preferably one for random effects, since in observational research there is likely always some heterogeneity).
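
A minimal sketch of the second option using the DerSimonian-Laird random-effects estimator. This particular estimator is an assumption chosen for illustration; the thread only says "some form of meta-analysis". The inputs are assumed to be per-database log hazard ratios with their standard errors, and the function name `dersimonian_laird` is hypothetical.

```python
import math


def dersimonian_laird(estimates, standard_errors):
    """Random-effects meta-analysis (DerSimonian-Laird).

    Returns (pooled estimate, pooled standard error, tau^2), where tau^2 is
    the estimated between-database variance."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_weight

    # Cochran's Q measures how much the estimates disagree beyond chance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total_weight - sum(w ** 2 for w in weights) / total_weight
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight with the between-database variance added to each variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2


# Illustrative (made-up) log hazard ratios from five databases.
log_hr, log_hr_se, tau2 = dersimonian_laird(
    [0.10, 0.45, 0.18, -0.05, 0.30],
    [0.08, 0.10, 0.09, 0.12, 0.11],
)
hazard_ratio = math.exp(log_hr)
```

When the estimated `tau2` is zero, the result coincides with the fixed-effect inverse variance-weighted estimate; when it is positive, the weights become more equal and the pooled standard error grows.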

When counts are low, the second approach might run into problems because the normality assumption on the likelihood doesn’t hold. We’re developing a solution for that, which is what @SCYou mentioned.