CHARYBDIS script is really slow, seeking help

gabimaeztu · December 15, 2020, 5:52pm

Good afternoon,

We are running the CHARYBDIS (1.0) study at a hospital, and it has been running for more than 30 hours. So far only 15 of 105 queries have completed. Database statistics show that the requests to the DB are not the problem (each takes 0.5 s to 15 s to resolve).

The problem is the execution time of the R script.

Is it possible to speed up the process somehow?

Happy to debug or explore possible bottlenecks with your suggestions.
Thanks for your time,

krfeeney · December 16, 2020, 1:01am

Hi @gabimaeztu!

Welcome to the part of OHDSI that has a few warts. Have we scared you yet?

I also have this problem on my databases. @anthonysena designed the package. He gave me the following advice: build indices on the drug_source_concept_id and condition_source_concept_id columns.

Why? CohortDiagnostics is the library that underpins this package. It’s greedy and looks across your source_concept_ids in CONDITION_OCCURRENCE and DRUG_EXPOSURE. This is essentially what RunDiagnostics is doing. It’s what leads to the Diagnostics Shiny Apps for COVID cohort definitions, Flu cohort definitions, the Stratification criteria we’ve created and any custom features we’ve defined.

I’m told this should really help.
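
For reference, here is a minimal sketch of what those indices could look like. It assumes PostgreSQL-style syntax and a CDM schema named cdm; the schema name and the index names are placeholders I made up, so adjust them to your setup.

```sql
-- Assumption: the OMOP CDM tables live in a schema called "cdm".
-- The index names below are illustrative, not part of the CHARYBDIS package.
CREATE INDEX IF NOT EXISTS idx_drug_exposure_source_concept_id
    ON cdm.drug_exposure (drug_source_concept_id);

CREATE INDEX IF NOT EXISTS idx_condition_occurrence_source_concept_id
    ON cdm.condition_occurrence (condition_source_concept_id);

-- Refresh planner statistics so the new indices are taken into account.
ANALYZE cdm.drug_exposure;
ANALYZE cdm.condition_occurrence;
```

On large tables the index build itself can take a while, so it may be worth running it outside peak hours.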

If you continue to face problems, let us know. We are about to embark on Charybdis 2.0 which includes amending our current protocol with new questions that fit into the characterization framework. We will also be revisiting the Charybdis package code for optimization.

BTW, I forgot to ask… what dialect of SQL are you using?

Best,
Kristin

gabimaeztu · December 17, 2020, 7:21am

Thank you very much for your answer, I really appreciate it.

Zero scared and 100% committed!

We have already created the indexes and updated the package to version 1.4.0. Let’s see if the execution is faster this time. We will keep an eye on the database, but we didn’t see performance problems there. We use Postgres and some Postgres extensions.
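
For anyone who wants to watch the database in a similar way, a minimal sketch on Postgres might look like the query below. It only uses the built-in pg_stat_activity view; the 80-character preview length is an arbitrary choice, and this is not necessarily the exact check we run.

```sql
-- List non-idle sessions, longest-running first, to spot slow statements.
SELECT pid,
       state,
       now() - query_start AS running_for,
       left(query, 80)     AS query_preview  -- truncated for readability
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY running_for DESC;
```

If the R script is the bottleneck rather than the database, this list will mostly show short-lived statements with long gaps between them.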

I just joined the 2.0 chat, so I’ll try to help there as much as I can,
Best,

gabimaeztu · December 21, 2020, 10:43am

Hi @krfeeney

The execution finished this weekend. It seems the problem was oversubscription and job prioritization on the hospital’s cluster; everything ran much quicker during the weekend.

Thanks for your time and suggestions,