Hello,
During our first runs of TxPath on a full dataset, we ran against the problem that the execution times are very long.
We are aware the TxPath tool is still under development and improvements are being made.
On a cohort of about 50k patients, the execution times are between 2 to 10 hours. We have used a set of ~20 often occurring ingredients (more than a third of the patients is expected to have used at least one of these drugs).
Database: Postgres.
Analysis of the queries showed that there is one query that takes a lot of time. This query processes data from the pnc_tmp_ptsq_ct
table and inserts it into pnc_tmp_ptstg_ct
. The description of this query is (runPanaceaStudy.sql, line 116):
-- insert from #_pnc_ptsq_ct ptsq into #_pnc_ptstg_ct (remove same patient/same drug small time window inside large time window. EX: 1/2/2015 ~ 1/31/2015 inside 1/1/2015 ~ 3/1/2015)
-- use single concept combo and avoid duplicate combo for the same concept if there's multiple single combo for same concept by min() value
We suspect it has something to do with those temporary tables that have no indexes.
What were typical study execution times for the “Characterizing Treatment Pathways at Scale” study? And what database was used? (@jon_duke)
Does anyone have experience with running TxPath on Postgres?
Or does anyone have suggestions how to speed up the study execution?
Thanks!