We have been utilizing Atlas to develop phenotype algorithms, and have noticed that queries can take a long time (>24 hours) to run on a population of ~40K patients stored in a Postgres database. I have made sure that the standard indexes are built, and did some additional debugging based on other forum posts.
In continuing to debug, I exported one of the long-running queries that was taking ~12 hours to run and ran from psql directly, and ran with EXPLAIN
. It appeared that there was a lot of time scanning one of the temp tables (qualified_events
) for event_id
and person_id
. I played with the generated SQL, and added the following after the creation of the QE temp table:
CREATE INDEX idx_qualified_events_evt ON qualified_events (event_id ASC);
CREATE INDEX idx_qualified_events_person ON qualified_events (person_id ASC);
The runtime went from ~12 hours to ~10 minutes.
I am wondering if anyone else has had similar observations, or tried a similar solution. Given that it appears to be addressing our issue, I am next wondering how to integrate this into our Atlas for further testing. I’ve tried to trace back where the SQL generation takes place, and was looking for guidance on where that might be (I’ll continue to search through the code in the interim). Thank you!