Thanks everyone for your input! Unfortunately, I still haven’t resolved my problem. Here’s what I did:
PGTune
Following @admin 's suggestion, I went to PGTune and created this configuration:
# DB Version: 13
# OS Type: windows
# DB Type: dw
# Total Memory (RAM): 58 GB
# CPUs num: 16
# Data Storage: hdd
ALTER SYSTEM SET
max_connections = '40';
ALTER SYSTEM SET
shared_buffers = '14848MB';
ALTER SYSTEM SET
effective_cache_size = '44544MB';
ALTER SYSTEM SET
maintenance_work_mem = '2047MB';
ALTER SYSTEM SET
checkpoint_completion_target = '0.9';
ALTER SYSTEM SET
wal_buffers = '16MB';
ALTER SYSTEM SET
default_statistics_target = '500';
ALTER SYSTEM SET
random_page_cost = '4';
ALTER SYSTEM SET
work_mem = '23756kB';
ALTER SYSTEM SET
min_wal_size = '4GB';
ALTER SYSTEM SET
max_wal_size = '16GB';
ALTER SYSTEM SET
max_worker_processes = '16';
ALTER SYSTEM SET
max_parallel_workers_per_gather = '8';
ALTER SYSTEM SET
max_parallel_workers = '16';
ALTER SYSTEM SET
max_parallel_maintenance_workers = '4';
I ran the commands, restarted the server, and verified the settings took hold using SHOW
. However, speed did not increase.
ANALYZE
I ran ANALYZE
on all the tables the code touched. No difference in speed.
Extra indices
I noticed the indices provided with the DLL only index ID fields. I added extra indices for DATE fields, since these are used heavily in the query that is causing the problem, for example
SET SEARCH_PATH TO synpuf;
CREATE INDEX idx_observation_period_start_date ON observation_period (observation_period_start_date ASC);
CREATE INDEX idx_observation_period_end_date ON observation_period (observation_period_end_date ASC);
This increased the speed of most queries, but I’m still stuck with the one query.
Problematic query
The problem I have is that one query never finishes. It is a query belonging to this heart failure cohort definition. It is running for over 12 hours now, while the other cohort definitions I ran took minutes at most. I’ve pasted the offending query below.
Some things I noticed:
- The concept set used only contains 111 concept, so not very heavy at all.
- Despite all the settings mentioned above, the PostgreSQL server process is only using 1 core, and 100MB of memory, while running this query.
Any thoughts on how to proceed would be greatly appreciated!
CREATE TEMP TABLE qualified_events AS
WITH primary_events(event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id) AS (
-- Begin Primary Events
SELECT P.ordinal AS event_id,
P.person_id,
P.start_date,
P.end_date,
op_start_date,
op_end_date,
cast(P.visit_occurrence_id AS BIGINT) AS visit_occurrence_id
FROM (
SELECT E.person_id,
E.start_date,
E.end_date,
row_number() OVER (
PARTITION BY E.person_id ORDER BY E.sort_date ASC
) ordinal,
OP.observation_period_start_date AS op_start_date,
OP.observation_period_end_date AS op_end_date,
cast(E.visit_occurrence_id AS BIGINT) AS visit_occurrence_id
FROM (
-- Begin Condition Occurrence Criteria
SELECT C.person_id,
C.condition_occurrence_id AS event_id,
C.condition_start_date AS start_date,
COALESCE(C.condition_end_date, (C.condition_start_date + 1 * INTERVAL '1 day')) AS end_date,
C.visit_occurrence_id,
C.condition_start_date AS sort_date
FROM (
SELECT co.*
FROM synpuf.CONDITION_OCCURRENCE co
JOIN Codesets cs
ON (
co.condition_concept_id = cs.concept_id
AND cs.codeset_id = 1
)
) C
-- End Condition Occurrence Criteria
) E
JOIN synpuf.observation_period OP
ON E.person_id = OP.person_id
AND E.start_date >= OP.observation_period_start_date
AND E.start_date <= op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0 * INTERVAL '1 day') <= E.START_DATE
AND (E.START_DATE + 0 * INTERVAL '1 day') <= OP.OBSERVATION_PERIOD_END_DATE
) P
)
-- End Primary Events
SELECT event_id,
person_id,
start_date,
end_date,
op_start_date,
op_end_date,
visit_occurrence_id
FROM (
SELECT pe.event_id,
pe.person_id,
pe.start_date,
pe.end_date,
pe.op_start_date,
pe.op_end_date,
row_number() OVER (
PARTITION BY pe.person_id ORDER BY pe.start_date ASC
) AS ordinal,
cast(pe.visit_occurrence_id AS BIGINT) AS visit_occurrence_id
FROM primary_events pe
JOIN (
-- Begin Criteria Group
SELECT 0 AS index_id,
person_id,
event_id
FROM (
SELECT E.person_id,
E.event_id
FROM primary_events E
INNER JOIN (
-- Begin Correlated Criteria
SELECT 0 AS index_id,
cc.person_id,
cc.event_id
FROM (
SELECT p.person_id,
p.event_id
FROM primary_events P
JOIN (
-- Begin Condition Occurrence Criteria
SELECT C.person_id,
C.condition_occurrence_id AS event_id,
C.condition_start_date AS start_date,
COALESCE(C.condition_end_date, (C.condition_start_date + 1 * INTERVAL '1 day')) AS end_date,
C.visit_occurrence_id,
C.condition_start_date AS sort_date
FROM (
SELECT co.*
FROM synpuf.CONDITION_OCCURRENCE co
JOIN Codesets cs
ON (
co.condition_concept_id = cs.concept_id
AND cs.codeset_id = 1
)
) C
-- End Condition Occurrence Criteria
) A
ON A.person_id = P.person_id
AND A.START_DATE >= P.OP_START_DATE
AND A.START_DATE <= P.OP_END_DATE
AND A.START_DATE >= (P.START_DATE + 1 * INTERVAL '1 day')
AND A.START_DATE <= (P.START_DATE + 365 * INTERVAL '1 day')
) cc
GROUP BY cc.person_id,
cc.event_id
HAVING COUNT(cc.event_id) >= 1
-- End Correlated Criteria
) CQ
ON E.person_id = CQ.person_id
AND E.event_id = CQ.event_id
GROUP BY E.person_id,
E.event_id
HAVING COUNT(index_id) = 1
) G
-- End Criteria Group
) AC
ON AC.person_id = pe.person_id
AND AC.event_id = pe.event_id
) QE
WHERE QE.ordinal = 1;