It is the postgresql and NOT IN operator performance issue.
For example the condition_occurrence table has about 2M records and has indexes for condition_concept_id and person_id fields.
Execution time of following query will be about 20 minutes.
select distinct person_id,condition_concept_id
from condition_occurrence
where condition_concept_id != 0
and person_id not in ( select person_id
from condition_occurrence
where condition_concept_id != 0
group by person_id
having count(distinct condition_concept_id) = 1 )
If we will use left join and CTE instead of NOT IN, like this:
with a as
(
select person_id
from condition_occurrence
where condition_concept_id != 0
group by person_id
having count(distinct condition_concept_id) = 1
)
select distinct co.person_id,condition_concept_id
from condition_occurrence co
left join a on a.person_id = co.person_id
where condition_concept_id != 0 and a.person_id is null
execution time of following will be about 1 sec.
so NOT IN - 20 minute, CTE and left join - 1 sec.
Tagging @AnthonyMolinaro @fatemah