We’ve made some advances on era-fying drug exposure and condition occurrence records and wanted to share here the new technique that should be less pressure on temp db and other resources.
The new form as as follows:
create table #temp_condition_era
DISTKEY(person_id)
as
select person_id, condition_concept_id, min(start_date) as era_start_date, DATEADD(day,-30, max(end_date)) as era_end_date
from (
select person_id, condition_concept_id, start_date, end_date, sum(is_start) over (partition by person_id, condition_concept_id order by start_date, is_start desc rows unbounded preceding) group_idx
from (
select person_id, condition_concept_id, start_date, end_date,
case when max(end_date) over (partition by person_id, condition_concept_id order by start_date rows between unbounded preceding and 1 preceding) >= start_date then 0 else 1 end is_start
from (
select person_id, condition_concept_id, condition_start_date as start_date, DATEADD(day,30,coalesce(condition_end_date, condition_start_date)) as end_date
FROM cdm_schema.condition_occurrence
) DT
) ST
) GR
group by person_id, condition_concept_id, group_idx
Note this is in a redshift syntax where i’m taking the results of this query and putting it into a temp table and distributing on personid.
You should be able to take the above query and make your own modifications to work in your schema and RDBMS, but the main point where the records to erify comes from is in the middle subquery DT:
select person_id, condition_concept_id, condition_start_date as start_date, DATEADD(day,30,coalesce(condition_end_date, condition_start_date)) as end_date
FROM cdm_schema.condition_occurrence
You can select any source records that have start/end dates you want, but note in this example we’re selecting person and condition_concept_id so we need to partition by person_id, condition_concept_id in the places we are doing the over
clause.
This should be much faster and more memory efficient to do it this way. Please let me know if you have any questions.