Hi, Vojetch, I just updated the âno rowbounds precedingâ gist, found a bug. The use of ROWS UNBOUNDED PRECEDING is sooo nice! Let me explain:
Consider the 2 overlapping date ranges:
START_DATE END_DATE
1/1/2010 1/15/2010
1/12/2010 1/30/2010
By arranging the start dates in order with a row_number, and unioning this result with the end dates (which this outer set also has row_number() applied), you are left with the following:
EVENT_DATE START_ORD OVERALL_ORD EVENT_TYPE
1/1/2010 1 1 -1
1/12/2010 2 2 -1
1/15/2010 null 3 1
1/30/2010 null 4 1
The event type is used to just make sure that dates that appear ont he same day will have the start_date appear in the result before the end_date on the same day.
To find an era end means that there is no period of time where there is an âopenâ start-end period. Think of it like a parenthesis expression. This one looks like this:
(
(
)
)
The second event date opens an erra that extends the end date of the first eventâs extent.
So, how can the above table be used? well we see that when you have a number of STARTS that is equal to 2 * the number of EVENTS, that means that every start MUST have been closed by an end. So to find the end date of the era above, youâd look for the row where START_ORD * 2 = OVERALL_ORD. But, there are NULLS in the row that we want (the last row signals the end of an era, because that that point, all the starts have been ended). But how do we get nulls to the the MAX(START_ORD) of the prior rows? Enter ROWS UNBOUNDED PRECEDING.
Given the following result set above, by applying this window function to that query:
select PERSON_ID, EVENT_DATE, START_ORDINAL, OVERALL_ORD, EVENT_TYPE,
MAX(START_ORDINAL) OVER (PARTITION BY PERSON_ID ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as FULL_START_ORDINAL
We get this:
EVENT_DATE START_ORD OVERALL_ORD EVENT_TYPE FULL_START_ORD
1/1/2010 1 1 -1 1
1/12/2010 2 2 -1 2
1/15/2010 null 3 1 2
1/30/2010 null 4 1 2
And now, we can see that the last row is where 2 * FULL_START_ORD = OVERALL_ORD and gives us the END_DATE of the era. In the demo sql, youâll see how I apply all the end dates I locate (this example only has 1 era end date) to the corresponding row, would look something like this:
START_DATE ERA_END
1/1/2010 1/30/2010
1/12/2010 1/30/2010
And finally, finding the MIN(START_DATE) grouping by ERA_END leads to:
START_DATE ERA_END
1/1/2010 1/30/2010
And this is the era defined by my two different exposures.
This is the simple era building logic. In the examples I give,I show how i roll up the drug exposures to the ingredient level, and then my row_number() calls are partitioned first by person then by the ingredient concept ID. Partitioning it that way will treat each individual ingredient as itâs own era fragment, generating all eras for all ingredients with one pass on the data! (well, maybe two).
Let me know if you have any other questions.
-Chris