Hello, @cce, I can answer questions related to CIRCE expressions.
In both bullet points, you’re referring to how the event duration is calculated. If there is no end date, then it uses start_date + 1 (which would create the interval that spans the entire day). When dealing with the time between, it includes the start date, but goes up to, but does not include, the end date. I think it is a bug that the drug criteria doesn’t attempt to use the days supply do infer an end date (it goes end_date, else start_date + 1). The code you saw that was applying the days_supply was when it was building a custom drug era. But, to be sure, both cases should be applying the days_supply in the event there is no end date. Personally, I wish that the THEMIS convention was that the ETL must guarantee an end date in the record, and I would always just use that. But, to be more robust, it should try end_date -> days_supply -> start+1.
I’ve been giving some thought on how the ‘window criteria’ dates are handled, and also with an eye of creating test cases so that we can certify the correct behavior. In thiink the current implementation may have some minor problems, and this thread is a great place to discuss it.
So, let me give you the logic and reasoning behind how date offsets are used:
When we look for correlated criteria using a window (ie: having the event start between 90d before and 0 days before index or from 0 days after to 5 days after index). we’re simply adding/subtracting a number of days from the index event, and then looking for events that start between the start date and end date, based on the offset. Example:
index event is 1/1/2010 and you want to find an event that started between 0 days after and 7 days after. The dates we will use are 1/1/2010 + 0d (1/1/2010) and 1/1/2010 + 7d (1/8/2010). The query should look for the events as >= 1/1/2010 and < 1/8/2010 (because that will cover all time up to but not including 1/8/2010, for a grand total of 7 days).
If you look at the code, it doesn’t quite work that way (which I think there is an issue). The query will generate the following join condition:
AND A.START_DATE >= DATEADD(day,0,P.START_DATE) AND A.START_DATE <= DATEADD(day,7,P.START_DATE)
You can see the problem is that it uses <= dateadd(day,7,p.start_date)
as the upper bound, which will capture 1/8/2001. It should capture events up to but not including 1/8/2001. So, that’s a bug. But, when dealing with soemthing like ‘between 7 days before and 0 days after’, this logic (using <=
) will capture the index date properly because you’ll get an expression:
AND A.START_DATE >= DATEADD(day,0,P.START_DATE) AND A.START_DATE <= DATEADD(day,0,P.START_DATE)
which will properly capture the day of index. while if we used < dateadd(day,DATEADD(day,0,P.START_DATE)
, it wouldn’t match anything because there is no values that can be found where x >= 1/1/2001 and x < 1/1/2001.
However, while this captures 1/1/2001 exactly, if the data had a hour component like 1/1/2001 12:15pm
, this event would not be captured, (because 12:15pm is not contained with a timewindow between 1/1/2001 (which infers 12:00am) and 1/1/2001 (again, inferring 12:00am). This wasn’t a problem when the CDM didn’t capture hours/minutes, but it does turn into a problem when it does.
So, with all that in mind, let’s talk about how I’d like to see the mechanics of the window criteria work:
I’d like to be able to state the window of time as from x days before to y days after
, where x days before means index - x days
and y days after means index + y days
.
I think this makes it easier to understand: i want to say the time period from now to 1 week later (7 days) and ‘now’ is 1/1/2010, then the timespan covers all time starting at 1/1/2010 and ends at 1/8/2010 (but does not include 1/8/2010. That’s why I think the current bug is that our upper bound after adding the offset is using <=
instead of <
.
Unfortunately, the behavior we’ve been teaching people is that if they just want to include the same day in their index, we’ve been telling them to do things like 7 days before and 0 days after index
. If we make the upper bound behave as ‘exclusive’, then the proper way to indicate to include the index is 7 days before and 1 day after index
. (such that if the index is 1/1/2010, then 1 day after is 1/2/2010, and the interval will end with < 1/2/2010, which includes all time in 1/1/2010).
So, part of my hesitation with dealing with this issue is the current pattern of behavior that people have with the tool. But I think it is much more important that people get the right results. So, let’s discuss the specific issue of defining those window of dates and how that should translate into SQL logic, and if we can all come to a consensus about how it should operate, I can adjust the code accordingly and create tests to ensure that behavior is performed.