Is end_date = start_date + days_supply?

schuemie · May 8, 2015, 4:29am

I feel very embarrassed for asking this n00b question, but I thought I knew the answer until I saw other people’s code.

Is the end_date included in a period?

So for example if I have a drug exposure with drug_exposure_start_date = 1-1-2000, and drug_exposure_end_date = 3-1-2000 (January 3rd), are people on the drug on the 1st and the 3rd? Does that mean the length of exposure is 3 days (the 1st, 2nd, and 3rd), or 2 days (3-1)?

(I checked, and the CDM specifications are not clear on this)

jon_duke · May 7, 2015, 11:59pm

Martijn,

I would treat the dates as inclusive. But certainly would be interested if
others are doing it differently.

Jon

schuemie · May 8, 2015, 12:53am

I may have posed the question poorly. From what I’ve seen most people consider the end date to be included. But the real question I was trying to ask is:

Is drug_era_end_date equal to drug_exposure_start_date + days_supply? If so, then the number of days exposed to the drug is days_supply + 1, which doesn’t sound right.

jon_duke · May 8, 2015, 1:28am

Did you mean drug_exposure_end_date? Era might be affected by the subsequent exposures.

If so, yes clinically a 10 day supply would start on the start date, then go 9 more days. So i agree with you, it should be start_date + (days_supply -1).

Chris_Knoll · May 8, 2015, 2:35am

Hi, Martijn,

Look at it this way in the simple case:

Drug Exposure: 1/1/2010; days Supply = 1 day.

dateadd(d,1/1/2010,1) = 1/2/2010
datediff(d,1/2/2010,1/1/2010) = 1.

Assume dates are all set to 12:00am. In this way you can see that the number of hours from 1/1/2010 0:0 - 1/2/2010 0:0 is only 24 hours.

So, the number of days when looking at dates this way is not days_supply + 1. It is always days_supply, as expected!

It also means that having any event date range that starts and ends on the same day is meaningless. How much time was someone exposed to a drug starting 1/1/2010 - 1/1/2010. Adding the implied hours to the dates:
1/1/2010 0:0 - 1/1/2010 0:0 = 0 seconds! It’s as if the drug exposure never existed!

That’s why I advocate that if you put any end dates to anything, put a minimum of 1 day, because our date fields do not have the granularity of hours/minutes so if you have to represent some quantum of time, days is the minimum granularity.

-Chris

Chris_Knoll · May 8, 2015, 2:36am

In this example, a 10 day supply starting 1/1/2010 would end 1/11/2010 because 1 + 10 = 11.

-Chris

schuemie · May 8, 2015, 6:38am

Hi Chris,

While I can certainly see your point, I was of the same mind as Jon until I started looking into some of your code. The epidemiologists I asked here would also expect the same thing: end_date = start_date + duration - 1.

I don’t think I really care which way, but we have to be very explicit about what we do.

bailey · May 8, 2015, 4:32pm

I think that yields a fencepost error. Put a little differently, I think we typically want to include the start date or the end date, but not both, if end is computed as start + duration. Here are some nice properties in the acute setting of using the start date and not the end date (for instance, one-time exposures tend to end up on the right date), but it wouldn’t surprise me if different data sources were most consistent with different approaches. FWIW, in our most common EHR - Epic - the recorded end date for an ordered or prescribed drug is the first day off the drug.

As a de facto standard, my preferences in order are:

start_date is the first date on the drug and end_date is the last date on the drug. If you only have start and duration, end = start + duration - 1.
Start_date is the first day on the drug and end_date is the first day off the drug, and we correct durations calculated as the difference.
We use whatever the source system gives us and don’t worry about fencepost errors as they’re probably not going to change most analyses significantly.

schuemie · May 11, 2015, 1:51am

I don’t think my OCD would agree with option 3, even though I think you’re right and in most cases it would not make a real difference

In defense of Chris’ solution: if we think of dates as points in time missing the time component, then end_date = start_date + duration. We could assume dates always refer to noon on that day, and this would allow us to frame all our logic in a way that doesn’t change when for some data sources we might have time information (and a future CDM will accommodate that information).

@Patrick_Ryan: I think you’ve been writing most of the SQL code that requires being explicit about this. What assumption have you used?

Christian_Reich · May 11, 2015, 3:00am

Friends:

Before Patrick gets to answer: I think we implicitly used Charlie’s version 1. To calculate a difference requires to add a day (drug_exposure_end_date - drug_exposure_start_date +1). That jives with the convention that if the end_date is missing assume it to be equal to the start date, resulting in a 1 day exposure. It also jives with procedures and observations having only a date, which means the assumed duration is 1 day.

But as soon as we settle it I’ll add it to the description.

hripcsa · May 11, 2015, 11:44am

Just thinking. Given June 1 to June 3.

For length of stay, this should be 2 days on average, not 3. If it is June 1 to June 1, then it is disturbing to call it 0 days. In fact, the mean should be 1/3, given only that you know the admission preceded the discharge. Calling June 1 to June 1 a 1-day-admission does not feel right.
For drug exposure, do we know how real systems set the end dates? I don’t know ours yet. If you go to the doctor, get a prescription, make your way to the pharmacy and start taking the drug June 1 to June 3: if it is once per day, then you probably took 3 doses for 3 total days. If it is qid, then you probably took one dose the first day and maybe dropped the last on the last day, making it 2 days total.

Therefore, I don’t think we can use (end –start + 1) across the database. For medications, it probably depends mainly on how most databases calculate or enter the end date.

George

Christian_Reich · May 11, 2015, 12:43pm

@george:

“Feel” is probably not the thing we should optimize, here. We need a strict convention.

And probably the convention would be that a duration of x days (no matter whether we define it inclusively or exclusively) would be anything between x-1 and x full 24 hour days. So, a procedure, which takes “a day” lasts between 0 and 24 hours, which is probably correct. A drug exposure of one day would be the same, 0 to 24 hours. It will also work if we used time.

That is the way hotels think. Everything is an overnight, nothing happens within one day. We are different. In fact, the night is pretty irrelevant. start_date = end_date would make no sense in that scenario.

I understand you want to use existing conventions. From what I hear, both inclusive and exclusive exist. We should just enforce whatever we decide.

hripcsa · May 11, 2015, 3:04pm

For admission, June 1 to June 3 would be 24 to 72 hours. We could call it “3 days” but then our length of stay (LOS) estimates will be 1 day bigger than everyone else’s.

George

Christian_Reich · May 11, 2015, 4:05pm

@hripcsa:

Why 24? 48-72 hours. 2-3 days. Average 2.5 days, but if you want to round the days up you would end up with 3 days.

hripcsa · May 11, 2015, 4:19pm

Arrive at 11:59pm, June 1, leave 12:01am June 3. Length of stay 24 hours, 2 minutes, recorded as June 1 to June 3, tallied by us as 3 days.

George

Mark_Danese · May 11, 2015, 4:52pm

There is no “right” or “wrong” here. It is just about adopting a convention that works for the intended purpose. Below is a link to what AHRQ does for LOS for their national hospitalization data. They use LOS = 0 to indicate a “same day” admission. http://www.hcup-us.ahrq.gov/db/vars/los/nisnote.jsp

I have seen others use the discharge - admit + 1 formulation too, and this is what we tend to use. In this case the interpretation is more of a step function. You get 1 day as soon as you get to the hospital. You don’t get the second day until you stay past midnight. They you are on day 2. This also works nicely for drugs.

For ICU stays, where people have date and time stamps, it is more accurate to calculate the difference in time exactly in hours, and then divide by 24 if “days” is desired.

I don’t think we are calculating LOS in the CDM. To the extent we do, the dates should be there too. So, this seems to be a question about drugs and how to take a start date and days supplied and get an end date. In that case, we should take take the start date + days supplied - 1. Jan 2, 2010 + 4 days should cover someone until Jan 5, 2010.

Christian_Reich · May 11, 2015, 6:24pm

But that won’t be an average thing. There will be patients who will arrive 0.01 on 1-June and leave 23.59 3-June, making it 3 days minus 2 minutes. Together, they will average to 2 days.

schuemie · May 11, 2015, 10:50pm

I think we’re mixing three time periods in the discussion:

The true time period, which we usually don’t know
The time period as recorded in the source data, with the conventions used there
The time period as recorded in the CDM

1 is irrelevant since we don’t know it. For every ETL we need to know the conventions of 2, and in this discussion it would be good if we could converge on a convention for 3, so we can recode the source data into this convention.

I see 3 viable encoding options for the CDM:

end_date = start_date + duration, both start and end date are included (aka Chris’ solution, assuming a date refers to noon of the day)
end_date = start_date + duration, end_date is not included (this is what is used in the Jerboa system in Europe)
end_date = start+date + duration - 1, end_date is included

The choice between these 3 is fairly arbitrary, with the exception that option 1 allows you to distinguish between 0-day and 1-day durations, which may or may not make sense.

Chris_Knoll · May 12, 2015, 2:32am

I think the points about 11:59pm to 12:02pm the following day are highlighting an issue with the granularity of CDM dates (by day, rather than by minute) rather than the logic of LOS/days supply. Assuming that we always receive dates of events indexed on the same hour of day, weather it be midnight, noon, or 3am, it’s my very strong opinion that dates be treated the same whether they denote a start or an end time index. Otherwise imagine this conversation:

Doctor: I’d like to record a date.
Record Keeper: Ok! What date?
Doctor January 12th, 2015
Record Keeper: Is that a the date that something began or ended?
Doctor: Ended.
Record Kepper: Then I’ll write that down as the 13th. Thank you!
Doctor: NOooooooooo! Don’t change it, what if I said the start date?
Record Keeper: Then you’re in luck! I’ll record it just as you told me: January 12th.

Mark: in this example, how many HOURS are we saying the person is covered from Jan 2, 2010 to Jan 5, 2010? Starting on Jan 2, 24 hours after is Jan 3, 24 hours after that is Jan 4, 24 Hours after that is Jan 5. I only count 3 days there. But you said that the days supply is 4 days. So I don’t see how this is the preferred way to calculate duration. Unless we want to say that ‘Jan 5th in this case is an end date so treat it as if it was Jan 6th for duration purposes’?

-Chris

hripcsa · May 12, 2015, 10:05am

I think it depends on how things are recorded. For admissions, we tend to have the real dates, so duration should be calculated accordingly. June 1 to June 3 is two days on average, June 1 to June 2 is one day on average, and June 1 to June 1 can be called 0 days or 1/3 day.

For drugs, we don’t tend to record the actual stop date, but I think we tend to calculate something. That’s why on this one, we feel we have a choice. “I took the drug from June 1 to June 3” usually means I took it for three days, not two. So perhaps drug usage is just different from other durations in the database.

George