
Is end_date = start_date + days_supply?

For admission, June 1 to June 3 would be 24 to 72 hours. We could call it “3 days” but then our length of stay (LOS) estimates will be 1 day bigger than everyone else’s.

George

@hripcsa:

Why 24? 48-72 hours. 2-3 days. Average 2.5 days, but if you want to round the days up you would end up with 3 days.

Arrive at 11:59pm, June 1, leave 12:01am June 3. Length of stay 24 hours, 2 minutes, recorded as June 1 to June 3, tallied by us as 3 days.

George

There is no “right” or “wrong” here. It is just about adopting a convention that works for the intended purpose. Below is a link to what AHRQ does for LOS for their national hospitalization data. They use LOS = 0 to indicate a “same day” admission. http://www.hcup-us.ahrq.gov/db/vars/los/nisnote.jsp

I have seen others use the discharge - admit + 1 formulation too, and this is what we tend to use. In this case the interpretation is more of a step function. You get 1 day as soon as you get to the hospital. You don’t get the second day until you stay past midnight. Then you are on day 2. This also works nicely for drugs.
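Here is a minimal Python sketch of the two conventions, assuming admit and discharge dates with day resolution only. The function names and the year 2015 are arbitrary and chosen for illustration. The first function is plain subtraction, where a same-day stay is 0 as in the AHRQ note. The second is the discharge - admit + 1 step function.

```python
from datetime import date

def los_subtract(admit: date, discharge: date) -> int:
    """Plain subtraction: a same-day stay counts as 0 days."""
    return (discharge - admit).days

def los_step(admit: date, discharge: date) -> int:
    """Step function: day 1 starts on arrival, each midnight crossed adds a day."""
    return (discharge - admit).days + 1

admit, discharge = date(2015, 6, 1), date(2015, 6, 3)
print(los_subtract(admit, discharge))  # 2
print(los_step(admit, discharge))      # 3
print(los_step(admit, admit))          # 1 (same-day stay)
```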

For ICU stays, where people have date and time stamps, it is more accurate to calculate the difference in time exactly in hours, and then divide by 24 if “days” is desired.
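When timestamps are available, the exact-hours approach is just a subtraction followed by a division. Below is a small sketch that uses George's 11:59pm-to-12:01am example from earlier in the thread. The year and the helper names are assumptions made for illustration.

```python
from datetime import datetime

def stay_hours(start: datetime, end: datetime) -> float:
    """Exact elapsed time in hours between two timestamps."""
    return (end - start).total_seconds() / 3600

def stay_days(start: datetime, end: datetime) -> float:
    """Fractional days, obtained by dividing the hour count by 24."""
    return stay_hours(start, end) / 24

# Arrive 11:59pm June 1, leave 12:01am June 3 (year chosen arbitrarily)
start = datetime(2015, 6, 1, 23, 59)
end = datetime(2015, 6, 3, 0, 1)
print(round(stay_hours(start, end), 2))  # 24.03
print(round(stay_days(start, end), 3))   # 1.001
```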

I don’t think we are calculating LOS in the CDM. To the extent we do, the dates should be there too. So, this seems to be a question about drugs and how to take a start date and days supplied and get an end date. In that case, we should take the start date + days supplied - 1. Jan 2, 2010 + 4 days should cover someone until Jan 5, 2010.
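This sketch shows the start date + days supplied - 1 rule applied to the Jan 2, 2010 example. The helper name is hypothetical. The last line confirms that counting both endpoints recovers the days supplied.

```python
from datetime import date, timedelta

def drug_end_date(start: date, days_supply: int) -> date:
    """Inclusive end date: the last day covered by the supply."""
    return start + timedelta(days=days_supply - 1)

start = date(2010, 1, 2)
end = drug_end_date(start, 4)
print(end)                     # 2010-01-05
print((end - start).days + 1)  # 4 days covered: Jan 2, 3, 4 and 5
```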

:slight_smile: But that won’t be an average thing. There will be patients who will arrive 0.01 on 1-June and leave 23.59 3-June, making it 3 days minus 2 minutes. Together, they will average to 2 days.

I think we’re mixing three time periods in the discussion:

  1. The true time period, which we usually don’t know
  2. The time period as recorded in the source data, with the conventions used there
  3. The time period as recorded in the CDM

1 is irrelevant since we don’t know it. For every ETL we need to know the conventions of 2, and in this discussion it would be good if we could converge on a convention for 3, so we can recode the source data into this convention.

I see 3 viable encoding options for the CDM:

  1. end_date = start_date + duration, both start and end date are included (aka Chris’ solution, assuming a date refers to noon of the day)
  2. end_date = start_date + duration, end_date is not included (this is what is used in the Jerboa system in Europe)
  3. end_date = start_date + duration - 1, end_date is included
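An encode/decode sketch is one way to pin down how the three options differ. It is illustrative only. The helper names are hypothetical, durations are whole days, and option 1's "noon" interpretation is not modeled beyond the stored date.

```python
from datetime import date, timedelta

def encode_end(start: date, duration: int, option: int) -> date:
    """Store (start_date, duration) as an end_date under option 1, 2 or 3."""
    if option in (1, 2):
        # Options 1 and 2 store the same end_date; they differ only in whether
        # the end_date itself is counted as part of the period.
        return start + timedelta(days=duration)
    if option == 3:
        # Inclusive end: a 1-day duration ends on the start date itself.
        # A 0-day duration would put end_date before start_date.
        return start + timedelta(days=duration - 1)
    raise ValueError(f"unknown option {option}")

def decode_duration(start: date, end: date, option: int) -> int:
    """Recover the duration in days from a stored start/end pair."""
    if option in (1, 2):
        return (end - start).days
    if option == 3:
        return (end - start).days + 1
    raise ValueError(f"unknown option {option}")

start = date(2010, 1, 1)
for option in (1, 2, 3):
    end = encode_end(start, 3, option)
    print(option, end, decode_duration(start, end, option))
# 1 2010-01-04 3
# 2 2010-01-04 3
# 3 2010-01-03 3
```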

The choice between these 3 is fairly arbitrary, with the exception that option 1 allows you to distinguish between 0-day and 1-day durations, which may or may not make sense.

I think the points about 11:59pm to 12:02pm the following day are highlighting an issue with the granularity of CDM dates (by day, rather than by minute) rather than the logic of LOS/days supply. Assuming that we always receive dates of events indexed on the same hour of day, whether it be midnight, noon, or 3am, it’s my very strong opinion that dates be treated the same whether they denote a start or an end time index. Otherwise imagine this conversation:

Doctor: I’d like to record a date.
Record Keeper: Ok! What date?
Doctor: January 12th, 2015
Record Keeper: Is that the date that something began or ended?
Doctor: Ended.
Record Keeper: Then I’ll write that down as the 13th. Thank you!
Doctor: NOooooooooo! Don’t change it, what if I said the start date?
Record Keeper: Then you’re in luck! I’ll record it just as you told me: January 12th.

Mark: in this example, how many HOURS are we saying the person is covered from Jan 2, 2010 to Jan 5, 2010? Starting on Jan 2, 24 hours after is Jan 3, 24 hours after that is Jan 4, 24 Hours after that is Jan 5. I only count 3 days there. But you said that the days supply is 4 days. So I don’t see how this is the preferred way to calculate duration. Unless we want to say that ‘Jan 5th in this case is an end date so treat it as if it was Jan 6th for duration purposes’?

-Chris

I think it depends on how things are recorded. For admissions, we tend to have the real dates, so duration should be calculated accordingly. June 1 to June 3 is two days on average, June 1 to June 2 is one day on average, and June 1 to June 1 can be called 0 days or 1/3 day.

For drugs, we don’t tend to record the actual stop date, but I think we tend to calculate something. That’s why on this one, we feel we have a choice. “I took the drug from June 1 to June 3” usually means I took it for three days, not two. So perhaps drug usage is just different from other durations in the database.

George

In the world-of-days-only:
I would vote for the third option (in the last post by Martijn)

  • end_date = start_date + duration - 1, end_date is included


It may be easier to solve the problem in the hours-world first, and then abstract from it the solution for the days-world.

Assuming CDM6 with end_dt (end_date_time) (or populated in the time column of CDM5.0):

The logic would be
duration = end_dt - start_dt

For example, a duration of 25.4 hours would be rounded to 1 day.

For 36.4 hours, the rounding would give 2 days (if we insist on integer days).
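Here is a sketch of the hours-first approach. It assumes datetime values for start and end, uses an arbitrary date, and rounds to whole days only at the last step. One design detail: Python's built-in round() sends exact halves to the nearest even integer, so a different tie-breaking rule would need to be written explicitly if that matters.

```python
from datetime import datetime, timedelta

def duration_days(start_dt: datetime, end_dt: datetime) -> int:
    """duration = end_dt - start_dt, computed in hours, then rounded to days."""
    hours = (end_dt - start_dt).total_seconds() / 3600
    # round() rounds exact halves to the nearest even integer (e.g. 60 h -> 2)
    return round(hours / 24)

start = datetime(2015, 1, 1, 8, 0)  # arbitrary start time
print(duration_days(start, start + timedelta(hours=25.4)))  # 1
print(duration_days(start, start + timedelta(hours=36.4)))  # 2
```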

This is a specific situation that has been bothering me:

In SCCS you need to specify the time on the drug. Say I’m on the drug from Jan-1 to Jan-3. If we follow end_date = start_date + duration, that means time on the drug is 3-1 = 2 days (i.e., offset = log(2)).

But if I see an event on the 1st, would I attribute it to the drug exposure window? Probably yes. If the event is on the third, would I attribute it to the drug exposure window? Also probably yes. That means the actual time at risk was 3 days (the 1st, 2nd, and 3rd), not 2.
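The mismatch fits in a few lines. If the risk window includes both endpoints, the offset should come from the inclusive day count. This is an illustrative sketch, not an SCCS implementation, and the year is arbitrary.

```python
import math
from datetime import date

# Exposure recorded as Jan-1 to Jan-3 (year chosen arbitrarily)
start, end = date(2010, 1, 1), date(2010, 1, 3)

def in_window(event: date) -> bool:
    """Events on the start date or the end date both count as exposed."""
    return start <= event <= end

days_by_subtraction = (end - start).days  # 2
days_at_risk = (end - start).days + 1     # 3: the 1st, 2nd and 3rd

print(in_window(start), in_window(end))   # True True
print(math.log(days_by_subtraction))      # offset if end_date = start_date + duration
print(math.log(days_at_risk))             # offset matching the inclusive window
```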

But maybe we should try and solve this problem at analysis time, not when storing the data in the CDM.

Martijn and friends:

Let’s wrap this up and decide. We know what the situation is. I would propose we adopt Martijn’s version #3: Start date and end date are included, so duration is end_date - start_date + one. The “one” is really “<=1”. So, if you have a drug exposure from 1-Jan-2010 to 1-Jan-2010, you have a drug exposure of:

  • If you work in full days, it is 1-Jan - 1-Jan + 1 = 1 day.
  • If you think about it how much time it really means it would be something between 0 and one day, or on average 12 hours (because of the <=1). So, procedures would last half a day.
  • If you have date/time, it would work normally, because you can calculate the exact number of hours and minutes something took that happened on the 1-Jan-2010.

Anybody disagree? Silence is agreement.

If not, I will add it to the documentation with examples.

C

I am fine with this for drugs. Make sure you limit it to drugs, and even give the counter example of inpatient admissions where the convention is that duration = end_date - start_date, to emphasize that this is only for drugs. Or if you don’t want to give the other formula, at least make it clear that you only add a day for drugs. You know, even a device should not add a day because the dates are likely procedures (insertion and removal) and that duration is simple subtraction. Drugs are different because the patient picks up pills from a pharmacy and starts taking them.

I would not mention 12 hours as a justification because that wouldn’t be the average. E.g., you know that they started before they stopped, so 2/3 would be a better average. But don’t go there. Plus if it is one dose per day, then I would want to know the number of days that they took a dose and not worry about the fact that the first dose was a little late that day.

You also need to be explicit about the definition of the start_date. It is the first date that the patient took any pill. (It could have been the first day that the patient is taking a full dose.) This is fine and covers the situation where there is a loading dose.

Then the ETL’s job is to set the stop_date, if it is calculated, to your formula given that you know the duration. What if you have the actual stop date and duration and they don’t follow your formula? Should you force your stop date to be a day earlier to fit the duration?

Here is another wrinkle. Some CDMs will have timestamps, too. Some databases have actual times that the drug started and stopped (e.g., inpatient MAR). Then the timestamp’s day portion and the calculated stop_date may not match. E.g., if we know the patient took a qid drug for 10 days, and the timestamps say Jan 1 at 6pm to Jan 11 at 12 noon, the ETL will set the start date to Jan 1 and the stop date to Jan 10, so now the timestamp and lone date do not match. Is that ok?
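George's example can be reproduced directly. The sketch assumes an arbitrary year and assumes the ETL applies the start_date + days_supply - 1 rule from the proposal.

```python
from datetime import datetime, timedelta

# MAR-style timestamps for a qid drug taken for 10 days (year assumed)
start_ts = datetime(2010, 1, 1, 18, 0)   # Jan 1, 6pm
stop_ts = datetime(2010, 1, 11, 12, 0)   # Jan 11, 12 noon
days_supply = 10

start_date = start_ts.date()
end_date = start_date + timedelta(days=days_supply - 1)  # proposed formula

print(end_date)                    # 2010-01-10
print(stop_ts.date())              # 2010-01-11
print(end_date == stop_ts.date())  # False: lone date and timestamp disagree
```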

George

Sorry, still waking up. Christian, is the documentation simply stating that if this is an outpatient drug and if you only have start_date and days_supply, then you should set the end_date to (start_date + days_supply -1)? And that future analysts will interpret outpatient drugs as lasting (end_date - start_date + 1)?

George

OK with this proposal - thanks. Essentially, I see us saying that if we only have resolution to the day, we’re counting days or portions thereof the patient took the drug. From the perspective of analysis, I wouldn’t mind a similar convention for admissions, diagnoses or other “exposures”: acknowledging that we don’t have the ability to distinguish between a day and half a day, I can foresee more cases where we’ll want to call it a day than no time at all.

I expect I am getting this wrong since I am the only one in this camp. I welcome corrections, even if off line. But here goes:

Length of stay will be one day greater (not half a day, but one day), on average, for databases that have resolution to the day than for databases that have times. So be careful not to compare the two. Sometime on June 1 to sometime on June 3 is two (not 2.5) days on average in reality and we will be calling it three days.

Unless you mean June 1 to June 1 is one day, and June 1 to June 2 is also one day (i.e., 1 is the minimum).

I guess we could invent a new concept of “maximum possible length of stay,” which would be three days for June 1 to June 3.

If you are aggregating results from two databases, one with times and the other with just days, then the times should be subtracted for that database, and the days should simply be subtracted (no additional day) for the other database but the minimum for the days subtraction should be .33 instead of 0 (i.e., .33 for June 1 to June 1, then 1 for June 1 to June 2, 2 for June 1 to June 3, …). Then you will have the same average LOS.
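George's pooling rule can be written as two functions, one for each kind of database. The names are hypothetical. The 0.33 constant is his value under the uniform-time assumption, not an empirical estimate.

```python
from datetime import date, datetime

SAME_DAY_LOS = 0.33  # minimum for date-only databases (uniform-time assumption)

def los_from_times(admit: datetime, discharge: datetime) -> float:
    """Database with timestamps: exact difference, in days."""
    return (discharge - admit).total_seconds() / 86400

def los_from_dates(admit: date, discharge: date) -> float:
    """Database with dates only: plain subtraction, but never below SAME_DAY_LOS."""
    days = (discharge - admit).days
    return float(days) if days > 0 else SAME_DAY_LOS

print(los_from_dates(date(2015, 6, 1), date(2015, 6, 1)))  # 0.33
print(los_from_dates(date(2015, 6, 1), date(2015, 6, 2)))  # 1.0
print(los_from_dates(date(2015, 6, 1), date(2015, 6, 3)))  # 2.0
```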

This is all different than drugs, which has other effects.

George

I don’t think we’re disagreeing about the potential for error here; if we’re disagreeing at all it’s on the risk. For the LOS case you describe (some time on June 1 to some time on June 3), if I don’t know time, I really don’t know if the average is 2 or 2.5 or 2.9 or 1.1. It depends on the times patients present to the ED, when hospital services are available, and why I’m doing the analysis (the billers would say, “Easy - 3 days”; the bed managers would throw up their hands about turnaround time, and as a clinician I’d probably really care about 1 vs few vs many).

So to the extent I’m arguing for any general LOS convention, it’s that a LOS of “0” is counter-intuitive, and analyses will need to special-case it all the time since the intended mathematical behavior is “not 0, but small”. Given that, I favor a convention that avoids the problem. We’re still left with some distortion in the “few days” range, which it seems we can’t avoid – we’re just picking what distortion we want to deal with by convention.

Please feel free to point out mistakes I’m making in my thought about this.

(As an aside, also happier “overestimating” in the drug case, since time-to-peak is typically shorter than elimination half life, so I can make a hand-waving argument for erring on the high side. :wink:)

Thanks.

I agree on the drug side.

I see what you mean on the LOS side, and yes, billing will maximize payment.

Just for fun, I did the analysis on our database. For inpatient admissions, I do have the times, but if I only had the day, what would be the mapping from day to the true answers. Here is what I got:

0 days (i.e., admit and discharge on same day) is really 0.20
1 day is really 1.1
2 days is slightly less than 2.1
3 (as you go on, it gets closer to the integer)

My .33 had assumed uniform distribution of time of arrival and true length of stay.
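The .33 follows from the uniform assumption. Suppose arrival and discharge on a same-day stay are two independent uniform times within the day, ordered so that discharge comes second. The expected gap is then 1/3 of a day. For a stay spanning k midnights, the expected gap is exactly k. Below is a quick Monte Carlo check of that assumption. The function name, trial count, and seed are arbitrary. Real arrival and discharge times are not uniform, which is presumably why the empirical figures above differ (about 0.20 rather than 0.33 for same-day stays).

```python
import random

def expected_true_los(day_diff: int, trials: int = 100_000, seed: int = 1) -> float:
    """Mean true LOS (in days) for a recorded date difference of day_diff,
    assuming arrival and discharge times are uniform within their days."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        arrive = rng.random()              # fraction of the admission day
        leave = day_diff + rng.random()    # fraction of the discharge day, offset
        if leave < arrive:                 # only possible when day_diff == 0
            arrive, leave = leave, arrive  # discharge must follow arrival
        total += leave - arrive
    return total / trials

for k in range(4):
    print(k, round(expected_true_los(k), 2))  # roughly 0.33, 1, 2, 3
```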

Do we even need to pick a convention for non-drugs? As you say, it depends on what you are trying to answer.

George

Ok, I think we all agree on this:

  • Start_date: the day when something started, e.g. when the drug was prescribed or dispensed, or the admission date
  • End_date: the last day of the something (inclusive), e.g. the last day the person took the drug, or the discharge date

I think the remaining problem is mostly related to drug exposures. Some use the convention that if you have 3 days of exposure, you get 3 full days, say days 1, 2, 3. Others will say you started the exposure somewhere during day 1 and ended somewhere during day 4, so your exposure covers days 1, 2, 3, 4.

I propose (in contrast to the proposal of @Christian_Reich ) that we use end_date = start_date + duration. This will make the logic for drugs and hospital visits the same (and in line with George’s empirical evidence :wink: ). It is also easier to remember and therefore less likely to go wrong.

My aha moment was that this is just the convention for storing the data. At analysis time, we may need to convert this to something else, e.g. for the SCCS I could recompute the end_date as: new_end_date = start_date + (end_date - start_date) - 1, just to make sure the time at risk equals the days_supply. (Or I could accept that days_at_risk > days_supply, or I could give .5 weight to start_date and end_date, etc.)
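This sketch separates the storage convention from the analysis-time conversion. The helper names are hypothetical, and the second function simply applies the new_end_date formula above.

```python
from datetime import date, timedelta

def store_end_date(start: date, duration: int) -> date:
    """Storage convention proposed above: end_date = start_date + duration."""
    return start + timedelta(days=duration)

def sccs_end_date(start: date, end: date) -> date:
    """Analysis-time recomputation: start_date + (end_date - start_date) - 1,
    so that an inclusive risk window spans exactly the days_supply."""
    return start + (end - start) - timedelta(days=1)

start = date(2010, 1, 1)
stored_end = store_end_date(start, 3)        # 2010-01-04
risk_end = sccs_end_date(start, stored_end)  # 2010-01-03
days_at_risk = (risk_end - start).days + 1   # 3, equal to the days_supply
print(stored_end, risk_end, days_at_risk)
```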

Martijn, despite my own internal logic regarding the -1 adjustment, I agree your proposal makes logical sense as the overall convention. It is a bit more future proof for cases where we have detailed time information.

This thread has been quiet for a while. I’m assuming that means everyone agrees with the latest proposal :wink:

@Christian_Reich : could you add to the CDM documentation that end_date = start_date + duration ?
