OHDSI Home | Forums | Wiki | Github

Is end_date = start_date + days_supply?

In the world-of-days-only, I would vote for the third option (in the last post by Martijn):

  • end_date = start_date + duration - 1, end_date is included
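A minimal Python sketch of the third option, assuming dates are plain `datetime.date` values and the duration is a whole number of days. The function names are hypothetical and not part of any OHDSI tool.

```python
from datetime import date, timedelta

def inclusive_end_date(start_date: date, duration_days: int) -> date:
    """Third option: end_date is the last included day, so end = start + duration - 1."""
    if duration_days < 1:
        raise ValueError("duration must be at least one day")
    return start_date + timedelta(days=duration_days - 1)

def inclusive_duration(start_date: date, end_date: date) -> int:
    """Inverse of the above: both start_date and end_date count as days of exposure."""
    return (end_date - start_date).days + 1

# A 3-day exposure starting 1 Jan covers 1, 2 and 3 Jan.
end = inclusive_end_date(date(2010, 1, 1), 3)
print(end)                                         # 2010-01-03
print(inclusive_duration(date(2010, 1, 1), end))   # 3
```

Under this convention a same-day exposure (start_date == end_date) has a duration of 1, never 0.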


It may be easier to solve the problem in the hours-world first, and then abstract the solution for the days-world from it.

Assuming CDM6 with end_dt (end_date_time), or CDM5.0 with the time column populated, the logic would be:
duration = end_dt - start_dt

For example, a duration of 25.4 hours would be rounded to 1 day, and a duration of 36.4 hours would be rounded to 2 days (if we insist on integer days).
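A small sketch of the hours-world logic, assuming start and end are `datetime` values and that "rounded" means rounding to the nearest whole day. The function names and the 08:00 start time are illustrative only.

```python
from datetime import datetime

def duration_hours(start_dt: datetime, end_dt: datetime) -> float:
    """duration = end_dt - start_dt, expressed in hours."""
    return (end_dt - start_dt).total_seconds() / 3600

def to_integer_days(hours: float) -> int:
    """Round a duration in hours to the nearest whole day.
    Python's round() sends exact halves (e.g. 12 h = 0.5 day) to the even integer."""
    return round(hours / 24)

h = duration_hours(datetime(2010, 1, 1, 8, 0), datetime(2010, 1, 2, 9, 24))
print(h, to_integer_days(h))      # 25.4 hours -> 1 day
print(to_integer_days(36.4))      # 36.4 hours -> 2 days
```

If a different tie rule is wanted (e.g. always rounding half a day up), `round()` would need to be replaced; the choice only matters for durations that are exact multiples of 12 hours.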

This is a specific situation that has been bothering me:

In SCCS you need to specify the time on the drug. Say I’m on the drug from Jan-1 to Jan-3. If we follow end_date = start_date + duration, the time on the drug is 3 - 1 = 2 days (i.e., offset = log(2)).

But if I see an event on the 1st, would I attribute it to the drug exposure window? Probably yes. If the event is on the third, would I attribute it to the drug exposure window? Also probably yes. That means the actual time at risk was 3 days (the 1st, 2nd, and 3rd), not 2.
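The mismatch can be made concrete with a short sketch, assuming an event is attributed to the exposure when it falls on any day from start to end inclusive, and that the SCCS offset is log(days at risk) as in the post. The helper names and the year 2010 are hypothetical.

```python
import math
from datetime import date

def days_at_risk(start: date, end: date, inclusive: bool) -> int:
    """Either plain subtraction (end - start) or an inclusive count (end - start + 1)."""
    days = (end - start).days
    return days + 1 if inclusive else days

def event_in_window(event: date, start: date, end: date) -> bool:
    """Attribute an event to the exposure if it falls on any day from start to end."""
    return start <= event <= end

start, end = date(2010, 1, 1), date(2010, 1, 3)
for label, inclusive in (("subtraction", False), ("inclusive", True)):
    t = days_at_risk(start, end, inclusive)
    print(label, t, math.log(t))   # subtraction: 2 days, log(2); inclusive: 3 days, log(3)

# Events on the 1st and on the 3rd are both attributed to the exposure window.
print(event_in_window(date(2010, 1, 1), start, end), event_in_window(date(2010, 1, 3), start, end))
```

With a closed attribution window, only the inclusive count makes the offset agree with the days on which events are attributed.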

But maybe we should try and solve this problem at analysis time, not when storing the data in the CDM.

Martijn and friends:

Let’s wrap this up and decide. We know what the situation is. I would propose we adopt Martijn’s version #3: start date and end date are included, so duration is end_date - start_date + 1. The “one” is really “<=1”. So, if you have a drug exposure from 1-Jan-2010 to 1-Jan-2010, you have a drug exposure of:

  • If you work in full days, it is 1-Jan - 1-Jan + 1 = 1 day.
  • If you think about how much time it really means, it would be something between 0 and 1 day, or on average 12 hours (because of the <=1). So procedures would last half a day.
  • If you have date/time, it would work normally, because you can calculate the exact number of hours and minutes something took that happened on the 1-Jan-2010.

Anybody disagree? Silence is agreement.

If not, I will add it to the documentation with examples.

C

I am fine with this for drugs. Make sure you limit it to drugs, and even give the counterexample of inpatient admissions, where the convention is that duration = end_date - start_date, to emphasize that this is only for drugs. Or, if you don’t want to give the other formula, at least make it clear that you only add a day for drugs. You know, even a device should not add a day, because the dates are likely procedures (insertion and removal) and that duration is simple subtraction. Drugs are different because the patient picks up pills from a pharmacy and starts taking them.
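George’s rule (add a day only for drugs) could look like this in an analysis helper. The domain labels are hypothetical strings for illustration; in the CDM these records live in separate tables rather than carrying a label.

```python
from datetime import date

# Hypothetical domain labels: only drug exposures get the extra (inclusive) day.
ADD_ONE_DAY = {"drug"}

def exposure_duration_days(domain: str, start: date, end: date) -> int:
    """Drugs: end - start + 1. Admissions, devices, procedures: plain subtraction."""
    days = (end - start).days
    return days + 1 if domain in ADD_ONE_DAY else days

jan1, jan3 = date(2010, 1, 1), date(2010, 1, 3)
print(exposure_duration_days("drug", jan1, jan3))       # 3
print(exposure_duration_days("admission", jan1, jan3))  # 2
print(exposure_duration_days("device", jan1, jan3))     # 2 (insertion to removal)
```

Keeping the exception in one set makes it explicit that the extra day is a drug-only convention.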

I would not mention 12 hours as a justification because that wouldn’t be the average. E.g., you know that they started before they stopped, so 2/3 would be a better average. But don’t go there. Plus if it is one dose per day, then I would want to know the number of days that they took a dose and not worry about the fact that the first dose was a little late that day.

You also need to be explicit about the definition of the start_date. It is the first date that the patient took any pill. (It could have been the first day that the patient is taking a full dose.) This is fine and covers the situation where there is a loading dose.

Then the ETL’s job is to set the stop_date, if it is calculated, to your formula given that you know the duration. What if you have the actual stop date and duration and they don’t follow your formula? Should you force your stop date to be a day earlier to fit the duration?
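One way an ETL could surface the disagreement George describes, rather than silently forcing the stop date, is to compute how far an observed stop date is from the convention. This is a sketch under the start + duration - 1 convention; the function names are hypothetical.

```python
from datetime import date, timedelta

def expected_stop_date(start: date, days_supply: int) -> date:
    """Stop date implied by the convention: start_date + days_supply - 1."""
    return start + timedelta(days=days_supply - 1)

def stop_date_discrepancy(start: date, observed_stop: date, days_supply: int) -> int:
    """Days by which the observed stop date differs from the convention (positive = later)."""
    return (observed_stop - expected_stop_date(start, days_supply)).days

# Observed stop on Jan 11 for a 10-day supply starting Jan 1: one day later than the formula.
print(stop_date_discrepancy(date(2010, 1, 1), date(2010, 1, 11), 10))  # 1
print(stop_date_discrepancy(date(2010, 1, 1), date(2010, 1, 10), 10))  # 0
```

Whether a non-zero discrepancy should then be overwritten or kept is exactly the open question in the post; the sketch only measures it.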

Here is another wrinkle. Some CDMs will have timestamps, too. Some databases have actual times that the drug started and stopped (e.g., inpatient MAR). Then the timestamp’s day portion and the calculated stop_date may not match. E.g., if we know the patient took a qid drug for 10 days, and the timestamps say Jan 1 at 6pm to Jan 11 at 12 noon, the ETL will set the start date to Jan 1 and the stop date to Jan 10, so now the timestamp and lone date do not match. Is that ok?
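The qid example can be reproduced in a few lines, assuming the ETL derives the date-only end from start_date + days_supply - 1. The year 2010 is arbitrary; the times are those given in the post.

```python
from datetime import datetime, timedelta

start_ts = datetime(2010, 1, 1, 18, 0)   # Jan 1 at 6pm
stop_ts = datetime(2010, 1, 11, 12, 0)   # Jan 11 at 12 noon
days_supply = 10

start_date = start_ts.date()
end_date = start_date + timedelta(days=days_supply - 1)   # start + duration - 1

elapsed_days = (stop_ts - start_ts).total_seconds() / 86400

print(end_date)                   # 2010-01-10 (derived date)
print(stop_ts.date())             # 2010-01-11 (day portion of the timestamp)
print(end_date == stop_ts.date()) # False: the two representations disagree
print(elapsed_days)               # 9.75 days of actual elapsed time
```

The derived end_date and the timestamp’s day portion differ by one day even though both describe the same exposure.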

George

Sorry, still waking up. Christian, is the documentation simply stating that if this is an outpatient drug and if you only have start_date and days_supply, then you should set the end_date to (start_date + days_supply -1)? And that future analysts will interpret outpatient drugs as lasting (end_date - start_date + 1)?

George

OK with this proposal - thanks. Essentially, I see us saying that if we only have resolution to the day, we’re counting days or portions thereof the patient took the drug. From the perspective of analysis, I wouldn’t mind a similar convention for admissions, diagnoses or other “exposures”: acknowledging that we don’t have the ability to distinguish between a day and half a day, I can foresee more cases where we’ll want to call it a day than no time at all.

I expect I am getting this wrong since I am the only one in this camp. I welcome corrections, even if off line. But here goes:

Length of stay will be one day greater (not half a day, but one day), on average, for databases that have resolution to the day than for databases that have times. So be careful not to compare the two. Sometime on June 1 to sometime on June 3 is two (not 2.5) days on average in reality and we will be calling it three days.

Unless you mean June 1 to June 1 is one day, and June 1 to June 2 is also one day (i.e., 1 is the minimum).

I guess we could invent a new concept of “maximum possible length of stay,” which would be three days for June 1 to June 3.

If you are aggregating results from two databases, one with times and the other with just days, then the times should be subtracted for that database, and the days should simply be subtracted (no additional day) for the other database but the minimum for the days subtraction should be .33 instead of 0 (i.e., .33 for June 1 to June 1, then 1 for June 1 to June 2, 2 for June 1 to June 3, …). Then you will have the same average LOS.
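A sketch of George’s aggregation rule, assuming length of stay is measured in days: timestamped admissions use exact subtraction, date-only admissions use plain subtraction with 0.33 substituted for same-day stays. The function names are hypothetical and the year is arbitrary.

```python
from datetime import date, datetime

def los_from_times(admit: datetime, discharge: datetime) -> float:
    """Database with times: exact length of stay in days."""
    return (discharge - admit).total_seconds() / 86400

def los_from_dates(admit: date, discharge: date, same_day_los: float = 0.33) -> float:
    """Database with dates only: plain subtraction, with same_day_los instead of 0."""
    days = (discharge - admit).days
    return same_day_los if days == 0 else float(days)

june1 = date(2015, 6, 1)
print(los_from_dates(june1, date(2015, 6, 1)))  # 0.33
print(los_from_dates(june1, date(2015, 6, 2)))  # 1.0
print(los_from_dates(june1, date(2015, 6, 3)))  # 2.0
print(los_from_times(datetime(2015, 6, 1, 14), datetime(2015, 6, 3, 10)))  # about 1.83
```

The `same_day_los` parameter is exposed so the 0.33 (uniform-arrival assumption) can be replaced by an empirically derived value.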

This is all different than drugs, which has other effects.

George

I don’t think we’re disagreeing about the potential for error here; if we’re disagreeing at all it’s on the risk. For the LOS case you describe (some time on June 1 to some time on June 3), if I don’t know time, I really don’t know if the average is 2 or 2.5 or 2.9 or 1.1. It depends on the times patients present to the ED, when hospital services are available, and why I’m doing the analysis (the billers would say, “Easy - 3 days”; the bed managers would throw up their hands about turnaround time, and as a clinician I’d probably really care about 1 vs few vs many).

So to the extent I’m arguing for any general LOS convention, it’s that a LOS of “0” is counter-intuitive, and analyses will need to special-case it all the time since the intended mathematical behavior is “not 0, but small”. Given that, I favor a convention that avoids the problem. We’re still left with some distortion in the “few days” range, which it seems we can’t avoid – we’re just picking what distortion we want to deal with by convention.

Please feel free to point out mistakes I’m making in my thought about this.

(As an aside, also happier “overestimating” in the drug case, since time-to-peak is typically shorter than elimination half life, so I can make a hand-waving argument for erring on the high side. :wink:)

Thanks.

I agree on the drug side.

I see what you mean on the LOS side, and yes, billing will maximize payment.

Just for fun, I did the analysis on our database. For inpatient admissions I do have the times, so I checked what the mapping from day counts to the true lengths of stay would be if I only had the day. Here is what I got:

  • 0 days (i.e., admit and discharge on the same day) is really 0.20 days
  • 1 day is really 1.1 days
  • 2 days is really slightly less than 2.1 days
  • 3 days and beyond: as you go on, it gets closer to the integer

My .33 had assumed uniform distribution of time of arrival and true length of stay.
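The .33 follows from that assumption: if admission and discharge times are independent and uniform over the same day, the expected gap between them is 1/3 of a day. A quick Monte Carlo check in Python (sample size and seed chosen arbitrarily):

```python
import random

def mean_same_day_duration(n: int = 200_000, seed: int = 0) -> float:
    """Average |t2 - t1| for two independent uniform times within one day (in days)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a, b = rng.random(), rng.random()
        total += abs(a - b)   # the earlier time is the start, the later one the stop
    return total / n

print(mean_same_day_duration())  # close to 1/3
```

The empirical 0.20 above suggests real arrival and discharge times are not uniform over the day.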

Do we even need to pick a convention for non-drugs? As you say, it depends on what you are trying to answer.

George

Ok, I think we all agree on this:

  • Start_date: the day when something started, e.g., when the drug was prescribed or dispensed, or the admission date
  • End_date: the last day of the something (inclusive), e.g. the last day the person took the drug, or the discharge date

I think the remaining problem is mostly related to drug exposures. Some use the convention that if you have 3 days of exposure, you get 3 full days, say days 1, 2, 3. Others will say you started the exposure somewhere during day 1 and it ended somewhere during day 4, so your exposure covers days 1, 2, 3, 4.

I propose (in contrast to the proposal of @Christian_Reich ) that we use end_date = start_date + duration. This will make the logic for drugs and hospital visits the same (and in line with George’s empirical evidence :wink: ). It is also easier to remember and therefore less likely to go wrong.
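The two drug conventions side by side, for a hypothetical 3-day exposure starting 1 Jan 2010. The function names are illustrative only.

```python
from datetime import date, timedelta

def end_date_proposal(start: date, duration: int) -> date:
    """Martijn's proposal: end_date = start_date + duration (same rule as for visits)."""
    return start + timedelta(days=duration)

def end_date_option3(start: date, duration: int) -> date:
    """Earlier option 3: end_date = start_date + duration - 1 (end day inclusive)."""
    return start + timedelta(days=duration - 1)

def calendar_days_touched(start: date, end: date) -> list:
    """Every calendar day from start to end, both included."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

start = date(2010, 1, 1)
new_end, old_end = end_date_proposal(start, 3), end_date_option3(start, 3)
print(new_end, len(calendar_days_touched(start, new_end)))  # 2010-01-04, days 1-4 touched
print(old_end, len(calendar_days_touched(start, old_end)))  # 2010-01-03, days 1-3 touched
print((new_end - start).days)                               # 3: duration is plain subtraction
```

Under the proposal, duration is always end_date - start_date, for drugs and hospital visits alike.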

My aha moment was that this is just the convention for storing the data. At analysis time, we may need to convert it to something else; e.g., for the SCCS I could recompute the end_date as new_end_date = start_date + (end_date - start_date) - 1, just to make sure the time at risk equals the days_supply. (Or I could accept that days_at_risk > days_supply, or I could give .5 weight to start_date and end_date, etc.)
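A sketch of the three analysis-time options, assuming the data are stored as end_date = start_date + days_supply and that time at risk counts calendar days inclusively. The function names and the 3-day example are hypothetical.

```python
from datetime import date, timedelta

def risk_days_shift_end(start: date, end: date) -> int:
    # new_end_date = start_date + (end_date - start_date) - 1, then count start..new_end inclusively
    new_end = start + (end - start) - timedelta(days=1)
    return (new_end - start).days + 1

def risk_days_accept_extra(start: date, end: date) -> int:
    # count every calendar day from start to end inclusively (one more than days_supply)
    return (end - start).days + 1

def risk_days_half_weight_ends(start: date, end: date) -> float:
    # interior days weigh 1, start_date and end_date weigh 0.5 each (assumes end > start)
    if end <= start:
        raise ValueError("expected end_date after start_date")
    interior = (end - start).days - 1
    return interior + 0.5 + 0.5

start = date(2010, 1, 1)
end = start + timedelta(days=3)   # stored as start_date + days_supply, days_supply = 3
print(risk_days_shift_end(start, end))         # 3
print(risk_days_accept_extra(start, end))      # 4
print(risk_days_half_weight_ends(start, end))  # 3.0
```

Shifting the end date and half-weighting the end days both recover days_supply; accepting the extra day gives days_supply + 1.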

Martijn, despite my own internal logic regarding the -1 adjustment, I agree your proposal makes logical sense as the overall convention. It is a bit more future-proof for cases where we have detailed time information.

This thread has been quiet for a while. I’m assuming that means everyone agrees with the latest proposal :wink:

@Christian_Reich : could you add to the CDM documentation that end_date = start_date + duration ?

t