Proposal to remove a redundant field in the DRUG_EXPOSURE table

Christian_Reich · December 30, 2015, 11:51am

You have a knack for putting your finger on the sore spot.

It’s actually worse, because it’s not clear whether to count the first day or not. So, if I inject 50 mg Amphotericin B, is that from 1-Jan-2016 to 1-Jan-2016, or to 2-Jan-2016? Please see an old debate on the subject here.
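To make the off-by-one ambiguity concrete, here is a minimal Python sketch, assuming a single injection is recorded with a days supply of 1. The function names are hypothetical. The point is only that the two plausible conventions (end date as the last day on drug vs. the first day off drug) give different answers for the same record.

```python
from datetime import date, timedelta


def end_date_inclusive(start: date, days_supply: int) -> date:
    """End = last day ON drug: a 1-day supply starts and ends on the same day."""
    return start + timedelta(days=days_supply - 1)


def end_date_exclusive(start: date, days_supply: int) -> date:
    """End = first day OFF drug: a 1-day supply 'ends' on the following day."""
    return start + timedelta(days=days_supply)


injection_day = date(2016, 1, 1)
print(end_date_inclusive(injection_day, 1))  # 2016-01-01
print(end_date_exclusive(injection_day, 1))  # 2016-01-02
```

Either convention works on its own; the trouble starts when different ETLs silently pick different ones.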

But I agree, we are not consistent. We need to fix that.

ericaVoss · December 30, 2015, 3:10pm

The way I have thought about this is: if the source data gave me DRUG_EXPOSURE_END_DATE I would populate it, and if I got DAYS_SUPPLY I would populate that.

Most data sources only give DAYS_SUPPLY so we leave the END_DATE blank.

Chris_Knoll · December 30, 2015, 4:16pm

@Klaus,
Any reason you chose to drop drug_exposure_end_date rather than days_supply? I was originally thinking that dropping days_supply would make sense (and have the ETL handle conversion of days supply to an end date) so that you only have the drug exposure start and end, but now I’m thinking that dropping end_date would end all the confusion about “Do I subtract 1 from the date or not for this specific type of date? Oh, it’s a drug exposure end date, so yes, as opposed to the exposure start date, which I would not…”

But I’m in total agreement with you, having 2 different fields that represent the same thing (the length of exposure) does lead to the possibility of arriving at different answers for the same question, so getting rid of one of these fields would be a great thing.

-Chris

Klaus · December 30, 2015, 4:34pm

@Chris_Knoll:
I agree with you and I could also live with dropping days_supply.
The reason I prefer keeping days_supply is that the majority of data sources I’m aware of don’t provide the start/end date approach but provide something like “date of event” and “days supplied”.

Chris_Knoll · December 30, 2015, 4:46pm

Hi, @ericaVoss,

The challenge here is that now a single CDM query won’t operate the same across different CDMs from different data sources. In one case you have to know the rules of the ETL and specify days supply, and in another case you have to know the rules of the other ETL and specify the end date. So what @Klaus is proposing would force us to choose and eliminate the confusion entirely. Unless we always did something like derive days supply such that it is never null and never fill in end_date, but that would be the same as just dropping end_date from the CDM.

-Chris

ericaVoss · December 30, 2015, 5:37pm

Okay, fair point, @Chris_Knoll, but some people are using it, so taking it out should be considered carefully (just like with any CDM change).

For example, I think @amatcho with CPRD does make use of both fields: if CPRD provides DAYS_SUPPLY she populates that, but many DAYS_SUPPLY values in CPRD need to be imputed, and the imputed values are represented via END_DATE.

When I’ve used the fields in my own programs I have considered both fields giving preference to DAYS_SUPPLY.
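Considering both fields with a preference for DAYS_SUPPLY might look roughly like the following Python sketch. It assumes the inclusive convention (end = start + days supply - 1), which is itself one of the open questions in this thread, and the function name is hypothetical. Every analytic tool would have to repeat this branching, which is the argument for keeping only one field.

```python
from datetime import date, timedelta
from typing import Optional


def effective_end_date(start: date,
                       days_supply: Optional[int],
                       end_date: Optional[date]) -> Optional[date]:
    """Resolve one exposure end date from the two overlapping fields."""
    # Prefer DAYS_SUPPLY when it is populated and positive
    # (inclusive convention assumed: a 1-day supply ends on the start date).
    if days_supply is not None and days_supply > 0:
        return start + timedelta(days=days_supply - 1)
    # Otherwise fall back to the recorded end date, which may also be missing.
    return end_date


print(effective_end_date(date(2015, 3, 1), 30, None))               # 2015-03-30
print(effective_end_date(date(2015, 3, 1), None, date(2015, 4, 15)))  # 2015-04-15
```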

Mark_Danese · December 30, 2015, 5:41pm

I can’t see a reason to drop either one. This is a situation where guidance documents would be more helpful and would explain much of what exists in this and other threads. At the least, it might be worth linking some of these discussions to the documentation.

Chris_Knoll · December 30, 2015, 6:01pm

Thanks for the clarification, Erica.

Christian_Reich · December 30, 2015, 6:26pm

Friends:

Well, that would be an undocumented convention, and standardized tools have no idea that Amy would do it that way. But currently we have two fields that more or less mean exactly the same thing. So, I am with Klaus and Chris.

It’s not urgent to do anything, but if we need if-statements to find out whether one or the other field is populated, we may as well get rid of one.

Vojtech_Huser · December 30, 2015, 7:24pm

Regarding linking to documentation: an example of such a link already exists in the MEASUREMENT table specs (talking about overloading the time column).

I would propose:

CDM discussions always go in the CDM Builders part of the forum.
the table name is listed in the thread title

That way, people can search dynamically. Linking from the wiki can be a lot of maintenance work.

Christian_Reich · December 30, 2015, 7:32pm

@Vojtech_Huser:

Actually, we have a WG and a list of all proposed changes here. I’ll add this one.

rkboyce · December 31, 2015, 11:22am

I don’t favor dropping either, because we face a situation with drug dispensing data in the nursing home setting where the source data provides a prescription end date that has a different meaning than what can be inferred from days supply. If the drug_exposure table is supposed to closely match the source orders, dropping either field would cause a loss of information for our use case.

Specifically, because patients tend to reside in the home for long periods of time, it is often the case that an order is started with no stop date but with a days supply. The clinician later places a “stop order” indicating the date that the order should be stopped. There are no records of refill orders or holds (which occur frequently because of transfers to and from the hospital). Rather, as long as the Rx is active, dispensing continues. To calculate accurate drug_eras (ones that closely match actual administration), we have to consider both the end date (if available) and days supply fields in addition to other information about the patient’s actual time in the facility. For some kinds of studies, such detail might be avoided but we are working on patient predictive models where we think it matters.
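A rough Python sketch of the nursing-home case, under stated assumptions: an order is treated as dispensed every day from its start until the stop order (or until the end of observation if no stop order exists), minus the days the resident spends outside the facility. The function and parameter names are hypothetical, and the sketch deliberately leaves out the days-supply logic mentioned above.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[date, date]  # (first day away, last day away), inclusive


def active_days_in_facility(order_start: date,
                            stop_date: Optional[date],
                            observation_end: date,
                            away: List[Interval]) -> int:
    """Count days an open-ended order is dispensed while the resident is in the home."""
    # Without a stop order, assume dispensing continues until observation ends.
    end = stop_date or observation_end
    days = 0
    d = order_start
    while d <= end:
        # Skip days spent elsewhere, e.g. a transfer to hospital.
        if not any(first <= d <= last for first, last in away):
            days += 1
        d += timedelta(days=1)
    return days


hospital_stay = [(date(2015, 1, 10), date(2015, 1, 14))]
print(active_days_in_facility(date(2015, 1, 1), date(2015, 1, 31),
                              date(2015, 12, 31), hospital_stay))  # 26
```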

Klaus · December 31, 2015, 5:20pm

If I understand you correctly, the Rx is initially recorded with days_supply. Sometimes there are undocumented refills, and these are indicated by the end_date field, so that end_date is greater than start_date + days_supply - 1?
If my understanding is correct, what prevents you from substituting the original days_supply with end_date - start_date + 1?
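Klaus's substitution is the inverse of the inclusive end-date convention (end = start + days_supply - 1). A small Python sketch, with a hypothetical refill-extended end date, shows how far the implied supply can drift from the recorded one:

```python
from datetime import date


def days_supply_from_end_date(start: date, end: date) -> int:
    """Inclusive count: both the start date and the end date are exposure days."""
    return (end - start).days + 1


start = date(2015, 6, 1)
recorded_days_supply = 30
extended_end = date(2015, 9, 28)  # hypothetical end date stretched by undocumented refills

implied = days_supply_from_end_date(start, extended_end)
print(recorded_days_supply, implied)  # 30 120
```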

Patrick_Ryan · December 31, 2015, 9:16pm

This is a great thread to hear everyone’s perspectives, so thanks, all, for your valuable contributions.

My two cents:

I think it’s very important that we clearly separate verbatim information that comes directly from a source database from derived information that can be inferred from other elements in the source.

The original motivation for the DRUG_EXPOSURE table was to be the place to store all verbatim information (with the specific convention to NOT infer or derive information to populate all fields). Because different source databases come with different elements about a drug exposure record, we ended up with a collection of fields that seem redundant or highly related, but in practice, most sources only use a small subset of those fields and rarely contain all of the seemingly redundant fields. As an example, most administrative claims/pharmacy dispensing data will ONLY have drug exposure start and days supply, whereas most EHR medication history records ONLY have drug start and drug end (there is no notion of ‘days supply’). We’ve seen e-prescribing systems that capture prescriptions written with ONLY the drug start and the number of allowable refills (sometimes with quantity). I’ve also seen drug exposures from procedural administrations where truly the only piece of information you have is the drug start. In its current form, the DRUG_EXPOSURE table accommodates all of these scenarios without any information loss or transformation. Certainly, if there are other source elements that people have in their source data that are required for analytical purposes but aren’t yet captured, we want to hear about them.

In contrast to the DRUG_EXPOSURE table, which is intended to contain only verbatim information, the DRUG_ERA and DOSE_ERA tables were standardized constructs intended to be fully derived information. Because of exactly the issues that everyone in this thread was raising - that different sources have different elements which require different conventions for defining information, such as ‘drug end’ - we wanted to develop one structure for periods of exposure that could have a uniform definition (even if the implementation at a source may vary depending on its source data). The DRUG_ERA table was intended to allow standardized roll-up of drug exposure records to the generic ingredient level and allow for ‘continuous periods of exposure’, defined as records with no more than a 30-day gap between the inferred end of one record and the inferred start of the next. There are several implementations for how to derive DRUG_ERA from DRUG_EXPOSURE based on typical scenarios (e.g. if you have claims with dispensing date and days supply, or if you have EHR with just refills), but ultimately the derivation should be source-specific and clearly and transparently documented so that analysts know what to expect when using the DRUG_ERA table. The DOSE_ERA construct was the same idea as DRUG_ERA, but rather than just being at the generic ingredient level, DOSE_ERA was to contain periods of time with persistent exposure to a constant dose of the drug; here, both the end date and the dose would have to be a source-specific derivation, but the DRUG_STRENGTH table was added as part of the OHDSI vocabularies to make the transformation a bit easier.
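The 30-day persistence rule can be sketched in Python as below. This is an illustrative reimplementation, not the OHDSI era-building scripts; it assumes exposures for one person and one ingredient with already-inferred inclusive end dates, and it counts the gap as the number of days without supply between the era's end and the next start. Whether the threshold applies to that count or to the raw date difference is exactly the kind of convention that should be documented per source.

```python
from datetime import date
from typing import List, Tuple

Interval = Tuple[date, date]  # (start, inferred end), both days inclusive


def build_eras(exposures: List[Interval], max_gap_days: int = 30) -> List[Interval]:
    """Collapse one person's exposures to a single ingredient into eras."""
    eras: List[Interval] = []
    for start, end in sorted(exposures):
        if eras:
            era_start, era_end = eras[-1]
            # Days with no supply between the era so far and this record
            # (negative when the records overlap).
            days_without_supply = (start - era_end).days - 1
            if days_without_supply <= max_gap_days:
                # Extend the current era; an overlapping record never shortens it.
                eras[-1] = (era_start, max(era_end, end))
                continue
        eras.append((start, end))
    return eras


records = [(date(2015, 1, 1), date(2015, 1, 30)),
           (date(2015, 2, 15), date(2015, 3, 15)),  # 15 days off drug: same era
           (date(2015, 5, 1), date(2015, 5, 30))]   # 46 days off drug: new era
print(build_eras(records))
```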

Currently, within the OHDSI community, we have many apps that make use of the DRUG_ERA table but fewer that utilize the DOSE_ERA table. Largely I see that as a chicken-and-egg problem: once people start seeing the value of the data, they’ll build more apps, which will generate more value, improving the standards and conventions that we all share, generating more value, etc.

Happy New Year, all!

Christian_Reich · January 2, 2016, 2:26am

@Patrick:

Happy New Year as well!

So, here is the situation:

On the one hand, we have a model of DRUG_EXPOSURE that contains the same information, as we define it, twice. More or less:
drug_exposure_end_date is defined as “The end date for the current instance of Drug utilization. It is not available from all sources.”
days_supply is defined as “The number of days of supply of the medication as recorded in the original prescription or dispensing record.”
For all intents and purposes, that is as identical as “Date of birth” and “Age”: one is relative, the other absolute. And in the Age situation we decided to turn it into a birth date (at least the year), and the ETL has to figure out how to do it.
On the other hand, we have two undocumented use cases:

Amy uses it for CPRD to distinguish provided from imputed days supply (the imputed values go into the end date).
Rich uses it for the official end date vs. the never-ending refills used in practice at nursing homes.

So, we have an ambiguous semi-duplication, and because of it people start using it for purposes that are not standard (and therefore not useful for standardized analytics). My strong suggestion is that we make this one of the points to discuss on the 5th and decide what to do.

Klaus · January 2, 2016, 2:09pm

Thanks for sharing this, and a Happy New Year as well. First of all, as an OMOP newbie, I don’t know all the historical motivations and conventions, so sorry for any nuisance I may cause.

I just want to mention that I cannot use the DRUG_ERA table as it holds already adjusted, calculated information. At IMS we use methods to calculate ‘continuous periods of exposure’ with some more parameters which depend on the disease, drug area and client perspective. Thus, I really need something like days supplied either by the number of days or by an end date. But the meaning must be clearly defined. I can easily convert one into the other.

Yes, I’m aware of many, many fields which are not part of the CDM, and I doubt that it would make sense to integrate them. They are very data-source/country-specific. We have different health care systems, different legal situations and different established country-specific questions and reports all over the world. As an example, we defined an intersection of fields just for three European countries. The intersection holds only about one-third of the original number of fields in each country.
If you really want to integrate all the fields out there, the CDM would get very complex, huge and unwieldy to use.

My vision for OMOP is that we focus on the commonality between the patient data sources and enable the exceptional opportunity to plug standardized analytical methods to each data source in the OMOP format.
Every field which does not provide sufficient commonality should either not be converted at all (we can still handle country-specific reference data via *_source_concept_id) or be put into some yet-to-be-defined custom fields which are not part of the ‘core OMOP’.

Christian_Reich · March 2, 2016, 11:22pm

@hripcsa’s comment:

I think there is a range of evidence available that we need to deal with, and we need to make sure it all fits in. I also worry if we make it too complicated.

Sometimes we only know that a drug was ordered at a certain time. No days supply or end date. So do you assume 0 days or 1 day or perhaps 30 days in the database?
We may know days supply but not the true end date.
We may know the ordered end date. (MD retracted the order.)
We may know that the patient really stopped (patient-entered data, MEMS caps, etc.). This is less commonly known.

I think trying to encode all of this meticulously will be very confusing and not very useful. So I lean towards the fewest fields possible.

Christian_Reich · March 2, 2016, 11:27pm

@Christian_Reich’s response:

Agreed

Yes. We need to nail this down, or leave it to the ETL, which would have to declare it somehow.

Right. The new end date could be the day the supply ends, the day a new prescription came in (no-stockpiling assumption), or a date we know explicitly from an EMR or ordering system.

Right.

Right.

That’s what this group would have to come up with.
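The first two end-date options above correspond to two common assumptions about overlapping dispensings. A minimal Python sketch, with hypothetical function names, inclusive end dates, and distinct dispensing dates assumed:

```python
from datetime import date, timedelta
from typing import List, Tuple

Fill = Tuple[date, int]       # (dispensing date, days supply)
Interval = Tuple[date, date]  # (start, end), both inclusive


def no_stockpiling(fills: List[Fill]) -> List[Interval]:
    """A new fill ends the previous exposure; leftover supply is discarded.

    Assumes no two fills share a dispensing date.
    """
    fills = sorted(fills)
    out: List[Interval] = []
    for i, (start, supply) in enumerate(fills):
        end = start + timedelta(days=supply - 1)
        if i + 1 < len(fills):
            # Truncate at the day before the next dispensing.
            end = min(end, fills[i + 1][0] - timedelta(days=1))
        out.append((start, end))
    return out


def stockpiling(fills: List[Fill]) -> List[Interval]:
    """Leftover supply is used up first, so a new fill starts after it runs out."""
    out: List[Interval] = []
    for dispensed, supply in sorted(fills):
        start = dispensed
        if out and out[-1][1] >= dispensed:
            start = out[-1][1] + timedelta(days=1)
        out.append((start, start + timedelta(days=supply - 1)))
    return out


fills = [(date(2015, 1, 1), 30), (date(2015, 1, 21), 30)]
print(no_stockpiling(fills))
print(stockpiling(fills))
```

With these two 30-day fills, the no-stockpiling version ends the first exposure on 20 Jan 2015, while the stockpiling version shifts the second fill to 31 Jan through 1 Mar 2015.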

Christian_Reich · March 2, 2016, 11:28pm

@herrcerd’s comment:

I would second George’s points: the solution should ideally be as abstract/flexible as possible.

We see all of these issues he mentioned, probably to an extreme, in FAERS and clinical trial data (FAERS 2012Q4+ data only):

We never have days supply in FAERS
We may have cumulative dosage/units patient received
We may or may not have the drug start/stop date
– We may have only the treatment duration + units in lieu of dates (‘12 weeks’) – sometimes both!
– We may know that the drug was taken at an interval (q1d, q12h, etc)
We may know that the drug was stopped as of the time of report, but no specific dates.
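When only a treatment duration such as ‘12 weeks’ is available, one option is to translate it into an approximate number of exposure days. The unit table below is an assumption (months as 30 days, years as 365), and the unit strings are hypothetical, not the actual FAERS duration codes:

```python
# Hypothetical unit table; month and year lengths are approximations.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def duration_to_days(value: float, unit: str) -> int:
    """Convert a reported treatment duration into an approximate day count."""
    key = unit.strip().lower()
    if key.endswith("s"):  # accept plurals such as "weeks"
        key = key[:-1]
    if key not in DAYS_PER_UNIT:
        raise ValueError(f"unknown duration unit: {unit!r}")
    return round(value * DAYS_PER_UNIT[key])


print(duration_to_days(12, "weeks"))  # 84
```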

Klaus · March 3, 2016, 4:35pm

Thanks for the replies.
I would like to summarize my opinion:

As others pointed out, there are many databases (of good quality) out there which do not have the days_supply information. There are various methods to substitute this missing information, either by parsing a dosage description field or by calculating medians of days_supply or of the intervals between consecutive prescriptions. However, the most powerful methods need the quantity supplied.
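The median-interval method could look like this in Python. It is a simplified sketch under assumptions: one patient and one drug, intervals measured between consecutive distinct dispensing dates, and 0 treated as “not provided”, in line with the sentinel proposed in this post. The function names are hypothetical.

```python
from datetime import date
from statistics import median
from typing import List, Optional


def impute_days_supply(dispense_dates: List[date]) -> Optional[float]:
    """Estimate a typical days supply as the median interval between dispensings."""
    ds = sorted(set(dispense_dates))
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    return median(gaps) if gaps else None  # a single dispensing gives no estimate


def days_supply_or_imputed(recorded: int, dispense_dates: List[date]) -> Optional[float]:
    # 0 is read as "not provided"; real recorded values are kept as-is.
    return recorded if recorded > 0 else impute_days_supply(dispense_dates)


dates = [date(2015, 1, 1), date(2015, 1, 29), date(2015, 2, 28), date(2015, 3, 30)]
print(days_supply_or_imputed(0, dates))  # 30 (median of 28, 30, 30)
```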

We cannot consider this field as a required field.
If this information is missing, the ETL process should indicate this by providing a 0, in order to leverage the above-mentioned methods.
If the ETL provides its own guess, there must be reasons why this estimate is better than the results of typical analytical methods.
What we regularly know is the original/intended days supplied (either prescribed or dispensed). Only in rare cases do we have data sources which provide stop records. A stop record truncates the original/intended exposure period, i.e. the corrected exposure period may be shorter than the original one (in accordance with the doctor’s intention). I am not aware of an important practical use case where the actual exposure period is longer than the original one (we are not addressing compliance/adherence here, just a change of the doctor’s intention).
In particular, the use case described here by @rkboyce is an important but dangerous one: An initial prescription is recorded together with days_supply for this prescription but many undocumented refills occur and are documented only by exposure_end_date.
If the drug_exposure table is abused to document some kind of drug_era periods, analytics will calculate wrong doses.
We may have to address this problem in the future with additional fields. A rather complex alternative would be to distinguish two cases:
If the corrected information is smaller than the original one it represents a medication stop/change.
If the corrected information is bigger than the original one it represents refills.
It would be beneficial to have two fields “original intention” vs. “corrected intention” because it would give us information about medication stops or changes.
However, we must know whether both fields refer to this prescription record or to a sequence of hidden ones.
If we have both fields, we must be clear how to interpret them. My recommendation is to have the “original” information in the first place and the “corrected” as an additional option only if it differs from the original one.
The proposal may suggest the other way around: the default might be the corrected information, which probably means it holds the intended information if there is no difference. In this case it is important to note that the intended information must be provided if there is a difference. If the intended information is missing in that case, the analytics will calculate wrong daily dosage information, because it would be based on the wrong (changed) period.
For me it is not important which field is the default, as long as a difference is recorded as described above.
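The original vs. corrected distinction, and why the daily dose must use the original period, can be sketched as a small Python data class. The field names (`original_days`, `corrected_end`) are hypothetical and are not CDM columns; the example quantities are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Prescription:
    start: date
    quantity: float                       # units dispensed for the original period
    original_days: int                    # intended days supply
    corrected_end: Optional[date] = None  # hypothetical: stop date or refill-extended end

    def original_end(self) -> date:
        return self.start + timedelta(days=self.original_days - 1)

    def classification(self) -> str:
        if self.corrected_end is None or self.corrected_end == self.original_end():
            return "unchanged"
        # Shorter than intended: stop/change; longer: hidden refills.
        return "stop/change" if self.corrected_end < self.original_end() else "refills"

    def daily_dose(self) -> float:
        # The quantity was meant for the ORIGINAL period, so divide by that.
        return self.quantity / self.original_days


stopped = Prescription(date(2015, 1, 1), 60, 30, corrected_end=date(2015, 1, 10))
print(stopped.classification(), stopped.daily_dose())  # stop/change 2.0
naive = stopped.quantity / ((stopped.corrected_end - stopped.start).days + 1)
print(naive)  # 6.0, wrong: based on the changed period
```

With 60 units for 30 days and a stop on 10 Jan, the intended daily dose is 2 units; dividing by the truncated 10-day period would wrongly give 6.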