This has been a great thread for hearing everyone’s perspectives, so thanks,
all, for your valuable contributions.

My two cents:
I think it’s very important that we clearly separate verbatim information
that comes directly from a source database from derived information that
can be inferred from other elements in the source.

The original motivation for the DRUG_EXPOSURE table was for it to be the
place to store all verbatim information, with the specific convention NOT
to infer or derive information just to populate every field. Because
different source databases carry different elements about a drug exposure
record, we ended up with a collection of fields that seem redundant or
highly related. In practice, though, most sources use only a small subset
of those fields and rarely contain all of the seemingly redundant ones. For
example, most administrative claims/pharmacy dispensing data will ONLY have
the drug exposure start date and days supply, whereas most EHR medication
history records ONLY have drug start and end dates (there is no notion of
‘days supply’). We’ve seen e-prescribing systems that capture prescriptions
written with ONLY the drug start and the number of allowable refills
(sometimes with quantity). I’ve also seen drug exposures from procedural
administrations where the only piece of information you truly have is the
drug start. In its current form, the DRUG_EXPOSURE table accommodates all
of these scenarios without any information loss or transformation.
Certainly, if people have other elements in their source data that are
required for analysis and aren’t yet captured, we want to hear about them.
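
To make the point concrete, here is a minimal Python sketch (not an official
OHDSI artifact) of a DRUG_EXPOSURE-like record that holds only verbatim
values, with `None` meaning "absent from the source". The field names mirror
DRUG_EXPOSURE columns, but the class, the `populated` helper, the dates, and
the choice to make every field except the start date optional are
assumptions for illustration; actual nullability depends on the CDM version.
Each of the four scenarios above fits without any field being inferred.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class DrugExposureRecord:
    # Verbatim values only; None means the source did not provide the element.
    # Nullability here is an illustrative assumption, not the CDM specification.
    drug_exposure_start_date: date
    drug_exposure_end_date: Optional[date] = None
    days_supply: Optional[int] = None
    refills: Optional[int] = None
    quantity: Optional[float] = None


def populated(record: DrugExposureRecord) -> List[str]:
    """Names of the fields the source actually supplied, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is not None]


# Hypothetical records for the four source types described above.
claims = DrugExposureRecord(date(2020, 1, 1), days_supply=30)
ehr = DrugExposureRecord(date(2020, 1, 1), drug_exposure_end_date=date(2020, 3, 1))
eprescribing = DrugExposureRecord(date(2020, 1, 1), refills=2, quantity=60.0)
procedural = DrugExposureRecord(date(2020, 1, 1))

for name, rec in [("claims", claims), ("ehr", ehr),
                  ("e-prescribing", eprescribing), ("procedural", procedural)]:
    print(name, populated(rec))
```

No record needs a value it doesn't have; anything like an end date for the
claims row is left for the derived tables to infer.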

In contrast to the DRUG_EXPOSURE table, which is intended to contain only
verbatim information, the DRUG_ERA and DOSE_ERA tables were designed as
standardized constructs containing fully derived information. That was
because of exactly the issues everyone in this thread has been raising:
different sources have different elements, which require different
conventions for defining information such as ‘drug end’. We wanted to
develop one structure for periods of exposure that could have a uniform
definition, even if the implementation at each site varies depending on its
source data. The DRUG_ERA table was intended to allow standardized roll-up
of drug exposure records to the generic ingredient level and to represent
‘continuous periods of exposure’, defined as runs of records with no more
than a 30-day gap between the inferred end of one record and the inferred
start of the next. There are several implementations for deriving DRUG_ERA
from DRUG_EXPOSURE for typical scenarios (e.g. claims with a dispensing
date and days supply, or EHR data with just refills), but ultimately the
derivation should be source-specific and clearly and transparently
documented, so that analysts know what to expect when using the DRUG_ERA
table.

The DOSE_ERA construct follows the same idea as DRUG_ERA, but rather than
working only at the generic ingredient level, DOSE_ERA was meant to contain
periods of persistent exposure to a constant dose of the drug. Here, both
the end date and the dose have to be source-specific derivations, but the
DRUG_STRENGTH table was added to the OHDSI vocabularies to make that
transformation a bit easier.
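
The era logic can be sketched as follows, assuming exposures have already
been mapped to a (hypothetical) ingredient ID and, for dose eras, to a
pre-computed daily dose (for example via DRUG_STRENGTH; that step is not
shown). The `inferred_end` rule (explicit end date, else start plus days
supply, else start plus one day) is one common convention, assumed here
rather than prescribed; each site should choose and document its own. The
same gap-collapsing routine builds drug eras when keyed by person and
ingredient, and dose eras when the key also includes the daily dose. All
names and values are illustrative, not the OHDSI reference implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby
from typing import Callable, List, Optional, Tuple


@dataclass
class Exposure:
    person_id: int
    ingredient_id: int                  # hypothetical: already rolled up to ingredient
    start: date
    end: Optional[date] = None          # verbatim end date, if the source has one
    days_supply: Optional[int] = None   # verbatim days supply, if the source has one
    daily_dose: Optional[float] = None  # hypothetical pre-computed daily dose


def inferred_end(e: Exposure) -> date:
    """Assumed convention: end date, else start + days supply, else start + 1 day."""
    if e.end is not None:
        return e.end
    if e.days_supply is not None:
        return e.start + timedelta(days=e.days_supply)
    return e.start + timedelta(days=1)


@dataclass
class Era:
    key: Tuple
    start: date
    end: date
    exposure_count: int


def build_eras(exposures: List[Exposure], key_fn: Callable[[Exposure], Tuple],
               max_gap_days: int = 30) -> List[Era]:
    """Collapse records sharing a key into eras, bridging gaps of <= max_gap_days."""
    eras: List[Era] = []
    ordered = sorted(exposures, key=lambda e: (key_fn(e), e.start))
    for key, group in groupby(ordered, key=key_fn):
        current: Optional[Era] = None
        for e in group:
            end = inferred_end(e)
            if current is not None and (e.start - current.end).days <= max_gap_days:
                # Within the persistence window: extend the open era.
                current.end = max(current.end, end)
                current.exposure_count += 1
            else:
                # First record for this key, or gap too large: start a new era.
                current = Era(key, e.start, end, 1)
                eras.append(current)
    return eras


def drug_key(e: Exposure) -> Tuple:
    return (e.person_id, e.ingredient_id)


def dose_key(e: Exposure) -> Tuple:
    return (e.person_id, e.ingredient_id, e.daily_dose)


# Hypothetical example data for one person and one ingredient.
exposures = [
    Exposure(1, 101, date(2020, 1, 1), days_supply=30, daily_dose=10.0),
    Exposure(1, 101, date(2020, 2, 15), days_supply=30, daily_dose=10.0),
    Exposure(1, 101, date(2020, 3, 20), days_supply=30, daily_dose=20.0),
    Exposure(1, 101, date(2020, 7, 1), end=date(2020, 7, 10)),
]

drug_eras = build_eras(exposures, drug_key)
# Dose eras only make sense for records with a known dose.
dose_eras = build_eras([e for e in exposures if e.daily_dose is not None], dose_key)

for era in drug_eras + dose_eras:
    print(era.key, era.start, era.end, era.exposure_count)
```

Keying on the dose as well as the ingredient is what splits one continuous
drug era into separate dose eras when the dose changes, as with the change
from 10 to 20 in the example data; the 30-day window for dose eras is
likewise an assumption here.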

Currently, within the OHDSI community, we have many apps that make use of
the DRUG_ERA table but fewer that use the DOSE_ERA table. Largely I see
that as a chicken-and-egg problem: once people start seeing the value of
the data, they’ll build more apps, which will generate more value and
improve the standards and conventions we all share, which in turn
generates more value, and so on.

Happy New Year, all!