How exactly are eras generated? (e.g., Rx vs. Dx)

Vojtech_Huser · April 16, 2015, 7:39pm

The era tables are very convenient for analysis. However, I have some questions about how they are generated (before I can use them well in my analysis)

The documentation says:

Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.

 Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.

Considering diagnoses:
If someone has repeatedly diagnosis of osteoporosis, I abstract an era lasting 5 years.
However, how is it done for urinary track infection (which usually lasts weeks)?
Is it the same rule for all diagnoses or is it diagnosis specific?
How do I find the gap for osteoporosis?

Considering drugs:
The documentation says:

The Gap Days determine how much drug-free days are tolerated between two Drug Exposures, without interrupting a Drug Era. Typically, this is set to 30, but could be different for Drugs that are very acute or chronic in their administration or for whom the daily dosing is variable.

Similar questions: Are the rules analogous to diagnoses? In this case, the gap seem to be a function of the drug. How do I find the rules used for ‘atenolol’? (for CDM4, MDCR dataset in IMEDS Research Lab?).
Can the sites have different era rules? (or there is a prevailing set of rules re-used by many)?

jon_duke · April 16, 2015, 9:27pm

This is a great question. Patrick and I sat for quite a while one evening
in Sweden poring over how to set up eras for conditions, some clearly acute
and some clearly chronic, and some in between. I don’t believe drugs have
been considered using similar rule variations.

Patrick?

Christian_Reich · April 16, 2015, 11:01pm

Gents:

Why couldn’t the data tell that? We could for each product generate the distribution of distance between individual admistrations (in case of drug) or diagnoses (in case of condition) and try to infer what a good persistence window would be. This could be improved by finding out (e.g. from the label) whether e.g. a “chronic” drug a drug is interventional (like pain meds, taken when it aches) or to control a certain condition (anitdiabetics, coagulants, immune suppressants). Likewise, for conditions, you could use the hierarchy to improve by comparing closely linked conditions to each other, who are likely on teh same scale of acuteness.

What did you have in mind?

hripcsa · April 16, 2015, 11:15pm

I know Jon Duke looked at this, and this was one of Adler Perotte’s research areas. George

aperotte · April 16, 2015, 11:45pm

Hi all,

Yes, I am currently exploring the use of hidden markov models for inferring
condition eras and drug eras. The hope is that we could learn
condition/drug specific parameters that would both characterize the
chronicity of the condition/drug and allow for inference of condition/drug
eras that are robust to noise.

There are limitations to this approach, but we think it’s worthwhile
exploring. Any thoughts?

Christian_Reich · April 17, 2015, 12:16am

Adler:

Sounds like a cool idea. But do we have any idea of truth? Take Warfarin, which should be taken non-stop forever (till the atrial fibrillation gets taken care of), but can we really make the persistence window infinite between two successive records? Maybe the patients did interrupt the treatment? How do you intent to measure the quality of the model?

aperotte · April 17, 2015, 1:18am

Hi Christian:

We can apply knowledge in various way to such a model. For example, fixing
certain parameters can enforce the constraint that once an era begins it
continues forever. Alternatively, prior distributions can be employed such
that it’s not guaranteed but very probable that once an era begins it is
likely to continue forever (accounting for lapses in treatment). In the
first iteration, however, it would be interesting to see if such properties
are learned automatically from the data without enforcing knowledge-based
constraints.

The quality of the model can be evaluated by providing ground truth for
some set of conditions/drugs for a subset of patient visits, but this is an
admittedly expensive way of measuring the quality of these models.

Chris_Knoll · April 20, 2015, 3:07pm

In case anyone is looking for a technical implementation of drug era building logic, I’ve submitted a series of gists that can be found at:

Specifically for v5:

gist.github.com

https://gist.github.com/chrisknoll/a18c8e15ff66f26fac84

Drug Era Builder v5

with cteDrugTarget (DRUG_EXPOSURE_ID, PERSON_ID, DRUG_CONCEPT_ID, DRUG_TYPE_CONCEPT_ID, DRUG_EXPOSURE_START_DATE, DRUG_EXPOSURE_END_DATE) as
(
-- Normalize DRUG_EXPOSURE_END_DATE to either the existing drug exposure end date, or add days supply, or add 1 day to the start date
	select d.DRUG_EXPOSURE_ID, d. PERSON_ID, c.CONCEPT_ID, d.DRUG_TYPE_CONCEPT_ID, DRUG_EXPOSURE_START_DATE, 
		COALESCE(DRUG_EXPOSURE_END_DATE, DATEADD(day,DAYS_SUPPLY,DRUG_EXPOSURE_START_DATE), DATEADD(day,1,DRUG_EXPOSURE_START_DATE)) as DRUG_EXPOSURE_END_DATE
	FROM DRUG_EXPOSURE d
		join CONCEPT_ANCESTOR ca on ca.DESCENDANT_CONCEPT_ID = d.DRUG_CONCEPT_ID
		join CONCEPT c on ca.ANCESTOR_CONCEPT_ID = c.CONCEPT_ID
		where c.VOCABULARY_ID = 'RxNorm'
		and c.CONCEPT_CLASS_ID = 'Ingredient'

This file has been truncated. show original

The gap is hard-coded to be 30d but there’s no reason why the logic couldn’t be altered to have a lookup table that had the gap window for the given conceptID.

-Chris

Vojtech_Huser · April 21, 2015, 3:49pm

The SQL code posted is a great addition to the discussion. (and a great contribution to the community!)

For Rx, I understand the logic uses ordering start and end dates by time, adding row_number, some quite smart logic with that and using a 30 day gap grace period.
That gap is the same for all drugs and not specific per drug class or drug.

This makes this script work in databases where row_number() exists but doesn’t support ROWS UNBOUNDED PRECEDING

What is ROWS UNBOUNDED PRECEDING feature exactly?
Does RedShift or Netezza support ROWS UNBOUNDED PRECEDING?

Does the script somehow loop over all RxNorm ingredients to coalesce all NDCs into a single era?
(or the stage table does the view by RxNorm Ingredient first and the loop is then not necessary)

(I am looking at this comment below)

– pretend that the data in #DRUG_EXPOSURE is the set or rows that match a certain descent concept ID.

Another key question is whether IMEDS Cloud Lab CCAE dataset was processed with this particular script or this is a newer-logic.

Are there similar scripts for diagnoses?

Chris_Knoll · April 22, 2015, 3:29am

Hi, Vojetch, I just updated the ‘no rowbounds preceding’ gist, found a bug. The use of ROWS UNBOUNDED PRECEDING is sooo nice! Let me explain:

Consider the 2 overlapping date ranges:

START_DATE	END_DATE
1/1/2010	1/15/2010
1/12/2010	1/30/2010

By arranging the start dates in order with a row_number, and unioning this result with the end dates (which this outer set also has row_number() applied), you are left with the following:

  EVENT_DATE	START_ORD	OVERALL_ORD		EVENT_TYPE
  1/1/2010	1		1			-1
  1/12/2010	2		2			-1
  1/15/2010	null		3			1
  1/30/2010	null		4			1

The event type is used to just make sure that dates that appear ont he same day will have the start_date appear in the result before the end_date on the same day.

To find an era end means that there is no period of time where there is an ‘open’ start-end period. Think of it like a parenthesis expression. This one looks like this:

(
 (
   )
    )

The second event date opens an erra that extends the end date of the first event’s extent.

So, how can the above table be used? well we see that when you have a number of STARTS that is equal to 2 * the number of EVENTS, that means that every start MUST have been closed by an end. So to find the end date of the era above, you’d look for the row where START_ORD * 2 = OVERALL_ORD. But, there are NULLS in the row that we want (the last row signals the end of an era, because that that point, all the starts have been ended). But how do we get nulls to the the MAX(START_ORD) of the prior rows? Enter ROWS UNBOUNDED PRECEDING.

Given the following result set above, by applying this window function to that query:

select PERSON_ID, EVENT_DATE, START_ORDINAL, OVERALL_ORD,  EVENT_TYPE,
	MAX(START_ORDINAL) OVER (PARTITION BY PERSON_ID ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as FULL_START_ORDINAL

We get this:

EVENT_DATE	START_ORD	OVERALL_ORD		EVENT_TYPE	FULL_START_ORD
1/1/2010	1		1			-1		1
1/12/2010	2		2			-1		2
1/15/2010	null		3			1		2
1/30/2010	null		4			1		2

And now, we can see that the last row is where 2 * FULL_START_ORD = OVERALL_ORD and gives us the END_DATE of the era. In the demo sql, you’ll see how I apply all the end dates I locate (this example only has 1 era end date) to the corresponding row, would look something like this:

START_DATE	ERA_END
1/1/2010	1/30/2010
1/12/2010	1/30/2010

And finally, finding the MIN(START_DATE) grouping by ERA_END leads to:

START_DATE	ERA_END
1/1/2010	1/30/2010

And this is the era defined by my two different exposures.

This is the simple era building logic. In the examples I give,I show how i roll up the drug exposures to the ingredient level, and then my row_number() calls are partitioned first by person then by the ingredient concept ID. Partitioning it that way will treat each individual ingredient as it’s own era fragment, generating all eras for all ingredients with one pass on the data! (well, maybe two).

Let me know if you have any other questions.

-Chris

Vojtech_Huser · June 13, 2017, 3:24pm

The conversion scripts offer parametized SQL for generating eras.
I am hoping to contribute the extracts of it to the CDM on GitHub.

@schuemie @clairblacketer

clairblacketer · June 13, 2017, 3:57pm

I would like to add a folder to the master branch with any pieces of code that don’t belong with the DDLs but are universal across CDM builds like the _ERA logic.