OHDSI Home | Forums | Wiki | Github

Populating *_era tables?


(Ryan Duryea) #1

I’m curious how everyone populates their drug_era and condition_era tables.

Seems like drug/condition/etc eras are generally populated after the data is ETL’d into an RDBMS. Does anyone have some SQL that populates those tables? Or is there a another, perhaps faster, way to populate those tables?

Thanks!


(Patrick Ryan) #2

Hi Ryan: There’s been several implementations of era builders. Originally, OMOP provided a tool called GERALD that provided SAS and Oracle SQL solutions. The code is available at: http://omop.org/cdm

I’ve done it myself in TSQL script with cursors, but found it wasn’t super fast with big data, so wouldn’t recommend that approach. Chris came up with a very clever approach to building eras in pure SQL, here’s a snippet below for drug era, assuming you’re working with OMOP CDM v4 in SQL Server and have pharmacy claims with days supply to infer end dates for each exposure record:

with cteDrugTarget (DRUG_EXPOSURE_ID, PERSON_ID, DRUG_CONCEPT_ID, DRUG_EXPOSURE_START_DATE, DRUG_EXPOSURE_END_DATE, DRUG_TYPE_CONCEPT_ID, REFILLS, QUANTITY, DAYS_SUPPLY, RowNum) as(-- Normalize DRUG_EXPOSURE_END_DATE to either the existing drug exposure end date, or add days supply, or add 1 day to the start dateselect d.DRUG_EXPOSURE_ID, d. PERSON_ID, c1.concept_id as DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE, COALESCE(d.DRUG_EXPOSURE_END_DATE, DATEADD(day,d.DAYS_SUPPLY,d.DRUG_EXPOSURE_START_DATE), DATEADD(day,1,DRUG_EXPOSURE_START_DATE)) as DRUG_EXPOSURE_END_DATE, d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY,ROW_NUMBER() OVER (PARTITION BY d.PERSON_ID, c1.concept_id ORDER BY d.DRUG_EXPOSURE_START_DATE) as RowNumFROM CDM4_SIM.dbo.DRUG_EXPOSURE dinner join CONCEPT_ANCESTOR ca on d.DRUG_CONCEPT_ID = ca.DESCENDANT_CONCEPT_IDinner join concept c1 on ca.ANCESTOR_CONCEPT_ID = c1.concept_id and c1.VOCABULARY_ID = 8 and c1.concept_level = 2),cteEndDates (PERSON_ID, END_DATE) as(select PERSON_ID, DATEADD(day,-30,EVENT_DATE) as END_DATE – unpad the end dateFROM(select PERSON_ID, DRUG_CONCEPT_ID, EVENT_DATE, EVENT_TYPE, MAX(START_ORDINAL) OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL, – this pulls the current START down from the prior rows so that the NULLs from the END DATES will contain a value we can compare with ROW_NUMBER() OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORD – this re-numbers the inner UNION so all rows are numbered ordered by the event datefrom(-- select the start dates, assigning a row number to eachSelect PERSON_ID, DRUG_CONCEPT_ID, DRUG_EXPOSURE_START_DATE AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY DRUG_EXPOSURE_START_DATE) as START_ORDINALfrom cteDrugTargetUNION ALL-- pad the end dates by 30 to allow a grace period for overlapping ranges.select PERSON_ID, DRUG_CONCEPT_ID, DATEADD(day,30,DRUG_EXPOSURE_END_DATE), -1 as EVENT_TYPE, NULLFROM cteDrugTarget) RAWDATA) EWHERE (2 * E.START_ORDINAL) - E.OVERALL_ORD = 0),cteRawData as(select d.DRUG_EXPOSURE_ID, d.PERSON_ID, d.DRUG_CONCEPT_ID,d.DRUG_EXPOSURE_START_DATE, d.DRUG_EXPOSURE_END_DATE, MIN(e.END_DATE) as ERA_END_DATE,d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY, d.RowNum,ROW_NUMBER() OVER (PARTITION BY d.PERSON_ID, MIN(e.END_DATE) ORDER BY d.PERSON_ID, d.DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE) as EraRowNumFROM cteDrugTarget dJOIN cteEndDates e on d.PERSON_ID = e.PERSON_ID and e.END_DATE >= d.DRUG_EXPOSURE_START_DATEGROUP BY d.DRUG_EXPOSURE_ID, d.PERSON_ID, d.DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE, d.DRUG_EXPOSURE_END_DATE, d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY, d.RowNum) select row_number() over (order by person_id asc) as DRUG_ERA_ID,r.PERSON_ID, r.DRUG_EXPOSURE_START_DATE as DRUG_ERA_START_DATE, r.ERA_END_DATE as DRUG_ERA_END_DATE, 38000182 as DRUG_TYPE_CONCEPT_ID,r.DRUG_CONCEPT_ID,1 as DRUG_EXPOSURE_COUNTinto CDM4_SIM.dbo.DRUG_ERA from cteRawData r where r.EraRowNum = 1 order by r.PERSON_ID, r.DRUG_CONCEPT_ID, r.DRUG_EXPOSURE_START_DATE;

Any of these approaches should work for OMOP CDMv4. For OMOP CDMv5, I’m planning to write up a SQL script approach loosely based on Chris’ implementation that I think should work well for drug_era, dose_era, and condition_era, that we can run through SqlRender to make it work for Oracle, SQL Server, and PostgresQL. I need to do a bit more testing on that first though so I don’t create too much re-work for those who’d like to implement it.

Of course, there’s lots of ways to skin this cat, I’d be quite keen to hear how others have tackled this problem.

Cheers,

Patrick


(Rae Woong Park) #3

Here is our MSSQL queries for CDM v4:


This link includes
CONDITION_ERA
DRUG_ERA
OBSERVATION_PERIOD
OMOP_ETL_QUERY

We used cursor, so it may take some time if volume of your data is big enough.
However, for our data (2.4M patients), it usually finished within 3-4 hours for each era.
The converted data can be viewed at: http://ohdsi.org/web/achilles/


(Christian Reich) #4

We also have Don’s Perl version doing the iterations, if that’s any help.

C

From: Patrick Ryan [mailto:notifications@mail132-12.atl131.mandrillapp.com] On Behalf Of Patrick Ryan

Sent: Monday, November 24, 2014 6:22 PM

To: reich@ohdsi.org

Subject: [OHDSI Forums] [Implementers] Populating *_era tables?

Patrick_Ryan

November 24

Hi Ryan: There's been several implementations of era builders. Originally, OMOP provided a tool called GERALD that provided SAS and Oracle SQL solutions. The code is available at: http://omop.org/cdm

I've done it myself in TSQL script with cursors, but found it wasn't super fast with big data, so wouldn't recommend that approach. Chris came up with a very clever approach to building eras in pure SQL, here's a snippet below for drug era, assuming you're working with OMOP CDM v4 in SQL Server and have pharmacy claims with days supply to infer end dates for each exposure record:

with cteDrugTarget (DRUG_EXPOSURE_ID, PERSON_ID, DRUG_CONCEPT_ID, DRUG_EXPOSURE_START_DATE, DRUG_EXPOSURE_END_DATE, DRUG_TYPE_CONCEPT_ID, REFILLS, QUANTITY, DAYS_SUPPLY, RowNum) as(-- Normalize DRUG_EXPOSURE_END_DATE to either the existing drug exposure end date, or add days supply, or add 1 day to the start dateselect d.DRUG_EXPOSURE_ID, d. PERSON_ID, c1.concept_id as DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE, COALESCE(d.DRUG_EXPOSURE_END_DATE, DATEADD(day,d.DAYS_SUPPLY,d.DRUG_EXPOSURE_START_DATE), DATEADD(day,1,DRUG_EXPOSURE_START_DATE)) as DRUG_EXPOSURE_END_DATE, d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY,ROW_NUMBER() OVER (PARTITION BY d.PERSON_ID, c1.concept_id ORDER BY d.DRUG_EXPOSURE_START_DATE) as RowNumFROM CDM4_SIM.dbo.DRUG_EXPOSURE dinner join CONCEPT_ANCESTOR ca on d.DRUG_CONCEPT_ID = ca.DESCENDANT_CONCEPT_IDinner join concept c1 on ca.ANCESTOR_CONCEPT_ID = c1.concept_id and c1.VOCABULARY_ID = 8 and c1.concept_level = 2),cteEndDates (PERSON_ID, END_DATE) as(select PERSON_ID, DATEADD(day,-30,EVENT_DATE) as END_DATE -- unpad the end dateFROM(select PERSON_ID, DRUG_CONCEPT_ID, EVENT_DATE, EVENT_TYPE, MAX(START_ORDINAL) OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL, -- this pulls the current START down from the prior rows so that the NULLs from the END DATES will contain a value we can compare with ROW_NUMBER() OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORD -- this re-numbers the inner UNION so all rows are numbered ordered by the event datefrom(-- select the start dates, assigning a row number to eachSelect PERSON_ID, DRUG_CONCEPT_ID, DRUG_EXPOSURE_START_DATE AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (PARTITION BY PERSON_ID, DRUG_CONCEPT_ID ORDER BY DRUG_EXPOSURE_START_DATE) as START_ORDINALfrom cteDrugTargetUNION ALL-- pad the end dates by 30 to allow a grace period f or overlapping ranges.select PERSON_ID, DRUG_CONCEPT_ID, DATEADD(day,30,DRUG_EXPOSURE_END_DATE), -1 as EVENT_TYPE, NULLFROM cteDrugTarget) RAWDATA) EWHERE (2 * E.START_ORDINAL) - E.OVERALL_ORD = 0),cteRawData as(select d.DRUG_EXPOSURE_ID, d.PERSON_ID, d.DRUG_CONCEPT_ID,d.DRUG_EXPOSURE_START_DATE, d.DRUG_EXPOSURE_END_DATE, MIN(e.END_DATE) as ERA_END_DATE,d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY, d.RowNum,ROW_NUMBER() OVER (PARTITION BY d.PERSON_ID, MIN(e.END_DATE) ORDER BY d.PERSON_ID, d.DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE) as EraRowNumFROM cteDrugTarget dJOIN cteEndDates e on d.PERSON_ID = e.PERSON_ID and e.END_DATE >= d.DRUG_EXPOSURE_START_DATEGROUP BY d.DRUG_EXPOSURE_ID, d.PERSON_ID, d.DRUG_CONCEPT_ID, d.DRUG_EXPOSURE_START_DATE, d.DRUG_EXPOSURE_END_DATE, d.DRUG_TYPE_CONCEPT_ID, d.REFILLS, d.QUANTITY, d.DAYS_SUPPLY, d.RowNum) select row_number() over (order by person_id asc) as DRUG_ERA_ID,r.PERSON_ID, r.DRUG_EXPOSURE_START_DATE as DRUG_ERA_START_DAT E, r.ERA_END_DATE as DRUG_ERA_END_DATE, 38000182 as DRUG_TYPE_CONCEPT_ID,r.DRUG_CONCEPT_ID,1 as DRUG_EXPOSURE_COUNTinto CDM4_SIM.dbo.DRUG_ERA from cteRawData r where r.EraRowNum = 1 order by r.PERSON_ID, r.DRUG_CONCEPT_ID, r.DRUG_EXPOSURE_START_DATE;

Any of these approaches should work for OMOP CDMv4. For OMOP CDMv5, I'm planning to write up a SQL script approach loosely based on Chris' implementation that I think should work well for drug_era, dose_era, and condition_era, that we can run through SqlRender to make it work for Oracle, SQL Server, and PostgresQL. I need to do a bit more testing on that first though so I don't create too much re-work for those who'd like to implement it.

Of course, there's lots of ways to skin this cat, I'd be quite keen to hear how others have tackled this problem.

Cheers,

Patrick


To respond, reply to this email or visit http://forums.ohdsi.org/t/populating--era-tables/78/2 in your browser.

To unsubscribe from these emails, visit your user preferences.


(Lisa Schilling) #5

Ryan,
Did you ever implement any of these ERA generation script for CDMv4? Given the prior posts regarding using a Cursor fx (which is slow) and the method Chris used, I’m wondering if you extended Chris’ methods to condition-era. We’re having problems with GERALD code-- getting to to work with PostgreSQL. (I guess the same goes for v5- where actually condition and drug era are the same computations…) thanks, Lisa


(Chris Knoll) #6

Did you try this script to build Drug Eras for CDM v4?


(Lisa Schilling) #7

Not yet, our plan is try to get the condition_era script working first (we have the required fields and are missing days supplied in our drug-exp data. But reading the Forum discussions- I had a sense that the script you wrote may be more efficient than then one using a Cursor fx. Did you create an analogous script for condition-era? If not, we’re planning to look at this (yours) more closely and see if we can use it as a model for condition-era and also use Rae’s ( I think he posted the other version) and see how we do regarding run time. We’re using postgres. thanks!

Lisa Schilling, MD, MSPH
Associate Professor of Medicine
Division of General Internal Medicine
University of Colorado, School of Medicine

office AO1-Room 8219: 303-724-2254
office UPI Building- ACCORDS: 303-724-5138
fax AO1: 303-724-2270

Mailing Address:
Division of General Internal Medicine
University of Colorado School of Medicine
8th Floor, Academic Office 1-Office 8219
Mailstop B180
12631 E. 17th Ave


(Chris Knoll) #8

You can use the gist linked below. Unlike drug era (where exposures are rolled up to their ingredients), condition occurrence just creates eras from the same condition_concept_id. You can tweak the script to allow for larger gaps (this one is set to 30 days).

-Chris


(Lisa Schilling) #9

Great, thanks!

Lisa Schilling, MD, MSPH
Associate Professor of Medicine
Division of General Internal Medicine
University of Colorado, School of Medicine

office AO1-Room 8219: 303-724-2254
office UPI Building- ACCORDS: 303-724-5138
fax AO1: 303-724-2270

Mailing Address:
Division of General Internal Medicine
University of Colorado School of Medicine
8th Floor, Academic Office 1-Office 8219
Mailstop B180
12631 E. 17th Ave


(Christian Reich) #10

Chris:

Could you be so kind and put that into the OHDSI github. And, while you are at it, make it V5? :smile:

Thanks so much.


(Taylor Delehanty) #11

Here is a v5 Postgres Translation of Chris’s code to populate the condition_era table
We’ve run some test data through it and the results are as we want them.


(Vojtech Huser) #12

(reviving an old topic)

The scripts here: https://gist.github.com/taylordelehanty are for postgres only.

@Patrick_Ryan planed to do MS SQL version (even renderable).

For OMOP CDMv5, I’m planning to write up a SQL script approach loosely based on Chris’ implementation that I think should work well for drug_era, dose_era, and condition_era, that we can run through SqlRender to make it work for Oracle, SQL Server, and PostgresQL.

For a SQL Server site, is there a v5 code (somewhere on github) we could use to build drug_era?


(Anthony Sena) #13

Hi @Vojtech_Huser, I think the V4 to V5 conversion scripts may hold some answers here. In this script, we have logic to build the drug_era table using the approach that Chris & Patrick mentioned in this thread. The Github link I referenced contains the OHDSI-SQL version which you can run through SqlRender. It also contains a subfolder for each RDBMS version that contains the conversion script converted by SqlRender for that particular dialect.

The drug_era logic starts on line 1923 and ends on 2059 in the OHDSI-SQL version.

Hope this helps!


(Vojtech Huser) #14

Thank you for posting the code.
Just to document my steps, below is the code I used.
I wish the initial sql would also use @vocabschema argument. Since we keep it in a different schema.

Perhaps it should also be added to this schema upcoming proposal http://www.ohdsi.org/web/wiki/doku.php?id=documentation:next_cdm:schema_revisions

library(SqlRender);library(RCurl)
url<-'https://raw.githubusercontent.com/OHDSI/CommonDataModel/master/Version4%20To%20Version5%20Conversion/OMOP%20CDMv4%20to%20CDMv5%20-%20OHDSI-SQL.sql'
sql<-getURL(url)
#supply your parameters
tta<-SqlRender::renderSql(sql,results_database_schema='OHDSI.dbo')
ttb<-SqlRender::translateSql(tta$sql,targetDialect = 'postgresql')
#final sql gets written here
cat(ttb$sql,file='c:/temp/drug_era.sql')

(Selva Muthu Kumaran Sathappan) #15

Hello Everyone,

Apologies if my question is basic. We have created all standard clinical tables and it’s in CDM format. However for era tables, I guess it should be automatically generated. Am I right to understand that when we run the specific script/query for era tables, it automatically fetches the required data from Drug_exposure (for drug_era), Drug_exposure (for dose_era) and condition_occurrence (condition_era). Should any of these era tables be empty, then the issue is only with the source CDM tables or can there be any other issue as to why the era tables are empty?

Thanks
Selva


t