
OHDSI Study #1: Treatment Pathways in Chronic Disease

A new study has been posted to the OHDSI Research Network:

Treatment Pathways in Chronic Disease

Objective: The objective of this study is to characterize the prevalence of different treatment pathways for three chronic diseases: Hypertension, Type II Diabetes, and Depression. We will systematically summarize the treatment pathways observed among patients who have at least 3 years of continuous observation and persistent treatment following initiation. We will stratify the results by year to evaluate temporal trends, and will further stratify by data source to determine if treatment pathways vary by population, geography, and data capture process.

Rationale: While numerous treatment guidelines exist for chronic conditions, there is a paucity of data on the real-world treatment pathways that patients experience in practice. Understanding these pathways is essential for establishing context around questions of drug utilization, effectiveness, and adherence.

Project Leads: Patrick Ryan, Jon Duke, George Hripcsak, Martijn Schuemie, Nigam Shah

Coordinating Institution(s): Janssen R&D, Columbia University, Regenstrief Institute, Stanford University

See full details on the wiki.

We’ve added the full R script to run the treatment pathway analysis against your data. It’s now available in the StudyProtocols GitHub repository. If you can run R, this script will do all the work for you: simply edit the parameters for your environment, and it will render custom SQL, execute it against your database, and export the results. The script covers all 3 diseases we are starting with for this OHDSI community analysis.

So, to recap:

Protocol is available on OHDSI wiki at: http://www.ohdsi.org/web/wiki/doku.php?id=research:treatment_pathways_in_hypertension.

The SQL code in this directory was rendered using the Treatment Pathway R script available in the Treatment_Pathways folder within this StudyProtocols github repository.

You have two options to execute this analysis:

  1. Working within R:
  • Open MainAnalysis.r in the Treatment_Pathways subfolder.

  • Modify the parameters near the top of the script
    folder = "C:/Users/mschuemi/Desktop/Treatment patterns" # folder containing the R and SQL files
    minCellCount = 5 # all cell counts lower than this value will be removed from the final results table
    cdmSchema = "cdm_truven_ccae_6k" # schema name where your patient-level data in OMOP CDM format resides
    resultsSchema = "scratch" # schema where you'd like the results tables to be created (requires create/write access)
    sourceName = "CCAE_6k" # short name that will be appended to the results table names
    dbms = "postgresql" # should be "sql server", "oracle", "postgresql" or "redshift"

  • Execute the script.
    MainAnalysis.r will render the SQL, translate it to your environment’s dialect (SQL Server, Oracle, PostgreSQL, or Redshift), execute the SQL, and export the resulting summary statistics as .csv files. As written, this script will complete the analysis for all 3 study requests: hypertension, type 2 diabetes mellitus, and depression.

  • Email the results files to the study coordinator.

  2. Working with SQL:
  • Open the dialect-specific version of the SQL in your SQL developer console.
  • Find/replace the default values for any parameters that require modification.
  • Export the 4 results tables from your resultsSchema into .csv files.
  • Email the results files to the study coordinator.
  • Repeat process for each of the 3 study requests: hypertension, type 2 diabetes mellitus, depression.
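For those curious what the R route does under the hood: the render/translate/execute flow is built on the OHDSI SqlRender and DatabaseConnector packages. Here is a minimal sketch, with illustrative file names, parameter values, and connection details (not taken verbatim from MainAnalysis.r; the current SqlRender API is shown, and older versions used renderSql()/translateSql() instead):

```r
# Sketch of the render -> translate -> execute flow used by the study script.
# Assumes the OHDSI SqlRender and DatabaseConnector packages are installed;
# file name, schemas, and connection details below are illustrative.
library(SqlRender)
library(DatabaseConnector)

sql <- readSql("HypertensionAnalysis.sql")            # parameterized SQL (@cdmSchema etc.)
sql <- render(sql,
              cdmSchema = "cdm_truven_ccae_6k",       # substitute the @parameters
              resultsSchema = "scratch")
sql <- translate(sql, targetDialect = "postgresql")   # convert to your SQL dialect

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/ohdsi",
                                             user = "user",
                                             password = "secret")
connection <- connect(connectionDetails)
executeSql(connection, sql)                           # run the rendered analysis
disconnect(connection)
```

The same rendered SQL can also be saved and run directly in a SQL console, which is exactly option 2 above.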

We ran the code on our AUSOM data and found that:

For T2DM,

The record count for ANCESTOR_CONCEPT_ID in (21600712,21500148) was 2,529,988.
The count of persons matching “WHERE de.RowNumber = 1” was 44,963.
And the number who met the observation period criteria “WHERE DATEADD(dd,365, op.OBSERVATION_PERIOD_START_DATE) <= dt.DRUG_EXPOSURE_START_DATE AND DATEADD(dd,1095, dt.DRUG_EXPOSURE_START_DATE) <= op.OBSERVATION_PERIOD_END_DATE” was only 18!

For hypertension, the corresponding counts were 6,852,827; 246,591; and 22.
For depression, they were 2,083,817; 80,429; and 20.

As our AUSOM data is acute hospital data rather than claims data, the criteria of a 1-year baseline period and a 3-year observation period seem too strict.

Rae

Hi Rae,

I also found the numbers to be lower than I’d expected, and yes, it is clearly due to the 1-year baseline plus 3 years of continuous observation. From a clinical perspective, we thought it would be valuable to evaluate only those with an extended period of treatment for the condition. But practically it seems to be limiting the results (or even the option to run) for several sites.

I think it would be reasonable to discuss reducing the 3 years to 18 months (while keeping the 1-year baseline period). The reason is that a lot of failure to reach treatment goals happens in the early phases of treatment, so we could pick up many more patients without a significant loss of clinical value. I realize we should have worked some of this through during a protocol development phase, but we are learning a lot from this first study!

Patrick et al., thoughts?

Jon

@rwpark, it’s related to this study, but may be a topic for a more general discussion: could you describe how and why you came up with your definition of the OBSERVATION_PERIOD records in your database? I wonder whether the convention you’ve followed is similar to or different from other hospital-based datasets in the OHDSI network, including those that may be participating in this study. I know some EHR datasets using the OMOP CDM define OBSERVATION_PERIOD as the duration from first observation (condition, drug, procedure, and/or lab) to last observation, but the ACHILLES results you shared seemed to suggest that you have defined the observation period as something different, like the period from admission to discharge. Both assumptions could be reasonable, but they would clearly result in different findings.


This is a good question in general, worthy of more detailed review. For our part, we have mixed inpatient and outpatient data, so we might have a different approach than if we had just inpatient. When I get a chance, I need to review Jon’s approach (we had already started the conversation; I just need some time). George

(my comment is more about all three analyses, so I started a new topic)

I am very excited about this first analysis and the responses it has generated so far. To contribute, I would like to report my experience with the IMEDS lab server and a small bug which others may run into as well.

The IMEDS RedShift server took a total of 9.5 minutes to execute all three analyses! That was quite fast!
I would be curious to find out how long it took on Oracle and MS SQL Server at J&J and Regenstrief…

For those that choose to do the R version:
There is a small bug in the result extraction helper function (extractAndWriteToFile).
The table name does not have TxPath in it (at least on RedShift), and the studyName must come before the sourceName. The uncommented line below has the correct code (the commented-out line is the original).

#parameterizedSql <- "SELECT * FROM @resultsSchema.dbo.TxPath_@sourceName_@studyName_@tableName"

parameterizedSql <- "SELECT * FROM @resultsSchema.dbo.@studyName_@sourceName_@tableName"

Also, a minor fix to improve the output file name: add sep='' to the line generating outputFile:

outputFile <- paste("TxPath_",sourceName,"_",studyName,"_",tableName,".csv",sep='')

I also have RedShift translated SQL code for all three analyses - so feel free to email me for that. (or maybe I will be able to post it somewhere)

Great stuff, Vojtech! Thanks for digging in. I can report that at JnJ, it took 1 hr to run HTN on CCAE, and about 30 min each for T2DM and depression. All other databases are smaller and ran faster. But you are currently the leader in the clubhouse with a 9-minute full analysis suite...and you did it within 3 days of release....this should serve as a strong example for our entire community that large-scale analysis and real-world evidence generation is directly within our grasp when we all work together.

Vojtech,

Excellent to hear! Blazing speed, fantastic. Can you provide the size of your result sets? (i.e., how many patients from the first table?)

Jon

Doesn't requiring 3 years introduce some bias, such that you are only analyzing compliant survivors? And very compliant ones, if I recall from the overview on Tuesday? Are any sensitivity analyses planned, or is this just a kickoff and pilot for collaborating on a study?

Just curious

Thanks

Cindy

These are fair questions, and it’s good to think about them and have a public discussion about their merits. I’m excited that we now have this forum to engage in these discussions. While this is our first kick-off analysis for the network, it is not purely a pilot. I believe the results generated from this analysis address an important clinical question for which there isn’t currently evidence available, so they will be highly publishable and directly relevant to current clinical practice.

To your questions: I don’t think we’re introducing bias in this descriptive summary analysis. The 3-year requirement is part of the definition of the problem we’re seeking to address, which is ‘in real-world populations, what are the long-term treatment patterns for these chronic diseases?’. Our definition requires that a person has at least one exposure record every 4 months during the 3-year period of interest. Changing the 3-year period fundamentally changes the question: an equally reasonable, but different, question would be ‘what are the short-term treatment patterns observed in the 1-year period after treatment initiation for chronic diseases?’. Given @rwpark 's insightful observation about the quick drop-off of eligible patients, perhaps an additional analysis with only a 1-year window would be a nice complement to the current study? If there’s interest, someone could add this thought to the ‘protocols in development’ for reaction. Modifying the code to run for only 1 year would be trivial.

I don’t believe our requirement of ‘>=1 exposure every 4 mo’ makes the population ‘very compliant’, because this definition would allow a person with one 30-day dispensing every 4 months (and therefore 90 days of non-adherence, or proportion of days covered [PDC] = 25%) to be considered in the analysis. We chose 4 months because a person may get a 90-day dispensing, in which case we’d be tolerating 30 days of non-adherence (so PDC = 75%). The reason we need to impose some requirement of regular exposure is that otherwise we would incorrectly classify a person as persisting on their last treatment when in fact they may have stopped treatment prior to the end of observation. As an extreme thought experiment, if we had a person with only one 30-day exposure to metformin, and no subsequent treatment during their 3-year observation after start…we’d classify that person’s sequence as ‘metformin only’, when in reality we’d say the person was either unexposed during the remaining 35 months, or we’d guess that we don’t have confidence in the data during that period (or maybe even question the first exposure and whether the person really has T2DM!).
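To make the adherence arithmetic concrete, here is a tiny illustrative sketch (the function is mine, not part of the study code) of proportion of days covered over one 4-month interval, taken as 120 days:

```r
# Proportion of days covered (PDC) over a single 120-day (4-month) interval,
# illustrating why the ">=1 exposure every 4 months" rule tolerates
# fairly low adherence. Illustrative helper, not from the study script.
pdc <- function(daysSupply, intervalDays = 120) {
  min(daysSupply, intervalDays) / intervalDays
}

pdc(30)   # one 30-day dispensing per 4 months -> 0.25 (25% covered)
pdc(90)   # one 90-day dispensing per 4 months -> 0.75 (75% covered)
```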

Re: sensitivity analysis, I hear there’s a very good paper on this topic (http://www.ncbi.nlm.nih.gov/pubmed/24969153) :slight_smile: In this descriptive analysis, where we aren’t estimating a causal effect and aren’t estimating prevalence of disease or incidence of treatment, I think there are fewer opportunities for systematic error, but it still bears consideration. Our current planned sensitivity analyses involve stratification by database (to see if populations, geographies, and/or data capture processes influence the observed patterns) and by year (to see if the patterns are influenced by any secular trends). I do not believe we can change the 3-year window without changing the underlying research question. It may be of interest to some in the community to explore whether making the 4-month window shorter or longer materially impacts the results, though again I would caution that extreme changes in this ‘parameter’ would change the question. I suppose additional sensitivity analyses could be performed to explore the impact of different definitions of the indication disease (e.g., right now T2DM is defined by diagnosis code; as @rkboyce pointed out, it could also be defined by other elements), and if others in the community want to implement alternative approaches, I think that’d be great and I’d be happy to try to run them on my data. I don’t think the drug list used for each disease is too controversial, but certainly a benefit of an open community is to get as many eyes on it as possible to improve the quality of our research.

Thanks, Patrick, as always for the very thoughtful response, which I completely agree with. I think the eligibility definitely depends on the question of interest. Here you are interested in describing treatment patterns for patients who 'fairly often' fill their prescriptions and who continue not only their prescriptions but survive for 3 years with these chronic conditions. This is obviously a very valid question. Another valid question may be to describe treatment patterns in patients initiating therapy for these conditions 'over' 3 years, including noncompliance, switching, augmentation, discontinuation, and death (and when these various events occur on average). It simply depends on the question of most interest, but I might say that both are of interest.... However, for purposes of this first protocol, simpler initial analyses may be of most interest. Thanks for the good chat about the research question!

Cindy

@Vojtech_Huser Thanks for the heads up re: the bug you encountered. It was not from Patrick/Martijn’s original code but from an adjustment I made to the table names in the parameterized SQL that I failed to carry over to the HelperFunctions.R file. Your fix is absolutely correct, and I have updated it on GitHub.

Given that most folks are in fact running the Treatment Pathways Study on all three conditions (HTN, Diabetes, and Depression), and that a related and important discussion has popped up (kicked off by @Vojtech_Huser and joined by @cgirman and @Patrick_Ryan), I have merged these two threads and updated the title here to reflect the broader scope.

Ongoing discussion of the Treatment Pathways Study should continue within this thread.

Thanks,

Jon

I agree fully with Patrick’s comments regarding the 3-year study. I am also interested, however, in a separate analysis exploring a shorter post-initiation duration (12 months), because I believe a lot of switching occurs early on, and this will provide additional insights.

Given @rwpark 's and @cgirman 's comments, I went ahead and created a new Proposed Protocol that we can iterate on if there is interest. I will create a new Topic for discussion around this protocol, which I have called Early Treatment Pathways in Chronic Disease.

Jon

Good idea, Jon. Count me in as a collaborator on ‘Early Treatment Pathways’. I’ll be happy to contribute both as a data partner (I’ll run the analysis against Truven, Optum, and CPRD, pending ISAC approval) and as a helper with coordinating the results and reusing the visualizations we developed for the long-term study. I think with only a little bit of work, I can further parameterize the R script from the original study to allow for defining the overall time period of interest and the interval length in which to look for exposure records. I still think the 4-month window per exposure is a good starting point, so in a 1-year study that’d simply be a requirement of 3 exposure records (you’ve got your index drug, then >=1 drug from 4–8 mo, then >=1 drug from 8–12 mo). It’ll be fascinating to see how much of the heterogeneity in the long-term treatment sequences is already present in the short-term period.
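The per-interval exposure requirement described above could be sketched as follows (function and variable names are illustrative, not from the study’s R script):

```r
# Check whether a patient meets the ">=1 exposure record every windowDays"
# persistence requirement over a follow-up of totalDays from the index date.
# exposureDays: day offsets from the index date at which exposure records start.
# Illustrative sketch only, not the study's actual SQL-based logic.
meetsPersistence <- function(exposureDays, totalDays = 365, windowDays = 120) {
  windowStarts <- seq(0, totalDays - windowDays, by = windowDays)
  all(vapply(windowStarts,
             function(s) any(exposureDays >= s & exposureDays < s + windowDays),
             logical(1)))
}

meetsPersistence(c(0, 130, 260))  # index drug + one record per later window -> TRUE
meetsPersistence(c(0, 130))       # nothing in the final 4-month window -> FALSE
```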

The AUSOM data includes both inpatient and outpatient data. I will check it out in detail, and reply later.

We found the cause of the problem. Our data include both inpatient and outpatient data, so the data mix is not the cause.

When we calculate the observation period, the next visit within 30 days is required for continuation of the observation period. However, many of our patients visit more than 30 days after the previous visit; 60- or 90-day supplies are frequent in our data. Thus, even though many patients have visited our hospital for a long time, the calculated observation periods are usually much shorter than expected.
Is the 30-day criterion for continuation of observation different at other sites? When we calculate the observation period, do we need to count the period of ‘days of supply (dispensing)’ as a continued period of observation?
Rae
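The gap-based derivation Rae describes can be sketched as follows (the function is illustrative, not the actual AUSOM ETL code; day offsets from an arbitrary start date stand in for visit dates):

```r
# Collapse a patient's sorted visit days into observation periods, starting a
# new period whenever the gap to the previous visit exceeds maxGapDays.
# Illustrative sketch of the 30-day continuation rule described in the post.
buildObservationPeriods <- function(visitDays, maxGapDays = 30) {
  visitDays <- sort(visitDays)
  periodId <- cumsum(c(0, diff(visitDays)) > maxGapDays)  # increment at big gaps
  data.frame(start = tapply(visitDays, periodId, min),
             end   = tapply(visitDays, periodId, max))
}

# Visits on days 0, 19, 90, and 104 of follow-up:
buildObservationPeriods(c(0, 19, 90, 104), maxGapDays = 30)
# -> two periods (days 0-19 and days 90-104); with maxGapDays = 90,
#    the 71-day gap is bridged and a single period 0-104 results.
```

Counting days of supply as observed time would amount to adding daysSupply to each visit day before computing the gaps.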

Good find, Rae! We actually use a 36-month window on observation periods, so people do not start a new observation period until they have been “silent” for 36 months. Here is a bit about our determination process:

We spent a lot of time thinking about gaps and how to define observation periods. Here is what we ended up doing, for better or for worse.

We calculated the distribution of time periods in which patients “went dark” for >1 year and then reappeared in the HIE. The reason for this exercise was that I simply (stubbornly?) could not accept that an observation period should conclude on the date of the last data point. This approach seemed problematic for pharmacovigilance, particularly where the last piece of data was a drug dispensing event. Tacking on 30 days to catch immediate adverse events seems reasonable. But we felt that, particularly for young people, being absent from the system for a couple of years should not imply non-observability. So anyway, looking at 1.5 million people with gaps > 1 yr, here is what we found:

Percentile     Mean Gap (yrs)
100% (Max)     8.97467
99%            7.29363
95%            5.90828
90%            5.00479
75% (Q3)       3.56468
50% (Median)   2.31896
25% (Q1)       1.53867
10%            1.16632
5%             1.06776
1%             1.01300
0% (Min)       1.00205

So it basically says that, of people who went dark for > 1 yr and then reappeared at some point, the longest gap was 9 years and the median gap was 2.3 years.

To be conservative, we ended up going with a 36-month bridge. Any gap > 36 months results in a new observation period. Otherwise, we add 36 months to the last event to create the observation_period end_date (censoring at our CDM end date of 12/31/2013 or at the time of death). This is obviously quite conservative, but as a statewide HIE we expect to catch people unless they pack up and move, so we did not want to declare people unobservable unless we were reasonably confident. (Fortunately, the nice thing about the CDM is that if this conservative approach proves problematic, we can easily revise the observation_period end_dates without much heavy lifting.)
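Under the rules just described (a 36-month bridge, censoring at the 12/31/2013 CDM end date or at death), the end-date logic could be sketched as follows (illustrative only, not the actual ETL code):

```r
# Extend an observation period's end date by a 36-month bridge past the last
# event, censoring at the CDM end date and at death. Illustrative sketch of
# the rule described in the post, not the production ETL.
bridgeEndDate <- function(lastEventDate,
                          cdmEndDate = as.Date("2013-12-31"),
                          deathDate = NA,
                          bridgeMonths = 36) {
  # date exactly bridgeMonths after the last event
  end <- seq(lastEventDate, by = paste(bridgeMonths, "months"), length.out = 2)[2]
  end <- min(end, cdmEndDate)                      # censor at CDM end date
  if (!is.na(deathDate)) end <- min(end, deathDate)  # censor at death
  end
}

bridgeEndDate(as.Date("2009-06-15"))  # -> 2012-06-15 (36-month bridge applied)
bridgeEndDate(as.Date("2012-06-15"))  # -> 2013-12-31 (censored at CDM end date)
```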

Jon
