Oncology data use cases

Brian_Furner · May 8, 2018, 4:41pm

Thanks

rimma · May 13, 2018, 4:29am

@Brian_Furner , @aumesh ,

Welcome to the group! Please send me your email addresses so that I can include you in the mailing list.

The workgroup page is here: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:oncology-sg#

Brian_Furner · May 13, 2018, 4:59am

Please include my email (bfurner@bsd.uchicago.edu)

Thanks,

Brian

rimma · May 13, 2018, 3:03pm

Done!

aumesh · May 13, 2018, 7:16pm

Please use anita.umesh@gmail.com for my email address.

Thank you, Rimma!

My very best,
Anita

rimma · May 14, 2018, 2:22pm

Done! You should have received the first group email.

jliddil1 · February 28, 2019, 8:13pm

Please add me to the group. I am a total newbie to OMOP, but I have done clinical research for years. My past experience is with CDISC.

mgurley · March 1, 2019, 12:01am

We will soon resume the Oncology Workgroup meetings, with a focus on issues surrounding ETLing data into the newly proposed structures in the OHDSI Oncology CDM Extension Proposal. See here:

We have been working on vocabulary support for the extension:

jliddil1 · March 1, 2019, 1:26pm

Thanks. Looking forward to helping out

mgurley · June 26, 2019, 10:49am

I am re-upping this forum thread. Yesterday’s Oncology Workgroup meeting raised the need to collect concrete oncology analytic use cases for the beta testing of the OMOP CDM Oncology extension. See meeting notes here:

https://www.ohdsi.org/web/wiki/doku.php?id=documentation:oncology:meeting_notes_2019_jun-25

Please post use cases here.

jliddil1 · June 26, 2019, 1:25pm

What do you mean by use cases?

mgurley · June 26, 2019, 1:43pm

Articulated, specific definitions of patient cohorts and event cohorts. Example:

Patients with histology of ‘Anaplastic astrocytoma’, WHO Grade 3, that test ‘present’ for IDH1 and IDH2 mutations, in any primary CNS anatomical location, treated by surgical resection followed by external beam radiotherapy with a total dose of at least 35 Gy, that did not have a progression or recurrence for 365 days after surgical resection.
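A minimal sketch of how a definition like this might be turned into an executable filter. The `PatientSummary` record and all of its field names are hypothetical (they are not OMOP CDM columns), and reading "IDH1 and IDH2" as "both present" is an assumption about the intended logic.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class PatientSummary:
    """Hypothetical flattened record; field names are illustrative only."""
    histology: str
    who_grade: int
    idh1_mutation: str                 # 'present' or 'absent'
    idh2_mutation: str                 # 'present' or 'absent'
    primary_site_is_cns: bool
    resection_date: Optional[date]
    ebrt_start_date: Optional[date]    # external beam radiotherapy
    ebrt_total_dose_gy: float
    first_progression_or_recurrence: Optional[date]


def in_cohort(p: PatientSummary) -> bool:
    # Diagnosis and its modifiers
    if p.histology != "Anaplastic astrocytoma" or p.who_grade != 3:
        return False
    # Assumption: both IDH1 and IDH2 must test 'present'
    if p.idh1_mutation != "present" or p.idh2_mutation != "present":
        return False
    if not p.primary_site_is_cns:
        return False
    # Treatment sequence: resection first, then radiotherapy of at least 35 Gy
    if p.resection_date is None or p.ebrt_start_date is None:
        return False
    if p.ebrt_start_date < p.resection_date or p.ebrt_total_dose_gy < 35:
        return False
    # Outcome: no progression or recurrence on or before day 365 after resection
    event = p.first_progression_or_recurrence
    return event is None or event > p.resection_date + timedelta(days=365)


example = PatientSummary(
    histology="Anaplastic astrocytoma",
    who_grade=3,
    idh1_mutation="present",
    idh2_mutation="present",
    primary_site_is_cns=True,
    resection_date=date(2018, 1, 10),
    ebrt_start_date=date(2018, 2, 15),
    ebrt_total_dose_gy=59.4,
    first_progression_or_recurrence=None,
)
print(in_cohort(example))
print(in_cohort(replace(example, ebrt_total_dose_gy=30.0)))
```

Note that this sketch does not check whether the patient actually had 365 days of follow-up; a real definition would need to say how censored patients are handled.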

dckc · June 27, 2019, 6:49pm

Here’s one from the Greater Plains Collaborative Share Thoughts on Breast Cancer Study:

The Share Thoughts on Breast Cancer study began in May 2015 and is recruiting 1,300 women aged 18 and over who were first diagnosed at one of the participating medical centers with ductal carcinoma in situ or stage I-III breast cancer during January 1, 2013 to May 1, 2014.

The cohort definition was:

Inclusion criteria for the UNDERLYING de-identified study population:

Any sex

Diagnosed with primary breast cancer

Age 18+ at the time of dx

diagnosed during 7/1/2012 - 6/30/2013 (i.e. 18-30 months prior to survey)

(if there are insufficient patients diagnosed in this period, we may extend the window)

Also, as the timeline for survey implementation slips we will shift the diagnosis window accordingly

Exclude from the SURVEY sample if:

Sex not equal to female

Less than 18 years of age

Prior cancer diagnosis

Breast cancer was not microscopically confirmed

Only tumor morphology was lobular carcinoma in situ

Stage IV breast cancer

Known to be deceased

Non-English speaking (for now)
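As a rough sketch, the two-stage logic above (underlying population first, then survey exclusions) could be expressed like this. The `BreastCancerCase` record, its field names, and the `"LCIS"` and `"IV"` code values are all hypothetical; the records are assumed to already be primary breast cancer cases.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import FrozenSet, List

# Diagnosis window from the inclusion criteria above
DX_WINDOW_START = date(2012, 7, 1)
DX_WINDOW_END = date(2013, 6, 30)


@dataclass(frozen=True)
class BreastCancerCase:
    """Hypothetical per-patient summary; field names and codes are illustrative."""
    sex: str
    age_at_dx: int
    dx_date: date
    prior_cancer_dx: bool
    microscopically_confirmed: bool
    morphologies: FrozenSet[str]
    stage: str
    known_deceased: bool
    english_speaking: bool


def in_underlying_population(c: BreastCancerCase) -> bool:
    # Any sex, age 18+ at diagnosis, diagnosed inside the window
    return c.age_at_dx >= 18 and DX_WINDOW_START <= c.dx_date <= DX_WINDOW_END


def survey_exclusion_reasons(c: BreastCancerCase) -> List[str]:
    # Each pair is (condition that excludes the case, human-readable reason)
    checks = [
        (c.sex != "female", "sex not female"),
        (c.age_at_dx < 18, "under 18"),
        (c.prior_cancer_dx, "prior cancer diagnosis"),
        (not c.microscopically_confirmed, "not microscopically confirmed"),
        (c.morphologies == frozenset({"LCIS"}), "only morphology is LCIS"),
        (c.stage == "IV", "stage IV"),
        (c.known_deceased, "known to be deceased"),
        (not c.english_speaking, "non-English speaking"),
    ]
    return [reason for excluded, reason in checks if excluded]


case = BreastCancerCase(
    sex="female", age_at_dx=54, dx_date=date(2012, 10, 3),
    prior_cancer_dx=False, microscopically_confirmed=True,
    morphologies=frozenset({"DCIS"}), stage="0",
    known_deceased=False, english_speaking=True,
)
print(in_underlying_population(case), survey_exclusion_reasons(case))
```

Returning the list of reasons, rather than a single boolean, makes it easy to tabulate how many patients each exclusion removes during QA.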

In this project, each of the 8 sites integrated its tumor registry NAACCR file into i2b2 (NAACCR_ETL in the GPC wiki); we collected about 50 variables (bc-variable.csv) and used R Markdown and Python to do QA (bc_qa) and load the data into REDCap. See also BreastCancerDataSharing in the GPC wiki.

Using a similar approach, the GPC CancerRCR project defines 3 cohorts; for example:

Query 2: PCORnet Modular Program request (Newly Identified Cancer Patients with Evidence of Genetic Test)

Age Restriction: > 21 years of age as of September 30, 2016

Query Period: October 1, 2015 – September 30, 2016

Health Event of Interest (Index) Groups: Lung, Colorectal, Breast, Prostate, Pancreatic or Esophageal Cancer DX

Inclusion/Exclusion Criteria 1: Exclusion of the above cancers 10 years before the diagnosis of interest

Inclusion/Exclusion Criteria 2: Include procedure for common molecular or genetic tests for cancers of interest

Stratification: Age group, sex, race, ethnicity

The GPC’s NAACCR_ETL approach is outdated by v18 (GPC issue #739) and the PCORNet CRG on Cancer is considering a PCORNet CDM tumor table. In considering new approaches, I discovered this OHDSI Oncology WG.

The current thinking is that the PCORNet tumor table would look a lot like the NAACCR file, whereas I gather this OHDSI WG aims to do a pretty deep integration of NAACCR data into the OMOP CDM; something that wouldn’t easily be reversible, for example. But I see you’re also scraping vocabulary data out of NAACCR specs and I suspect there’s some effort in that area that we could share.

I see one or two active groups in HL7/FHIR as well. I aim to expand on that a bit in GPC issue #739.

SCYou · June 27, 2019, 10:56pm

Today, directly-opposed results from RCTs about the duration of adjuvant trastuzumab for breast cancer were published in the Lancet.
The PERSEPHONE study concluded that 6-month adjuvant trastuzumab was non-inferior to 12-month treatment in patients with HER2-positive early breast cancer, with less cardiotoxicity and fewer severe adverse events, at a median follow-up of 5.4 years.
In contrast, the PHARE study concluded that the standard duration of adjuvant trastuzumab should remain 12 months in these patients, at a median follow-up of 7.5 years.
The editorial comment on these two articles is worth reading.

We can compare the true long-term effects (survival, cardiotoxicity, and actual cardiac events) of 6-month vs. 12-month trastuzumab therapy in patients with HER2-positive early breast cancer in the real world.

Christian_Reich · June 28, 2019, 6:04am

Friends.

We should start with simple characterizations.

Here is one:

Patients with bladder cancer ± metastases. Treatments. Disease-free/event-free/overall survival. Other outcomes.

Jeremy_Warner · June 28, 2019, 7:50pm

This is interesting but as the editorial says, the reason for the “directly-opposed results” was a different choice of non-inferiority margin; the actual HR + 95% CI was extremely similar for both studies! You have a problem looking at this particular use case in the real world, which is that you will have a lot of difficulty differentiating a planned 6-month course from a premature discontinuation of a 12-month course (usually due to toxicity or recurrence during the treatment period). So it would be no surprise to find that the 6ish month group would have more toxicity and worse outcomes in the real world. Same issue with 3-month vs. 6-month CapeOx or mFOLFOX6 for adjuvant treatment of stage III colon cancer.

SCYou · June 29, 2019, 2:45am

Thank you for the invaluable comment, @Jeremy_Warner. I totally agree with you in this case.

I think we’re discussing which elements in oncology data should be standardized, and how.

So, even though it may not be worthwhile or reliable, let me think about what kind of information elements we would need if we conducted a real-world replica study of PERSEPHONE and PHARE (sorry, another good use case doesn’t come to mind right now).

Then, we need information including:

For baseline characteristics
- staging for breast cancer
- concomitant chemotherapy
- prior surgery or radiotherapy
- other basic demographics (age, race, BMI, …)
- Performance status (such as ECOG)
- positiveness of ER/PR/HER2
For longitudinal characteristics
- treatment pattern after baseline
For outcome
- progression (for progression-free survival)
- recurrence
- death
- cardiotoxicity outcome (such as incidence of heart failure)
- other adverse events such as uterine cancer

Christian_Reich · June 29, 2019, 3:45am

We should have all of them standardized except ECOG. But that is usually an inclusion criterion for making studies feasible: if the patient is in too bad a state, they cannot follow the protocol. In a real-world study you want those explicitly.

NP

All nicely standardized, but progression and recurrence might require abstraction. We haven’t defined those algorithms yet.

mgurley · June 30, 2019, 2:03pm

@dckc Looks like we have a shared interest in developing code to prepare NAACCR to be ingested into common data models. Would you be open to having a meeting with us to discuss ingesting NAACCR into common data models? Maybe we could help each other.

The current thinking for OMOP is to not model any tables based on the NAACCR file format/data dictionary. Instead, we are ingesting the NAACCR file format/data dictionary into the OMOP vocabulary tables by assigning NAACCR items and NAACCR item codes to OMOP domains. OMOP domain assignment will direct where NAACCR data should land within the OMOP CDM. Upon first ingestion, many NAACCR concepts will be marked as ‘standard’. Our goal is to eventually make NAACCR an OMOP non-standard source vocabulary that maps to other standard vocabularies like SNOMED or LOINC. This will make the OMOP vocabulary not U.S.-centric.
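A minimal sketch of the routing idea, where the domain assigned to a NAACCR item decides which CDM table its data lands in. The item-to-domain assignments below are purely illustrative, not the workgroup's actual vocabulary content; the table names are the standard OMOP CDM clinical event tables.

```python
# Standard OMOP CDM event tables, keyed by domain
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
    "Observation": "observation",
}

# Illustrative assumption only: NAACCR item number -> OMOP domain.
# In practice this lookup would come from the OMOP vocabulary tables.
ITEM_DOMAIN = {
    "940": "Measurement",  # TNM CLIN T
    "880": "Measurement",  # TNM PATH T
}


def target_table(naaccr_item: str, item_domain: dict = ITEM_DOMAIN) -> str:
    """Return the CDM table a NAACCR item should be loaded into."""
    domain = item_domain.get(naaccr_item)
    if domain is None:
        raise KeyError(f"NAACCR item {naaccr_item} has no domain assignment")
    return DOMAIN_TO_TABLE[domain]


print(target_table("940"))
```

Keeping the assignment in a lookup rather than in ETL code means that re-assigning an item to a different domain changes where the data lands without rewriting the loader.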

Similar to what it looks like you have done, we are ingesting an ‘edited’ subset of the NAACCR “ontology”, putting our effort into oncology diagnosis, oncology diagnosis modifiers, oncology treatments and oncology outcomes. We are only focusing effort on these NAACCR sections: ‘Cancer Identification’, ‘Stage/Prognostic Factors’, ‘Treatment-1st Course’ and ‘Follow-up/Recurrence/Death’.

Regarding the SEER API, you might be interested to know that it only supports UICC staging version 7, not version 8. See here:

github.com/imsweb/staging-client-java

Will this library and the SEER API support UICC 8 edition of TNM?

opened 12:11PM - 20 Jun 19 UTC

closed 05:13PM - 16 Jul 19 UTC

mgurley

If I am correct this library and SEER API currently output UICC 7 edition TNM, …correct? For the NAACCR variables:

- TNM CLIN T NAACCR 940
- TNM CLIN N NAACCR 950
- TNM CLIN M NAACCR 960
- TNM PATH T NAACCR 880
- TNM PATH N NAACCR 890
- TNM PATH M NAACCR 900

Will this library and the SEER API support the UICC 8 edition of TNM? If so, will it bind the 8th edition only to the following variables:

- AJCC TNM CLIN T NAACCR 1001
- AJCC TNM CLIN N NAACCR 1002
- AJCC TNM CLIN M NAACCR 1003
- AJCC TNM PATH T NAACCR 1011
- AJCC TNM PATH N NAACCR 1012
- AJCC TNM PATH M NAACCR 1013

mgurley · June 30, 2019, 10:55am

We have had good luck extracting ‘performance status’ declarations from clinic encounter progress notes with rules-based NLP. Also, we have seen ‘performance status’ declarations routinely captured in flowsheets and smart forms in some clinics, a practice that should be encouraged as much as possible. Discretely captured/extracted ‘performance status’, ‘recurrence’, and ‘progression’ declarations would make @SCYou’s use case closer to feasible.