
Oncology data use cases

Thanks

@Brian_Furner , @aumesh ,

Welcome to the group! Please send me your email addresses so that I can include you in the mailing list.

The workgroup page is here: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:oncology-sg#

Please include my email (bfurner@bsd.uchicago.edu).

Thanks,

Brian

Done!

Please use anita.umesh@gmail.com for my email address.

Thank you, Rimma!

My very best,
Anita

Done! You should have received the first group email.

Please add me to the group. I am a total newbie to OMOP, but I have done clinical research for years. My past experience is with CDISC.

We will soon resume the Oncology Workgroup meetings, with a focus on issues surrounding ETLing data into the newly proposed structures in the OHDSI Oncology CDM Extension Proposal. See here:

We have been working on vocabulary support for the extension:

Thanks. Looking forward to helping out.

I am re-upping this forum thread. Yesterday’s Oncology Workgroup meeting raised the need to collect concrete oncology analytic use cases for the beta testing of the OMOP CDM Oncology extension. See meeting notes here:

https://www.ohdsi.org/web/wiki/doku.php?id=documentation:oncology:meeting_notes_2019_jun-25

Please post use cases here.

What do you mean by use cases?

Articulated, specific definitions of patient cohorts and event cohorts. Example:

Patients with histology of ‘Anaplastic astrocytoma’, WHO Grade 3, who test ‘present’ for IDH1 and IDH2 mutations, in any primary CNS anatomical location, treated by surgical resection followed by external beam radiotherapy with a total dose of at least 35 Gy, who did not have a progression or recurrence for 365 days after surgical resection.
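The temporal part of a definition like this can be sketched in a few lines. This is only an illustration: the function name and arguments are hypothetical, it assumes the histology, grade, IDH and anatomical-site filters were already applied, and it does not check that the patient was actually followed for the full 365 days.

```python
from datetime import date, timedelta

def qualifies(resection: date, ebrt_start: date, ebrt_total_gy: float,
              events: list) -> bool:
    """Check the treatment sequence and the 365-day event-free window.

    `events` holds the dates of any recorded progression or recurrence.
    All argument names are illustrative, not OMOP column names.
    """
    # Radiotherapy must follow the resection and reach at least 35 Gy in total.
    if ebrt_start <= resection or ebrt_total_gy < 35:
        return False
    # No progression or recurrence within 365 days after the resection.
    window_end = resection + timedelta(days=365)
    return not any(resection < e <= window_end for e in events)
```

For example, `qualifies(date(2018, 1, 10), date(2018, 2, 20), 59.4, [date(2018, 9, 1)])` is false because the event falls inside the window.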

Here’s one from the Greater Plains Collaborative Share Thoughts on Breast Cancer Study:

The Share Thoughts on Breast Cancer study began in May 2015 and is recruiting 1,300 women aged 18 and over who were first diagnosed at one of the participating medical centers with ductal carcinoma in situ or stage I-III breast cancer during January 1, 2013 to May 1, 2014.

The cohort definition was:

  • Inclusion criteria for the UNDERLYING de-identified study population:
    • Any sex
    • Diagnosed with primary breast cancer
    • Age 18+ at the time of dx
    • diagnosed during 7/1/2012 - 6/30/2013 (i.e. 18-30 months prior to survey)
      • (if there are insufficient patients diagnosed in this period, we may extend the window)
      • Also, as the timeline for survey implementation slips we will shift the diagnosis window accordingly
  • Exclude from the SURVEY sample if:
    • Sex not equal to female
    • Less than 18 years of age
    • Prior cancer diagnosis
    • Breast cancer was not microscopically confirmed
    • Only tumor morphology was lobular carcinoma in situ
    • Stage IV breast cancer
    • Known to be deceased
    • Non-English speaking (for now)
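The two-level definition above (underlying population, then survey exclusions) could be expressed as a pair of filters over one record per patient. This is a minimal sketch: the `Patient` record and its field names are hypothetical and are not GPC or NAACCR variable names, and a diagnosis of primary breast cancer is assumed to be given.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Patient:
    # Hypothetical fields for illustration only.
    sex: str                         # "F", "M", ...
    dx_date: date
    age_at_dx: int
    prior_cancer: bool
    microscopically_confirmed: bool
    only_lcis: bool                  # only morphology was lobular carcinoma in situ
    stage: str                       # "0", "I", "II", "III", "IV"
    deceased: bool
    english_speaking: bool

# Diagnosis window from the cohort definition (7/1/2012 - 6/30/2013).
DX_START, DX_END = date(2012, 7, 1), date(2013, 6, 30)

def in_study_population(p: Patient) -> bool:
    """Underlying de-identified population: any sex, age 18+, dx in window."""
    return p.age_at_dx >= 18 and DX_START <= p.dx_date <= DX_END

def in_survey_sample(p: Patient) -> bool:
    """Study population minus the survey exclusions."""
    if not in_study_population(p):
        return False
    excluded = (
        p.sex != "F"
        or p.prior_cancer
        or not p.microscopically_confirmed
        or p.only_lcis
        or p.stage == "IV"
        or p.deceased
        or not p.english_speaking
    )
    return not excluded
```

Keeping the two functions separate mirrors the definition: a male patient stays in the underlying population but is dropped from the survey sample.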

In this project, each of the 8 sites integrated their tumor registry NAACCR file into i2b2 (NAACCR_ETL in the GPC wiki); we collected about 50 variables (bc-variable.csv) and used R Markdown and Python to do QA (bc_qa) and load the data into REDCap. See also BreastCancerDataSharing in the GPC wiki.

Using a similar approach, the GPC CancerRCR project defines 3 cohorts; for example:

Query 2: PCORnet Modular Program request (Newly Identified Cancer Patients with Evidence of Genetic Test)

  • Age Restriction: > 21 years of age as of September 30, 2016
  • Query Period: October 1, 2015 – September 30, 2016
  • Health Event of Interest (Index) Groups: Lung, Colorectal, Breast, Prostate, Pancreatic or Esophageal Cancer DX
  • Inclusion/Exclusion Criteria 1: Exclusion of the above cancers 10 years before the diagnosis of interest
  • Inclusion/Exclusion Criteria 2: Include procedure for common molecular or genetic tests for cancers of interest
  • Stratification: Age group, sex, race, ethnicity
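Query 2 can be read as an index-date rule with a washout period. The sketch below is one possible reading, not the PCORnet Modular Program logic: the function names are hypothetical, "> 21 years of age" is interpreted as more than 21 completed years, and the genetic-test criterion is reduced to a boolean because the request does not state its timing relative to the index date.

```python
from datetime import date

QUERY_START, QUERY_END = date(2015, 10, 1), date(2016, 9, 30)
CANCERS = {"lung", "colorectal", "breast", "prostate", "pancreatic", "esophageal"}

def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def index_date(birth: date, diagnoses: list, has_genetic_test: bool):
    """Return the index date, or None if the patient does not qualify.

    `diagnoses` is a list of (date, cancer_group) tuples.
    """
    if age_on(birth, QUERY_END) <= 21 or not has_genetic_test:
        return None
    dx_dates = sorted(d for d, group in diagnoses if group in CANCERS)
    in_period = [d for d in dx_dates if QUERY_START <= d <= QUERY_END]
    if not in_period:
        return None
    index = in_period[0]
    # Washout: none of the cancers of interest in the 10 years before index.
    washout_start = years_before(index, 10)
    if any(washout_start <= d < index for d in dx_dates):
        return None
    return index
```

Stratification by age group, sex, race and ethnicity would then be a simple group-by over the patients that return an index date.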

The GPC’s NAACCR_ETL approach is outdated by v18 (GPC issue #739) and the PCORnet CRG on Cancer is considering a PCORnet CDM tumor table. In considering new approaches, I discovered this OHDSI Oncology WG.

The current thinking is that the PCORnet tumor table would look a lot like the NAACCR file, whereas I gather this OHDSI WG aims to do a pretty deep integration of NAACCR data into the OMOP CDM; something that wouldn’t easily be reversible, for example. But I see you’re also scraping vocabulary data out of NAACCR specs and I suspect there’s some effort in that area that we could share.

I see one or two active groups in HL7/FHIR as well. I aim to expand on that a bit in GPC issue #739.

Today, directly-opposed results from RCTs about the duration of adjuvant trastuzumab for breast cancer were published in the Lancet.
The PERSEPHONE study concluded that 6-month adjuvant trastuzumab was non-inferior to 12-month treatment in patients with HER2-positive early breast cancer, with less cardiotoxicity and fewer severe adverse events, at a median follow-up of 5.4 years.
In contrast, the PHARE study concluded that the standard duration of adjuvant trastuzumab should remain 12 months in these patients, at a median follow-up of 7.5 years.
The editorial comment on these two articles is worth reading.

We could compare the true long-term effects (survival, cardiotoxicity, and actual cardiac events) of 6-month vs. 12-month trastuzumab therapy in patients with HER2-positive early breast cancer in the real world.

Friends.

We should start with simple characterizations.

Here is one:

Patients with bladder cancer ± metastases. Treatments. Disease-free/event-free/overall survival. Other outcomes.

This is interesting but, as the editorial says, the reason for the “directly-opposed results” was a different choice of non-inferiority margin; the actual HR and 95% CI were extremely similar for both studies! You have a problem looking at this particular use case in the real world, which is that you will have a lot of difficulty differentiating a planned 6-month course from a premature discontinuation of a 12-month course (usually due to toxicity or recurrence during the treatment period). So it would be no surprise to find that the 6ish month group would have more toxicity and worse outcomes in the real world. Same issue with 3-month vs. 6-month CapeOx or mFOLFOX6 for adjuvant treatment of stage III colon cancer.
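To make the discontinuation problem concrete, here is a small sketch of how observed exposure might be bucketed from administration dates. Everything in it is an assumption for illustration: the function names, the 3-weekly schedule, the 30.44-day month and the one-month tolerance.

```python
from datetime import date, timedelta

def exposure_months(admin_dates: list) -> float:
    """Months between first and last recorded administration."""
    return (max(admin_dates) - min(admin_dates)).days / 30.44

def observed_arm(admin_dates: list, tolerance: float = 1.0) -> str:
    """Bucket a patient by observed duration only."""
    months = exposure_months(admin_dates)
    if abs(months - 12) <= tolerance:
        return "12-month"
    if abs(months - 6) <= tolerance:
        # A planned 6-month course and a 12-month course stopped early
        # (toxicity, recurrence) both land here; dates alone cannot
        # tell them apart.
        return "6-month"
    return "other"

# Illustrative 3-weekly schedules starting on the same day.
start = date(2015, 1, 1)
planned_short = [start + timedelta(days=21 * i) for i in range(9)]
stopped_early = [start + timedelta(days=21 * i) for i in range(9)]
full_course = [start + timedelta(days=21 * i) for i in range(18)]
```

Here `planned_short` and `stopped_early` are identical lists, so `observed_arm` returns the same label for both; separating them would need the treatment intent, which is rarely captured discretely.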

Thank you for the invaluable comment, @Jeremy_Warner. I totally agree with you in this case.

I think we’re discussing which elements of oncology data should be standardized, and how.

So, even though it may not be worthwhile or reliable, let me think about what kinds of information elements we would need to conduct a real-world replica study of PERSEPHONE and PHARE (sorry, I can’t come up with a better use case right now).

Then, we need information including:

  • For baseline characteristics
    • staging for breast cancer
    • concomitant chemotherapy
    • prior surgery or radiotherapy
    • other basic demographics (age, race, BMI, …)
    • performance status (such as ECOG)
    • ER/PR/HER2 status
  • For longitudinal characteristics
    • treatment pattern after baseline
  • For outcome
    • progression (for progression-free survival)
    • recurrence
    • death
    • cardiotoxicity outcome (such as incidence of heart failure)
    • other adverse events such as uterine cancer

We should have all of them standardized except ECOG. But ECOG is usually an inclusion criterion that makes studies feasible: if the patient is in too poor a state, they cannot follow the protocol. In a real-world study you want those explicitly.

NP

All nicely standardized, but progression and recurrence might require abstraction. We haven’t defined those algorithms yet.

@dckc Looks like we have a shared interest in developing code to prepare NAACCR to be ingested into common data models. Would you be open to having a meeting with us to discuss ingesting NAACCR into common data models? Maybe we could help each other.

The current thinking for OMOP is to not model any tables based on the NAACCR file format/data dictionary. Instead, we are ingesting the NAACCR file format/data dictionary into the OMOP vocabulary tables by assigning NAACCR items and NAACCR item codes to OMOP domains. OMOP domain assignment will direct where NAACCR data should land within the OMOP CDM. Upon first ingestion, many NAACCR concepts will be marked as ‘standard’. Our goal is to eventually make NAACCR an OMOP non-standard source vocabulary that maps to other standard vocabularies like SNOMED or LOINC. This will make the OMOP vocabulary not U.S.-centric.
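As a rough illustration of "domain assignment directs where the data lands", the sketch below routes NAACCR item values to CDM tables by domain. The item keys and their domain assignments are hypothetical placeholders; in practice the assignments would come from the OMOP vocabulary tables, not from a hard-coded dictionary. The domain-to-table mapping follows the usual OMOP CDM table names.

```python
# OMOP domain -> CDM table that stores records of that domain.
DOMAIN_TABLE = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Observation": "observation",
}

# Hypothetical domain assignments for a few NAACCR items (illustrative only).
ITEM_DOMAIN = {
    "histology_site": "Condition",
    "grade": "Measurement",
    "surgery_primary_site": "Procedure",
}

def route(record: dict) -> dict:
    """Group the item values of one NAACCR record by target CDM table."""
    routed = {}
    for item, value in record.items():
        domain = ITEM_DOMAIN.get(item)
        if domain is None:
            continue  # item is outside the ingested subset
        routed.setdefault(DOMAIN_TABLE[domain], []).append((item, value))
    return routed
```

With this design the ETL never hard-codes a target table per item; changing an item's domain in the vocabulary changes where its data lands.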

Similar to what it looks like you have done, we are ingesting an ‘edited’ subset of the NAACCR “ontology” (:roll_eyes:), putting our effort on oncology diagnoses, oncology diagnosis modifiers, oncology treatments and oncology outcomes. We are focusing only on these NAACCR sections: ‘Cancer Identification’, ‘Stage/Prognostic Factors’, ‘Treatment-1st Course’ and ‘Follow-up/Recurrence/Death’.

Regarding the SEER API, you might be interested to know that it only supports the UICC Version 7 staging edition, not Version 8. See here:

We have had good luck extracting ‘performance status’ declarations from clinic encounter progress notes with rules-based NLP. We have also seen ‘performance status’ declarations routinely captured in flowsheets and smart forms in some clinics, a practice that should be encouraged as much as possible. Discretely captured or extracted ‘performance status’, ‘recurrence’, and ‘progression’ declarations would bring @SCYou’s use case closer to feasible.
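For readers unfamiliar with rules-based extraction, a toy version of such a rule might look like the following. This is not the pipeline described above: the pattern and function name are assumptions, it only handles ECOG-style scores from 0 to 4, and it ignores negation, ranges such as "0-1", and other scales such as Karnofsky.

```python
import re

# "ECOG" or "performance status", up to 30 non-digit characters, then a
# single digit 0-4 that is not part of a longer number.
ECOG_PATTERN = re.compile(
    r"\b(?:ECOG|performance status)\b[^0-9\n]{0,30}?([0-4])\b",
    re.IGNORECASE,
)

def extract_ecog(note: str) -> list:
    """Return every ECOG-like score found in a progress note."""
    return [int(m.group(1)) for m in ECOG_PATTERN.finditer(note)]
```

For instance, `extract_ecog("Seen in clinic. ECOG PS 2 today.")` yields a single score of 2, while a note with no declaration yields an empty list.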
