Oncology data use cases

(Seng Chan You) #21

Today, directly-opposed results from two RCTs on the duration of adjuvant trastuzumab for breast cancer were published in The Lancet.
The PERSEPHONE study, with a median follow-up of 5.4 years, concluded that 6-month adjuvant trastuzumab was non-inferior to 12-month treatment in patients with HER2-positive early breast cancer, with less cardiotoxicity and fewer severe adverse events.
In contrast, the PHARE study, with a median follow-up of 7.5 years, concluded that the standard duration of adjuvant trastuzumab should remain 12 months in these patients.
The editorial comment on these two articles is worth reading.

We can compare the true long-term effects (survival, cardiotoxicity, and actual cardiac events) of 6-month vs. 12-month trastuzumab therapy in patients with HER2-positive early breast cancer in the real world.

(Christian Reich) #22


We should start with simple characterizations.

Here is one:

Patients with bladder cancer ± metastases. Treatments. Disease-free/event-free/overall survival. Other outcomes.

(Jeremy Warner) #23

This is interesting but as the editorial says, the reason for the “directly-opposed results” was a different choice of non-inferiority margin; the actual HR + 95% CI was extremely similar for both studies! You have a problem looking at this particular use case in the real world, which is that you will have a lot of difficulty differentiating a planned 6-month course from a premature discontinuation of a 12-month course (usually due to toxicity or recurrence during the treatment period). So it would be no surprise to find that the 6ish month group would have more toxicity and worse outcomes in the real world. Same issue with 3-month vs. 6-month CapeOx or mFOLFOX6 for adjuvant treatment of stage III colon cancer.
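To make the ambiguity concrete, here is a minimal sketch of what administration records alone would show. The patients, dates, and the 3-weekly schedule (9 cycles for about 6 months, 18 cycles for about 12 months) are assumptions for illustration, not data from either trial.

```python
from datetime import date, timedelta


def infusion_dates(start, n_cycles, cycle_days=21):
    """Dates of n_cycles administrations given every cycle_days days."""
    return [start + timedelta(days=cycle_days * i) for i in range(n_cycles)]


def observed_duration_days(dates):
    """Days between the first and last recorded administration."""
    return (max(dates) - min(dates)).days


# Hypothetical patient A: planned 6-month course, completed (9 cycles).
planned_short = infusion_dates(date(2018, 1, 1), 9)

# Hypothetical patient B: planned 12-month course, stopped after 9 cycles
# because of toxicity or recurrence. The intent is not in the records.
stopped_early = infusion_dates(date(2018, 1, 1), 9)

# Hypothetical patient C: planned 12-month course, completed (18 cycles).
completed_long = infusion_dates(date(2018, 1, 1), 18)

for name, dates in [("A", planned_short), ("B", stopped_early), ("C", completed_long)]:
    print(name, observed_duration_days(dates), "days observed")
```

Patients A and B produce identical observed durations, so any cohort defined by observed duration alone mixes planned short courses with early discontinuations.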

(Seng Chan You) #24

Thank you for the invaluable comment, @Jeremy_Warner. I totally agree with you in this case.

I think we’re discussing which elements of oncology data should be standardized, and how.

So, even though it may not be worthwhile or reliable, let me think about what information elements we would need to conduct a real-world replica study of PERSEPHONE and PHARE (sorry, another good use case doesn’t come to mind right now).

Then, we need information including:

  • For baseline characteristics

    • staging for breast cancer
    • concomitant chemotherapy
    • prior surgery or radiotherapy
    • other basic demographics (age, race, BMI, …)
    • performance status (such as ECOG)
    • ER/PR/HER2 status
  • For longitudinal characteristics

    • treatment pattern after baseline
  • For outcome

    • progression (for progression-free survival)
    • recurrence
    • death
    • cardiotoxicity outcome (such as incidence of heart failure)
    • other adverse events such as uterine cancer

(Christian Reich) #25

We should have all of them standardized except ECOG. But that is usually an inclusion criterion for making studies feasible: if the patient is in too bad a state, they cannot follow the protocol. In a real-world study you want those explicitly.


All nicely standardized, but progression and recurrence might require abstraction. We haven’t defined those algorithms yet.

(Michael Gurley) #26

@dckc Looks like we have a shared interest in developing code to prepare NAACCR to be ingested into common data models. Would you be open to having a meeting with us to discuss ingesting NAACCR into common data models? Maybe we could help each other.

The current thinking for OMOP is to not model any tables based on the NAACCR file format/data dictionary. Instead, we are ingesting the NAACCR file format/data dictionary into the OMOP vocabulary tables by assigning NAACCR items and NAACCR item codes to OMOP domains. OMOP domain assignment will direct where NAACCR data should land within the OMOP CDM. Upon first ingestion, many NAACCR concepts will be marked as ‘standard’. Our goal is to eventually make NAACCR an OMOP non-standard source vocabulary that maps to other standard vocabularies like SNOMED or LOINC. This will make the OMOP vocabulary not U.S.-centric.
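As a rough sketch of the routing idea: once each NAACCR item has an OMOP domain, the domain decides the target CDM table. The item names and their domain assignments below are made-up placeholders, not the actual vocabulary content; only the domain-to-table pairs reflect standard OMOP CDM table names.

```python
# OMOP domain -> CDM table where records of that domain land.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Observation": "observation",
}

# Hypothetical NAACCR item -> domain assignments (placeholders only).
ITEM_DOMAIN = {
    "example_diagnosis_item": "Condition",
    "example_stage_item": "Measurement",
    "example_surgery_item": "Procedure",
}


def target_table(naaccr_item):
    """Return the CDM table for a NAACCR item, or None if it is unassigned."""
    domain = ITEM_DOMAIN.get(naaccr_item)
    if domain is None:
        return None
    return DOMAIN_TO_TABLE[domain]


print(target_table("example_stage_item"))
print(target_table("item_outside_the_subset"))
```

With this layout, no NAACCR-specific tables are needed: adding or re-assigning an item is a vocabulary change, not a schema change.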

Similar to what it looks like you have done, we are ingesting an ‘edited’ subset of the NAACCR “ontology” (:roll_eyes:). We are putting our effort into oncology diagnoses, oncology diagnosis modifiers, oncology treatments and oncology outcomes, and are focusing only on these NAACCR sections: ‘Cancer Identification’, ‘Stage/Prognostic Factors’, ‘Treatment-1st Course’ and ‘Follow-up/Recurrence/Death’.

Regarding the SEER API, you might be interested to know that it supports only the UICC version 7 staging edition, not version 8. See here:

(Michael Gurley) #27

We have had good luck extracting ‘performance status’ declarations from clinic encounter progress notes with rules-based NLP. Also, we have seen ‘performance status’ declarations routinely captured in flowsheets and smart forms in some clinics, a practice that should be encouraged as much as possible. Discretely captured/extracted ‘performance status’, ‘recurrence’, and ‘progression’ declarations would make @SCYou’s use case closer to feasible.
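For a flavor of what a rule can look like, here is a minimal sketch of a single regular-expression rule for ECOG declarations. It is not our production rule set; the pattern and the function name `extract_ecog` are illustrative assumptions, and a real system would also need negation, ranges (e.g. "0-1"), and other scales such as Karnofsky.

```python
import re

# Matches phrasings such as "ECOG 0", "ECOG PS 2", "ECOG performance status: 1".
# Valid ECOG grades are 0 through 5.
ECOG_PATTERN = re.compile(
    r"\bECOG(?:\s+(?:performance\s+status|PS))?\s*(?:of|is|:|=)?\s*([0-5])\b",
    re.IGNORECASE,
)


def extract_ecog(note_text):
    """Return the first ECOG grade declared in the note, or None if absent."""
    match = ECOG_PATTERN.search(note_text)
    return int(match.group(1)) if match else None


notes = [
    "Patient doing well. ECOG performance status: 1.",
    "ecog ps 2, tolerating therapy.",
    "No performance status documented at this visit.",
]
for note in notes:
    print(extract_ecog(note), "<-", note)
```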

(Dan Connolly) #28

@mgurley as to a meeting: I see this Oncology subgroup is scheduled to meet tomorrow at 11am ET. Is this (ingesting NAACCR into common data models) likely to be on the agenda?

GPC issue #739 is on the agenda of our weekly gpc-dev meeting the following hour (Tue 11am Central). You’re more than welcome to attend. Meeting ID and access code: 817-393-381; call +1 (571) 317-3131. I’m sending the rest of the agenda to gpc-dev today. I might as well copy you when I send it.

(Michael Gurley) #29

Tomorrow we are not specifically discussing NAACCR ingestion. Here is the agenda for tomorrow:

Hokyun Jeon from the Department of Biomedical Science, Ajou University Graduate School of Medicine will present on the following topic:

‘Conversion of Diagnosis and Chemotherapy Data in Electronic Health Records to Episode-based Oncology Extension of OMOP-CDM’

But I will attend the GPC meeting tomorrow.