
Oncology data use cases

The Oncology WG needs your use cases!

Are you familiar with the NAACCR data set or other widely used tumor registry data standards? Have you used these data in research?

The Oncology WG is developing short and long-term goals for mapping these resources so they can be ETLed into OMOP and used to address new and important cancer questions.

Please use this thread to describe the data mapping tasks we should prioritize.

Thank you for starting this thread, @Andrew

Our current project 1: Identify specific genetic variants that differ between histology types (lepidic vs. papillary) in non-small cell lung cancer (NSCLC).

What do we need for this?

  1. Genetic information: the Genetic WG is working on this
  2. Clinical stage (AJCC)
  3. Pathologic stage (AJCC)
  4. Types of NSCLC, e.g. adenocarcinoma, squamous cell carcinoma, large cell carcinoma
  5. Histology, e.g. lepidic, acinar, papillary, micropapillary, solid

Our current project 2: Describing therapeutic patterns in breast cancer using national claims data

What do we need for this?

  1. Stage of breast cancer: metastatic vs. non-metastatic
  2. ER/PR/HER2 status
  3. Treatment pattern: the treatment table can work for this
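As a rough illustration of how items 1 and 2 might be derived once stage and receptor results are available, here is a minimal Python sketch. The `BreastTumor` record and its field names are hypothetical, the four-way HR/HER2 grouping is a common convention rather than something this project has specified, and "metastatic" is approximated as AJCC stage group IV (a real implementation might look at the M category instead).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreastTumor:
    # Hypothetical per-tumor record; field names are illustrative only.
    er_positive: Optional[bool]
    pr_positive: Optional[bool]
    her2_positive: Optional[bool]
    ajcc_stage_group: Optional[str]  # e.g. "IIA", "IV"


def receptor_subtype(t: BreastTumor) -> str:
    """Group a tumor by hormone receptor (ER/PR) and HER2 status."""
    if t.er_positive is None or t.pr_positive is None or t.her2_positive is None:
        return "unknown"  # any missing result: do not guess
    hr_positive = t.er_positive or t.pr_positive
    if hr_positive and t.her2_positive:
        return "HR+/HER2+"
    if hr_positive:
        return "HR+/HER2-"
    if t.her2_positive:
        return "HR-/HER2+"
    return "triple-negative"


def is_metastatic(t: BreastTumor) -> Optional[bool]:
    """Assumption: AJCC stage group IV means metastatic; None when stage is missing."""
    if not t.ajcc_stage_group:
        return None
    return t.ajcc_stage_group.strip().upper() == "IV"
```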


I’m interested to see if it’s possible to map clinical data from The Cancer Genome Atlas to OMOP; there is quite a bit of clinical data from TCGA that is now housed by the GDC.

I’m quite familiar with the content and am willing to help if this is of interest. If not, no worries!

My very best,

Anita Umesh

@aumesh Of course, we’re very interested!

We tried converting the genetic information in TCGA into a prototype of G-CDM (Genetic extension for OMOP-CDM).
You can see how we did it here.

What we wanted to do was leverage open datasets such as TCGA to find important variants, and then apply this information in real clinical settings based on the CDM.
If you don’t mind, we can work together to convert TCGA into OMOP-CDM and apply what we learn from this to other institutions in the OHDSI network.

We have a number of cancer consortia projects that aggregate clinical data derived from trial CRFs. We are keenly interested in finding a way of incorporating such data into OMOP. If there are regular meetings of the Oncology WG, I would be interested in attending / participating.


Brian Furner

Please do, @Brian_Furner. @rimma, can you invite?

I have use cases similar to Brian’s, and would like to participate to learn more and help contribute.


@Brian_Furner , @aumesh ,

Welcome to the group! Please send me your email addresses so that I can include you in the mailing list.

The workgroup page is here: http://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:oncology-sg#

Please include my email: bfurner@bsd.uchicago.edu




Please use anita.umesh@gmail.com for my email address.

Thank you, Rimma!

My very best,

Done! You should have received the first group email.

Please add me to the group. I’m totally new to OMOP, but I have done clinical research for years. My past experience is with CDISC.

We will soon resume the Oncology Workgroup meetings, with a focus on issues surrounding ETLing data into the new structures proposed in the OHDSI Oncology CDM Extension Proposal. See here:

We have been working on vocabulary support for the extension:

Thanks. Looking forward to helping out

I am re-upping this forum thread. Yesterday’s Oncology Workgroup meeting raised the need to collect concrete oncology analytic use cases for the beta testing of the OMOP CDM Oncology extension. See meeting notes here:


Please post use cases here.

What do you mean by use cases?

Articulated, specific definitions of patient cohorts and event cohorts. Example:

Patients with histology of ‘Anaplastic astrocytoma’, WHO Grade 3, that test ‘present’ for IDH1 and IDH2 mutations in any primary CNS anatomical location, treated by surgical resection followed by external beam radiotherapy with a total dose of at least 35 Gy, that did not have a progression or recurrence for 365 days after surgical resection.
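To make this kind of definition concrete, here is a minimal Python sketch that evaluates the example as a predicate over one patient. The `Patient` fields are hypothetical and flatten what would really be spread across several CDM tables. It reads "IDH1 and IDH2" as both being present and does not check that follow-up actually lasts 365 days; both are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical flattened summary of one patient; not an OMOP table layout.
    histology: str
    who_grade: int
    primary_site_cns: bool
    idh1_present: bool
    idh2_present: bool
    resection_date: Optional[date]
    ebrt_start: Optional[date]
    ebrt_total_dose_gy: float
    progression_dates: List[date] = field(default_factory=list)  # progression or recurrence


def in_cohort(p: Patient) -> bool:
    if p.histology != "Anaplastic astrocytoma" or p.who_grade != 3:
        return False
    if not p.primary_site_cns:
        return False
    # Assumption: both IDH1 and IDH2 must test "present".
    if not (p.idh1_present and p.idh2_present):
        return False
    # Surgical resection followed by external beam radiotherapy of at least 35 Gy.
    if p.resection_date is None or p.ebrt_start is None:
        return False
    if p.ebrt_start <= p.resection_date or p.ebrt_total_dose_gy < 35:
        return False
    # No progression or recurrence in the 365 days after resection.
    window_end = p.resection_date + timedelta(days=365)
    return not any(p.resection_date < d <= window_end for d in p.progression_dates)
```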

Here’s one from the Greater Plains Collaborative Share Thoughts on Breast Cancer Study:

The Share Thoughts on Breast Cancer study began in May 2015 and is recruiting 1,300 women aged 18 and over who were first diagnosed at one of the participating medical centers with ductal carcinoma in situ or stage I-III breast cancer during January 1, 2013 to May 1, 2014.

The cohort definition was:

  • Inclusion criteria for the UNDERLYING de-identified study population:
    • Any sex
    • Diagnosed with primary breast cancer
    • Age 18+ at the time of dx
    • diagnosed during 7/1/2012 - 6/30/2013 (i.e. 18-30 months prior to survey)
      • (if there are insufficient patients diagnosed in this period, we may extend the window)
      • Also, as the timeline for survey implementation slips we will shift the diagnosis window accordingly
  • Exclude from the SURVEY sample if:
    • Sex not equal to female
    • Less than 18 years of age
    • Prior cancer diagnosis
    • Breast cancer was not microscopically confirmed
    • Only tumor morphology was lobular carcinoma in situ
    • Stage IV breast cancer
    • Known to be deceased
    • Non-English speaking (for now)
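The criteria above translate fairly directly into filters. Here is a minimal Python sketch, assuming a hypothetical per-case record (`Case`) already derived from registry data; the field names and codes such as `"LCIS"` are illustrative, not NAACCR items.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# Diagnosis window from the criteria above; the post notes it may shift.
DX_START = date(2012, 7, 1)
DX_END = date(2013, 6, 30)


@dataclass
class Case:
    # Hypothetical registry-derived fields; names are illustrative only.
    sex: str                    # e.g. "F" or "M"
    age_at_dx: int
    dx_date: date
    primary_breast: bool
    prior_cancer: bool
    microscopically_confirmed: bool
    morphologies: List[str]     # e.g. ["DCIS"] or ["IDC", "LCIS"]
    stage_group: Optional[str]  # AJCC stage group, e.g. "0", "IIA", "IV"
    known_deceased: bool
    english_speaking: bool


def in_underlying_population(c: Case) -> bool:
    """Any sex, primary breast cancer, age 18+ at diagnosis, diagnosed in the window."""
    return c.primary_breast and c.age_at_dx >= 18 and DX_START <= c.dx_date <= DX_END


def survey_exclusions(c: Case) -> List[str]:
    """Return the survey exclusion reasons that apply (empty list means eligible).

    The "less than 18" exclusion is already implied by the 18+ inclusion rule,
    so it is not repeated here.
    """
    reasons = []
    if c.sex != "F":
        reasons.append("sex not female")
    if c.prior_cancer:
        reasons.append("prior cancer")
    if not c.microscopically_confirmed:
        reasons.append("not microscopically confirmed")
    if set(c.morphologies) == {"LCIS"}:
        reasons.append("LCIS only")
    if c.stage_group == "IV":
        reasons.append("stage IV")
    if c.known_deceased:
        reasons.append("known deceased")
    if not c.english_speaking:
        reasons.append("non-English speaking")
    return reasons


def survey_sample(cases: Iterable[Case]) -> List[Case]:
    return [c for c in cases if in_underlying_population(c) and not survey_exclusions(c)]
```

Returning the list of reasons, rather than a single boolean, makes it easy to tabulate why cases dropped out of the survey sample.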

In this project, each of the 8 sites integrated its tumor registry NAACCR file into i2b2 (NAACCR_ETL in the GPC wiki); we collected about 50 variables (bc-variable.csv) and used R Markdown and Python to do QA (bc_qa) and load the data into REDCap. See also BreastCancerDataSharing in the GPC wiki.
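For readers new to NAACCR flat files: they are fixed-width, with each item at a column position defined in the NAACCR data dictionary for that version. The sketch below shows the general shape of parsing such records plus one simple completeness check of the kind a QA step might run. The layout is a placeholder (the column positions are made up; the item names are modeled on NAACCR XML item IDs), so this is not the GPC's actual NAACCR_ETL or bc_qa code.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout: item name -> (1-based start column, length).
# Real positions come from the NAACCR data dictionary; these are placeholders.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "primarySite": (1, 4),
    "histologicTypeIcdO3": (5, 4),
    "behaviorCodeIcdO3": (9, 1),
    "dateOfDiagnosis": (10, 8),
}


def parse_record(line: str) -> Dict[str, str]:
    """Slice one fixed-width record into item values (blank-padded fields stripped)."""
    return {
        item: line[start - 1:start - 1 + length].strip()
        for item, (start, length) in LAYOUT.items()
    }


def qa_blank_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count blank values per item, a simple completeness check."""
    counts = {item: 0 for item in LAYOUT}
    for line in lines:
        for item, value in parse_record(line).items():
            if not value:
                counts[item] += 1
    return counts
```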

Using a similar approach, the GPC CancerRCR project defines 3 cohorts; for example:

Query 2: PCORnet Modular Program request (Newly Identified Cancer Patients with Evidence of Genetic Test)

  • Age Restriction: > 21 years of age as of September 30, 2016
  • Query Period: October 1, 2015 – September 30, 2016
  • Health Event of Interest (Index) Groups: Lung, Colorectal, Breast, Prostate, Pancreatic or Esophageal Cancer DX
  • Inclusion/Exclusion Criteria 1: Exclusion of the above cancers 10 years before the diagnosis of interest
  • Inclusion/Exclusion Criteria 2: Include procedure for common molecular or genetic tests for cancers of interest
  • Stratification: Age group, sex, race, ethnicity
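Query 2's criteria can be sketched in Python as below, assuming a hypothetical `Person` record with pre-grouped cancer diagnoses and a precomputed genetic-test flag. The original does not say when the genetic test must occur, so the flag is time-independent here; "> 21 years" is read literally as age 22 or older on September 30, 2016, and the index diagnosis is taken to be the earliest qualifying one in the query period. Stratification is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Criteria from Query 2 above.
TARGET_GROUPS = {"lung", "colorectal", "breast", "prostate", "pancreatic", "esophageal"}
QUERY_START = date(2015, 10, 1)
QUERY_END = date(2016, 9, 30)
AGE_REFERENCE = date(2016, 9, 30)
LOOKBACK_YEARS = 10


@dataclass
class Dx:
    group: str      # hypothetical cancer-group label, e.g. "lung"
    dx_date: date


@dataclass
class Person:
    birth_date: date
    diagnoses: List[Dx]
    has_genetic_test: bool  # hypothetical flag from molecular/genetic test procedure codes


def age_on(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    years = ref.year - birth.year
    if (ref.month, ref.day) < (birth.month, birth.day):
        years -= 1
    return years


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def index_dx(p: Person) -> Optional[Dx]:
    """Earliest target-cancer diagnosis in the query period, if its lookback is clean."""
    in_period = [d for d in p.diagnoses
                 if d.group in TARGET_GROUPS and QUERY_START <= d.dx_date <= QUERY_END]
    if not in_period:
        return None
    index = min(in_period, key=lambda d: d.dx_date)
    lookback_start = years_before(index.dx_date, LOOKBACK_YEARS)
    # Exclude if any target cancer was diagnosed in the 10 years before the index.
    if any(d.group in TARGET_GROUPS and lookback_start <= d.dx_date < index.dx_date
           for d in p.diagnoses):
        return None
    return index


def qualifies(p: Person) -> bool:
    return (age_on(p.birth_date, AGE_REFERENCE) > 21
            and p.has_genetic_test
            and index_dx(p) is not None)
```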

The GPC’s NAACCR_ETL approach has been made outdated by NAACCR v18 (GPC issue #739), and the PCORnet CRG on Cancer is considering a PCORnet CDM tumor table. While looking into new approaches, I discovered this OHDSI Oncology WG.

The current thinking is that the PCORnet tumor table would look a lot like the NAACCR file, whereas I gather this OHDSI WG aims for a fairly deep integration of NAACCR data into the OMOP CDM, something that wouldn’t easily be reversible. But I see you’re also scraping vocabulary data out of the NAACCR specs, and I suspect there’s some effort in that area that we could share.

I see one or two active groups in HL7/FHIR as well. I aim to expand on that a bit in GPC issue #739.