Are you familiar with the NAACCR data set or other widely used tumor registry data standards? Have you used these data in research?
The Oncology WG is developing short- and long-term goals for mapping these resources so they can be ETLed into OMOP and used to address new and important cancer questions.
Please use this thread to describe the data mapping tasks we should prioritize.
Thanks!
One of our current projects: identify specific genetic variants that differ between histology types (lepidic vs. papillary) in non-small cell lung cancer (NSCLC).
What do we need for this?
Genetic information: Genetic WG is working
Clinical stage (AJCC)
Pathologic stage (AJCC)
Types of NSCLC: e.g. adenocarcinoma, squamous cell carcinoma, large cell carcinoma
I'm interested to see if it's possible to map clinical data from The Cancer Genome Atlas to OMOP; there is quite a bit of clinical data from TCGA that is now housed by the GDC.
I'm quite familiar with the content and am willing to help, if this may be of interest. If not, no worries.
We tried to convert the genetic information in TCGA into a prototype of G-CDM (Genetic extension for OMOP-CDM).
You can see how we did it here.
What we wanted to do is leverage open datasets such as TCGA to find important variants, and then apply this information in real clinical settings based on the CDM.
If you don't mind, we can work together to convert TCGA into OMOP-CDM and apply what we've learned from this to other institutions in the OHDSI network.
We have a number of cancer consortia projects that aggregate clinical data derived from trial CRFs. We are keenly interested in finding a way of incorporating such data into OMOP. If there are regular meetings of the Oncology WG, I would be interested in attending / participating.
We will soon resume the Oncology Workgroup meetings, with a focus on issues surrounding ETLing data into the new structures proposed in the OHDSI Oncology CDM Extension Proposal. See here:
I am re-upping this forum thread. Yesterday's Oncology Workgroup meeting raised the need to collect concrete oncology analytic use cases for the beta testing of the OMOP CDM Oncology extension. See meeting notes here:
Articulated, specific definitions of patient cohorts and event cohorts. Example:
Patients with histology of "Anaplastic astrocytoma" WHO Grade 3 that test "present" for IDH1 and IDH2 mutations in any primary CNS anatomical location, treated by surgical resection followed by external beam radiotherapy with a total dose of at least 35 Gy, that did not have a progression or recurrence for 365 days after surgical resection.
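To make the example above concrete, here is a minimal Python sketch of that cohort definition as a predicate over one tumor record. It is only an illustration: the `TumorRecord` class and all of its field names are hypothetical and are not part of the OMOP CDM or the Oncology extension. It also assumes that "IDH1 and IDH2" means both tests must be "present" and that the 365-day window starts the day after resection.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class TumorRecord:
    # Hypothetical flat record; a real ETL would derive these from CDM tables.
    histology: str
    who_grade: int
    idh1_present: bool
    idh2_present: bool
    primary_cns_site: bool
    resection_date: Optional[date]
    ebrt_start_date: Optional[date]
    ebrt_total_dose_gy: float
    progression_dates: List[date] = field(default_factory=list)


def in_cohort(t: TumorRecord) -> bool:
    """Return True if the record satisfies every criterion of the example."""
    if t.histology != "Anaplastic astrocytoma" or t.who_grade != 3:
        return False
    # Assumption: both IDH1 and IDH2 must test "present".
    if not (t.idh1_present and t.idh2_present):
        return False
    if not t.primary_cns_site:
        return False
    # Surgical resection followed by external beam radiotherapy.
    if t.resection_date is None or t.ebrt_start_date is None:
        return False
    if t.ebrt_start_date < t.resection_date:
        return False
    if t.ebrt_total_dose_gy < 35:
        return False
    # No progression or recurrence within 365 days after resection.
    window_end = t.resection_date + timedelta(days=365)
    return not any(
        t.resection_date < d <= window_end for d in t.progression_dates
    )
```

Writing the definition this way forces each clause (histology, biomarker, treatment order, dose, event-free window) to be stated unambiguously, which is the kind of specificity the use cases need.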
The Share Thoughts on Breast Cancer study began in May 2015 and is recruiting 1,300 women aged 18 and over who were first diagnosed at one of the participating medical centers with ductal carcinoma in situ or stage I-III breast cancer during January 1, 2013 to May 1, 2014.
The cohort definition was:
Inclusion criteria for the UNDERLYING de-identified study population:
Any sex
Diagnosed with primary breast cancer
Age 18+ at the time of dx
diagnosed during 7/1/2012 - 6/30/2013 (i.e. 18-30 months prior to survey)
(if there are insufficient patients diagnosed in this period, we may extend the window)
Also, as the timeline for survey implementation slips we will shift the diagnosis window accordingly
Exclude from the SURVEY sample if:
Sex not equal to female
Less than 18 years of age
Prior cancer diagnosis
Breast cancer was not microscopically confirmed
Only tumor morphology was lobular carcinoma in situ
Stage IV breast cancer
Known to be deceased
Non-English speaking (for now)
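The survey exclusion list above can be sketched as a small Python function that returns the reasons a case would be excluded. This is an illustration only, not the code used in the project: the `BreastCancerCase` class, its field names, and the string values ("female", "IV", "lobular carcinoma in situ") are assumptions standing in for whatever coded NAACCR values a site actually has.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BreastCancerCase:
    # Hypothetical, already-decoded fields for one registry case.
    sex: str
    age_at_dx: int
    prior_cancer_dx: bool
    microscopically_confirmed: bool
    morphologies: List[str]
    stage: str
    deceased: bool
    english_speaking: bool


def survey_exclusion_reasons(c: BreastCancerCase) -> List[str]:
    """List every exclusion criterion the case meets (empty = eligible)."""
    reasons = []
    if c.sex != "female":
        reasons.append("sex not equal to female")
    if c.age_at_dx < 18:
        reasons.append("less than 18 years of age")
    if c.prior_cancer_dx:
        reasons.append("prior cancer diagnosis")
    if not c.microscopically_confirmed:
        reasons.append("not microscopically confirmed")
    # Excluded only when LCIS is the sole recorded morphology.
    if c.morphologies and all(
        m == "lobular carcinoma in situ" for m in c.morphologies
    ):
        reasons.append("only morphology was lobular carcinoma in situ")
    if c.stage == "IV":
        reasons.append("stage IV breast cancer")
    if c.deceased:
        reasons.append("known to be deceased")
    if not c.english_speaking:
        reasons.append("non-English speaking")
    return reasons
```

Returning the reasons rather than a single yes/no makes it easier to tabulate how many cases each criterion removes, which is helpful when comparing sites.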
In this project, each of the 8 sites integrated their tumor registry NAACCR file into i2b2 (NAACCR_ETL in the GPC wiki); we collected about 50 variables (bc-variable.csv) and used R Markdown and Python to do QA (bc_qa) and load the data into REDCap. See also BreastCancerDataSharing in the GPC wiki.
Using a similar approach, the GPC CancerRCR project defines 3 cohorts; for example:
Query 2: PCORnet Modular Program request (Newly Identified Cancer Patients with Evidence of Genetic Test)
Age Restriction: > 21 years of age as of September 30, 2016
Query Period: October 1, 2015 – September 30, 2016
Health Event of Interest (Index) Groups: Lung, Colorectal, Breast, Prostate, Pancreatic or Esophageal Cancer DX
Inclusion/Exclusion Criteria 1: Exclusion of the above cancers 10 years before the diagnosis of interest
Inclusion/Exclusion Criteria 2: Include procedure for common molecular or genetic tests for cancers of interest
Stratification: Age group, sex, race, ethnicity
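As a rough illustration of how the Query 2 criteria fit together, here is a Python sketch. It is not the PCORnet Modular Program itself: the `Patient` class, the cancer group labels, and the single `has_genetic_test_procedure` flag are hypothetical simplifications, and the sketch assumes the index event is the earliest qualifying diagnosis in the query period.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

INDEX_CANCERS = {"lung", "colorectal", "breast", "prostate", "pancreatic", "esophageal"}
QUERY_START = date(2015, 10, 1)
QUERY_END = date(2016, 9, 30)


def age_on(birth_date: date, as_of: date) -> int:
    """Age in completed years on a given date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def years_before(d: date, n: int) -> date:
    """Same calendar day n years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - n)
    except ValueError:
        return d.replace(year=d.year - n, day=28)


@dataclass
class Patient:
    birth_date: date
    diagnoses: List[Tuple[date, str]]  # (diagnosis date, cancer group)
    has_genetic_test_procedure: bool   # hypothetical pre-computed flag


def index_diagnosis(p: Patient) -> Optional[Tuple[date, str]]:
    """Earliest diagnosis of interest inside the query period, if any."""
    in_period = sorted(
        dx for dx in p.diagnoses
        if dx[1] in INDEX_CANCERS and QUERY_START <= dx[0] <= QUERY_END
    )
    return in_period[0] if in_period else None


def in_query2(p: Patient) -> bool:
    if age_on(p.birth_date, QUERY_END) <= 21:  # must be > 21 as of Sept 30, 2016
        return False
    index = index_diagnosis(p)
    if index is None:
        return False
    # Criteria 1: none of the cancers of interest in the 10 years before index.
    lookback_start = years_before(index[0], 10)
    if any(
        group in INDEX_CANCERS and lookback_start <= dx_date < index[0]
        for dx_date, group in p.diagnoses
    ):
        return False
    # Criteria 2: evidence of a molecular or genetic test procedure.
    return p.has_genetic_test_procedure
```

Stratification by age group, sex, race, and ethnicity would then be a group-by over the patients for whom `in_query2` is true.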
The GPC's NAACCR_ETL approach is outdated by v18 (GPC issue #739) and the PCORnet CRG on Cancer is considering a PCORnet CDM tumor table. In considering new approaches, I discovered this OHDSI Oncology WG.
The current thinking is that the PCORnet tumor table would look a lot like the NAACCR file, whereas I gather this OHDSI WG aims to do a pretty deep integration of NAACCR data into the OMOP CDM; something that wouldn't easily be reversible, for example. But I see you're also scraping vocabulary data out of NAACCR specs and I suspect there's some effort in that area that we could share.
I see one or two active groups in HL7/FHIR as well. I aim to expand on that a bit in GPC issue #739.