Andrew:
So, wait. What are you missing? And what data are you looking at? The TCGA data?
Best,
C
From: Andrew Williams [mailto:notifications@mail128-6.atl41.mandrillapp.com] On Behalf Of Andrew Williams
Sent: Wednesday, November 26, 2014 2:04 PM
To: reich@ohdsi.org
Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL
Andrew
November 26
Glad to hear that ICD-O will be added. Unfortunately adding it will only cover a couple of the things I listed. The NAACCR standard contains the rest.
Andrew
To respond, reply to this email or visit http://forums.ohdsi.org/t/philosophical-question-on-oncology-etl/91/13 in your browser.
Previous Replies
wstephens
November 26
Mark,
Here was my plan from my time working on the TCGA data set:
- Histological Behavior -> condition occurrence
  - ICD-O-3 8046/3: Non-small cell carcinoma
    - SNOMED "Non-small cell carcinoma", concept id = 4028533
- Histological Site -> Specimen anatomic_site_source_value and anatomic_site_concept_id
  - C079: Parotid Gland
    - SNOMED "Parotid gland", concept id 4166063
  - Specimen is associated with diagnosis via fact_relationship
  - In reality histology is derived from path notes.
- Cancer Grade and Staging -> Observation (these were the CDMv4 SNOMED concepts)
  - AJCC: concept 4120174
  - Gleason: concept 4157602
  - Weiss scale: MISSING
  - Karnofsky: concept 4169154
  - Allred: MISSING
  - ECOG: concept 4167763
  - Breslow: concept 4299314
- Laterality
  - This is a problem
  - I had proposed an anatomic site subdivision concept id for this, but it didn't make it into the final specimen model
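The plan above can be sketched as a small lookup, purely for illustration. This is a minimal sketch, not the actual ETL: the names `Target`, `MORPHOLOGY`, `TOPOGRAPHY`, `SCALES`, `route`, and `missing_scales` are hypothetical, and the only codes and concept ids included are the ones quoted in the list. A real ETL would presumably read these mappings from the vocabulary tables rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Target:
    cdm_table: str      # CDM table the value is routed to
    concept_id: int     # SNOMED concept id quoted in the plan
    concept_name: str


# Only the two example codes from the plan; everything else is unmapped here.
MORPHOLOGY = {
    "8046/3": Target("condition_occurrence", 4028533, "Non-small cell carcinoma"),
}
TOPOGRAPHY = {
    "C079": Target("specimen", 4166063, "Parotid gland"),
}

# Grade and staging scales -> Observation concept ids (None = MISSING in the plan).
SCALES = {
    "AJCC": 4120174,
    "Gleason": 4157602,
    "Weiss": None,
    "Karnofsky": 4169154,
    "Allred": None,
    "ECOG": 4167763,
    "Breslow": 4299314,
}


def route(morphology_code: str, site_code: str) -> Tuple[Optional[Target], Optional[Target]]:
    """Return (condition target, specimen target); None where no mapping is known."""
    return MORPHOLOGY.get(morphology_code), TOPOGRAPHY.get(site_code)


def missing_scales() -> list:
    """Names of the scales that had no concept in the plan."""
    return sorted(name for name, concept_id in SCALES.items() if concept_id is None)
```

Calling `route("8046/3", "C079")` returns the condition and specimen targets for the two examples, and `missing_scales()` lists the scales that still need a concept. Laterality is deliberately left out, since the plan has no home for it.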
Bill
Christian_Reich
November 26
Andrew:
Shame on you for your ignorance! :)
See below.
From: Andrew Williams [mailto:notifications@mail128-137.atl41.mandrillapp.com] On Behalf Of Andrew Williams
Sent: Wednesday, November 26, 2014 11:16 AM
To: reich@ohdsi.org
Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL
Andrew
November 26
> Forgive me if this betrays my ignorance of the CDM, but it’s not clear to me which standard concepts in the condition domain would capture some variables that are often needed for research on cancer care and outcomes that relies on cancer registry and EHR data.
Very good. We like *use cases*, rather than the sometimes exhibited attitude “I want all data, maybe I might need them some time”.
> Am I right that ICD-O-3 is not a CDM 5 vocabulary?
Not yet. We are planning on it.
> In addition to ICD-O site and morphology codes, codes that distinguish “primary site current” from “primary site original” and “morphology current” from “morphology original” are often needed in the research I’ve been a part of. Staging often needs to be represented in various ways: SEER, AJCC, Collaborative staging, separate T, N and M (pathology and clinical) and various other values related to AJCC staging. Other characteristics (tumor size, tumor sequence, morphology, behavior, histologic grading, laterality for paired organs, extent of disease) are also needed.
Would ICD-O cover all you need?
> I don’t know the cost in computational efficiency of adding/extending existing tables to capture these. But NCI is one of the largest sponsors of health research including observational data-based research. The richness of the routinely collected data characterizing cancers and related treatments reflects the intensity and sophistication of oncologic care. Excluding key oncology concepts might exact a high cost on the value and fundability of OMOP CDM-based cancer research.
Understood. Again, as pointed out in a previous email: We need not only to import those concepts, but also to link them up to the condition hierarchy, so folks can find the stuff by navigating.
> Again, I apologize if it is obvious that these values can be captured without extending the CDM. If they can’t, a strategy of making the minimum modifications needed to fully capture NAACCR variable definitions would balance the computational efficiency and research opportunity costs.
No problems. This is why we are doing this.
C
Andrew
Andrew Williams • Faculty Scientist II • Maine Medical Center Research Institute Center for Outcomes Research & Evaluation • 509 Forest Ave, Suite 200 • Portland, ME 04101 email: aewilliams@mmc.org • ph: 207.661.7607 • fax: 207.662.3110
To respond, reply to this email or visit http://forums.ohdsi.org/t/philosophical-question-on-oncology-etl/91/8 in your browser.
Previous Replies
Christian_Reich
November 26
Bill:
I’ll put it in a new forum page.
What is it that you need to do with the NAACCR? I am not familiar with it.
ICD-O-3 is on the list of things. Importing it is easy; the work is plugging it into the hierarchical system of conditions.
C
From: William Stephens [mailto:notifications@mail128-6.atl41.mandrillapp.com] On Behalf Of William Stephens
Sent: Wednesday, November 26, 2014 10:44 AM
To: reich@ohdsi.org
Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL
wstephens
November 26
I agree that the data should go where the domain directs it. The ability to identify provenance of SEER vs Medicare is missing. I'm customizing on top of CDM 5 to enable this level of provenance tracking for our projects that leverage data from multiple source systems (EMR, CRMS, LIMS...)
- cdm_source: add ability to support provenance for more than one source system
  - Add auto increment cdm_source_id column
  - Add sources to identify source of generated data in ERA tables
- To each non-vocabulary table,
  - Add: cdm_source_id FK to cdm_source.cdm_source_id
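To make the customization above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It rests on several assumptions: SQLite stands in for whatever database is actually used, both tables are trimmed to a few columns, `condition_occurrence` is just one example of a non-vocabulary table, and the helper `add_source` and the source names "EMR" and "Registry" are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript(
    """
    -- cdm_source with an auto-increment key, so several sources can coexist
    CREATE TABLE cdm_source (
        cdm_source_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        cdm_source_name TEXT NOT NULL
    );

    -- One non-vocabulary table, trimmed, with the added provenance column
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id               INTEGER NOT NULL,
        condition_concept_id    INTEGER NOT NULL,
        cdm_source_id           INTEGER NOT NULL
            REFERENCES cdm_source (cdm_source_id)
    );
    """
)


def add_source(name: str) -> int:
    """Register a source system and return its generated cdm_source_id."""
    cur = conn.execute(
        "INSERT INTO cdm_source (cdm_source_name) VALUES (?)", (name,)
    )
    return cur.lastrowid


emr_id = add_source("EMR")
registry_id = add_source("Registry")

# A diagnosis row that records which source system it came from.
conn.execute(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    (1, 100, 4028533, registry_id),
)

rows = conn.execute(
    """
    SELECT c.condition_occurrence_id, s.cdm_source_name
    FROM condition_occurrence AS c
    JOIN cdm_source AS s ON s.cdm_source_id = c.cdm_source_id
    """
).fetchall()
```

The join at the end is the point of the design: each datum can be traced to one specific source system, not only to a general kind of source.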
Concerning NAACCR, I'm already creating a metadata-driven parser that loads fixed-width files into CDM v5, because I need it for 2 cancer clients. It's something that I can leverage for SEER data as well, so that's a big win in my opinion. We'll need to make sure that ICD-O-3 is supported fully in Vocabularies, which will benefit any cancer application of the tools.
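For readers unfamiliar with the idea, a metadata-driven fixed-width parser can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser described above: the names `Field`, `LAYOUT`, and `parse_record` are hypothetical, and the column positions are invented for the example and are NOT the real NAACCR record layout.

```python
from typing import Dict, List, NamedTuple, Optional


class Field(NamedTuple):
    name: str
    start: int   # 1-based start column
    length: int  # width in characters


# Hypothetical layout for illustration only; NOT the real NAACCR positions.
LAYOUT: List[Field] = [
    Field("patient_id", 1, 8),
    Field("primary_site", 9, 4),
    Field("histology", 13, 4),
    Field("behavior", 17, 1),
    Field("laterality", 18, 1),
]


def parse_record(line: str, layout: List[Field] = LAYOUT) -> Dict[str, Optional[str]]:
    """Slice one fixed-width line into named fields; blank fields become None."""
    record: Dict[str, Optional[str]] = {}
    for field in layout:
        begin = field.start - 1  # convert to 0-based index
        raw = line[begin:begin + field.length]
        record[field.name] = raw.strip() or None
    return record


# Made-up sample line that matches the hypothetical layout above.
sample = "00000042C079804631"
parsed = parse_record(sample)
```

Because the layout is plain data, supporting another fixed-width source (SEER, for instance) would mean supplying a different list of fields while the parsing code stays the same.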
Bill
To respond, reply to this email or visit http://forums.ohdsi.org/t/philosophical-question-on-oncology-etl/91/4 in your browser.
Christian_Reich
November 26
Patrick:
What folks have expressed is the desire to add provenance in terms of a source_id to each datum. We don’t have that. The types give us only the general kind of source, not the specific one. In this case, where you have one registry and one claims source, that would be unambiguous, but if you have several claims or several EHR or several registry sources, the types won’t work.
CDM Version 5.1. I already started a forum page. Haven’t released anything, not ready yet.
C
From: Patrick Ryan [mailto:notifications@mail128-137.atl41.mandrillapp.com] On Behalf Of Patrick Ryan
Sent: Wednesday, November 26, 2014 10:50 AM
To: reich@ohdsi.org
Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL
Patrick_Ryan
November 26
Unless I'm not understanding, provenance isn't missing; that's what the *_TYPE_CONCEPT_ID fields are for. It may be that there is a need to add additional type concepts to the vocabulary, but structurally it should all be there.
To respond, reply to this email or visit http://forums.ohdsi.org/t/philosophical-question-on-oncology-etl/91/5 in your browser.