Philosophical question on oncology ETL

Bill Stephens, Ryan Duryea, and I talked to the National Cancer Institute last week, and they are interested in putting SEER (cancer registry data) in the common data model.

Much of it looks doable. We can map many elements (stage, grade, histology, behavior, etc.) to an appropriate LOINC code, and we can map the values to a SNOMED concept. In fact, there is some guidance on the reverse process – how to take EHR data and map it to cancer registry format (NAACCR has a standard format for all oncology registry data): see the cancer registry reporting guidelines.

So, the questions are as follows. First, should we use the NAACCR format (or a subset) as an oncology standard in the CDM? My guess is no.

If we want to do the mapping to SNOMED and LOINC, and put these in the observation table (or one of the new V5 tables), then there is potentially a slippery slope to consider.

For example, should we also be doing this for all ICD9 (or ICD10) conditions that map to cancer? That seems like it is not necessary – the information is already available in the condition occurrence table as part of the source to concept mapping in the ETL process. And claims data has little in the way of detailed oncology information.

But what about data like SEER Medicare data where we get cancer registry data AND Medicare claims data? Do we keep them separate – put the SEER cancer registry data in the observation (and new related tables in V5) and leave the cancer-related claims data in the condition occurrence table? If we do this, then any queries will have to be database specific – you would have to know that you are querying SEER Medicare data to know to look for cancer codes in the observation table(s).

Any thoughts or ideas (or questions) would be appreciated.

Mark et alii:

See in text:

From: Mark Danese [mailto:notifications@mail128-6.atl41.mandrillapp.com] On Behalf Of Mark Danese

Sent: Wednesday, November 26, 2014 12:07 AM

To: reich@ohdsi.org

Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL

Mark_Danese

November 26

Bill Stephens, Ryan Duryea, and I talked to the National Cancer Institute last week, and they are interested in putting SEER (cancer registry data) in the common data model.

Much of it looks doable. We can map many elements (stage, grade, histology, behavior, etc.) to an appropriate LOINC code, and we can map the values to a SNOMED concept. In fact, there is some guidance on the reverse process -- how to take EHR data and map it to cancer registry format (NAACCR has a standard format for all oncology registry data): see the cancer registry reporting guidelines.

As you write below, there is a problem. Stage, grade, histology, and behavior are, according to V5 (and now also V4, backwards), conditions, not observations. Anything from a sign to a symptom to a syndrome to a fully-fledged diagnosis is now a condition. LOINC may not be the right vocabulary for that. We are planning on incorporating ICD-O-3. When are you planning on doing this, so we could potentially move this up?

So, the questions are as follows. First, should we use the NAACCR format (or a subset) as an oncology standard in the CDM? My guess is no.

Good guess. :)

If we want to do the mapping to SNOMED and LOINC, and put these in the observation table (or one of the new V5 tables), then there is potentially a slippery slope to consider.

For example, should we also be doing this for all ICD9 (or ICD10) conditions that map to cancer? That seems like it is not necessary -- the information is already available in the condition occurrence table as part of the source to concept mapping in the ETL process. And claims data has little in the way of detailed oncology information.

Exactly. But that alone wouldn’t bother me. You can have a concept mapped from ICD-9 on diagnosis of cancer X, and you can have at the same time a more precise grading in ICD-O.

But what about data like SEER Medicare data where we get cancer registry data AND Medicare claims data?

That’s what we want. Rich data.

Do we keep them separate -- put the SEER cancer registry data in the observation (and new related tables in V5) and leave the cancer-related claims data in the condition occurrence table?

No. Put the data where the domain tells you to put it. To record that they have different origins, use (or request, if missing) the type concepts.

If we do this, then any queries will have to be database specific -- you would have to know that you are querying SEER Medicare data to know to look for cancer codes in the observation table(s).

Yeah, we don’t want that. Kills the idea of a CDM.

Any thoughts or ideas (or questions) would be appreciated.

Let me know what kind of mapping help you need.

Best,

C


To respond, reply to this email or visit http://forums.ohdsi.org/t/philosophical-question-on-oncology-etl/91/1 in your browser.

My two cents: I’m not aware of any data in SEER-Medicare that can’t be fit into the current OMOP CDM, so would not recommend extending the model to NAACR or making data tables that are oncology-specific.

Philosophically, I try not to think of where the data comes from (e.g. registry or claims) but instead what that data represents when deciding where it should be housed within the CDM. If a record, from either the SEER survey or the claims, is an indicator of a diagnosis of a disease, that record should be placed in the CONDITION_OCCURRENCE table. In the CONDITION_OCCURRENCE table, you can maintain the provenance of the data using the CONDITION_TYPE_CONCEPT_ID (so you can distinguish if the record is from a registry or claims, if you need to), and you can use any standard concept in the CONDITION domain to represent the code in your source. If the record of interest is a measurement of tumor size, then it’d be a good candidate to be stored in the MEASUREMENT table. If the record indicates the administration of a treatment for the cancer, it’d make sense for it to go in the DRUG_EXPOSURE table. If the record is an unstructured patient-reported questionnaire, it may be good to go into OBSERVATION. All of these decisions are source-specific and driven by the ETL, but one helpful convention is that if you map the source code to a standard concept, you can then use the domain of that concept to guide the appropriate table to store the data. When developing standardized analytics, we rely on the domain of the concept to determine where to look for it in the model.
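
As a minimal sketch of that convention (not an official OHDSI routine), the Python below picks the target table from the domain of the mapped standard concept and records the origin through a type concept. `SourceRecord`, `route`, and the negative type concept ids are hypothetical placeholders introduced here for illustration; a real ETL would read domains and type concepts from the vocabulary tables.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Domain of the mapped standard concept -> CDM v5 table.
# Only the domains named in the paragraph above are listed.
DOMAIN_TO_TABLE: Dict[str, str] = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Drug": "drug_exposure",
    "Observation": "observation",
}

# Placeholder type concept ids (NOT real vocabulary ids). An ETL would use,
# or request if missing, proper type concepts for each kind of source.
TYPE_CONCEPT_PLACEHOLDER: Dict[str, int] = {"registry": -1, "claims": -2}


@dataclass
class SourceRecord:
    person_id: int
    source_kind: str          # "registry" or "claims"
    standard_concept_id: int  # concept the source code was mapped to
    domain_id: str            # domain of that standard concept


def route(record: SourceRecord) -> Tuple[str, dict]:
    """Choose the table from the concept's domain, not from the data source."""
    table = DOMAIN_TO_TABLE[record.domain_id]
    row = {
        "person_id": record.person_id,
        "concept_id": record.standard_concept_id,
        "type_concept_id": TYPE_CONCEPT_PLACEHOLDER[record.source_kind],
    }
    return table, row
```

With this shape, a registry diagnosis and a claims diagnosis for the same person both land in `condition_occurrence` and differ only in their type concept, so a query does not need to know which database it is running against.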

Cheers,

Patrick

I agree that the data should go where the domain directs it. The ability to identify provenance of SEER vs Medicare is missing. I’m customizing on top of CDM 5 to enable this level of provenance tracking for our projects that leverage data from multiple source systems (EMR, CRMS, LIMS…)

  • cdm_source: add ability to support provenance for more than one source system
      • Add auto increment cdm_source_id column
      • Add sources to identify source of generated data in ERA tables
  • To each non-vocabulary table,
      • Add: cdm_source_id FK to cdm_source.cdm_source_id
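
One way such a customization could look, sketched with Python's built-in `sqlite3` and an in-memory database. The table definitions are cut down to a few columns, the `cdm_source_id` column is the proposed extension rather than part of the released CDM v5, and the concept id is a placeholder.

```python
import sqlite3

EXAMPLE_CONCEPT_ID = 0  # placeholder, not a real vocabulary concept

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE cdm_source (
    cdm_source_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- proposed column
    cdm_source_name TEXT NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL,
    condition_concept_id    INTEGER NOT NULL,
    -- proposed column: which source system produced this row
    cdm_source_id           INTEGER NOT NULL
        REFERENCES cdm_source (cdm_source_id)
);
""")

insert_source = "INSERT INTO cdm_source (cdm_source_name) VALUES (?)"
seer_id = conn.execute(insert_source, ("SEER",)).lastrowid
medicare_id = conn.execute(insert_source, ("Medicare",)).lastrowid

# The same diagnosis recorded once by each source system.
conn.executemany(
    "INSERT INTO condition_occurrence "
    "(person_id, condition_concept_id, cdm_source_id) VALUES (?, ?, ?)",
    [(1, EXAMPLE_CONCEPT_ID, seer_id), (1, EXAMPLE_CONCEPT_ID, medicare_id)],
)

# Count rows per source system.
rows_per_source = conn.execute("""
    SELECT s.cdm_source_name, COUNT(*)
    FROM condition_occurrence AS c
    JOIN cdm_source AS s ON s.cdm_source_id = c.cdm_source_id
    GROUP BY s.cdm_source_name
    ORDER BY s.cdm_source_name
""").fetchall()
```

Here `rows_per_source` separates the SEER row from the Medicare row even though both sit in the same domain table.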

Concerning NAACCR, I’m already creating a metadata-driven parser that loads fixed-width files into CDM v5, because I need it for 2 cancer clients. It’s something that I can leverage for SEER data as well, so that’s a big win in my opinion. We’ll need to make sure that ICD-O-3 is fully supported in the Vocabularies, which will benefit any cancer application of the tools.

Bill

Unless I’m not understanding, provenance isn’t missing; that’s what the _TYPE_CONCEPT_ID fields are for. It may be that there is a need to add additional type concepts to the vocabulary, but structurally it should all be there.

Patrick:

What folks have expressed is the desire to add provenance in terms of a source_id on each datum. We don’t have that. The types give us only the general kind of source, not the specific one. In this case, with one registry and one claims source, that would be unambiguous, but if you have several claims or several EHR or several registry sources, the types won’t work.

CDM Version 5.1. I already started a forum page. Haven’t released anything, not ready yet.

C

Bill:

I’ll put it in a new forum page.

What is it that you need to do with the NAACCR? I am not familiar with it.

ICD-O-3 is on the list of things. Importing it is easy; the work is to plug it into the hierarchical system of conditions.

C

Forgive me if this betrays my ignorance of the CDM, but it’s not clear to me which standard concepts in the condition domain would capture some variables that are often needed for research on cancer care and outcomes that relies on cancer registry and EHR data.

Am I right that ICD-O-3 is not a CDM 5 vocabulary? In addition to ICD-O site and morphology codes, codes that distinguish “primary site current” from “primary site original” and “morphology current” from “morphology original” are often needed in the research I’ve been a part of. Staging often needs to be represented in various ways: SEER, AJCC, Collaborative staging, separate T, N and M (pathology and clinical) and various other values related to AJCC staging. Other characteristics – tumor size, tumor sequence, morphology, behavior, histologic grading, laterality (for paired organs), extent of disease – are also needed.

I don’t know the cost in computational efficiency of adding/extending existing tables to capture these. But NCI is one of the largest sponsors of health research including observational data-based research. The richness of the routinely collected data characterizing cancers and related treatments reflects the intensity and sophistication of oncologic care. Excluding key oncology concepts might exact a high cost on the value and fundability of OMOP CDM-based cancer research.

Again, I apologize if it is obvious that these values can be captured without extending the CDM. If they can’t, a strategy of making the minimum modifications needed to fully capture NAACCR variable definitions would balance the computational efficiency and research opportunity costs.

Andrew

Andrew Williams • Faculty Scientist II • Maine Medical Center Research Institute
Center for Outcomes Research & Evaluation • 509 Forest Ave, Suite 200 • Portland, ME 04101
email: aewilliams@mmc.org • ph: 207.661.7607 • fax: 207.662.3110

Christian,

NAACCR is a file format for interchange of cancer registry data. NCI Cancer Centers typically use this format to send data to NCI. Other cooperative groups use this format to interchange de-identified data as well. Most use tools that make it possible to extract this data rather easily from their local registries.

The files are fixed-width coded data files. Data elements and codings are described in the Facility Oncology Registry Data Standards (FORDS) [https://www.facs.org/~/media/files/quality%20programs/cancer/coc/fords/fords%20manual%202013.ashx].
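
For readers who have not worked with fixed-width files, here is a minimal sketch of a metadata-driven parser in Python. The field names, column positions, and sample line are invented for illustration and are not the actual NAACCR record layout; a real parser would load the layout from the standard's published documentation.

```python
from typing import Dict, List, NamedTuple


class FieldSpec(NamedTuple):
    name: str
    start: int   # 1-based column of the first character
    length: int  # number of characters in the field


# HYPOTHETICAL layout, for illustration only (not the real NAACCR positions).
EXAMPLE_LAYOUT: List[FieldSpec] = [
    FieldSpec("patient_id", 1, 8),
    FieldSpec("primary_site", 9, 4),
    FieldSpec("histology", 13, 4),
    FieldSpec("behavior", 17, 1),
    FieldSpec("laterality", 18, 1),
]


def parse_line(line: str, layout: List[FieldSpec]) -> Dict[str, str]:
    """Slice one fixed-width record into named fields using the layout metadata."""
    record = {}
    for field in layout:
        begin = field.start - 1  # convert the 1-based column to a 0-based index
        record[field.name] = line[begin:begin + field.length].strip()
    return record


# Made-up sample record that matches the hypothetical layout above.
sample = parse_line("00000042C079804631", EXAMPLE_LAYOUT)
```

Because the layout is data rather than code, supporting a different record version would mean swapping the list of `FieldSpec` entries, not rewriting the parser.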

Bill

To answer Andrew’s question, those are exactly the things I am worried about getting into the CDM. There is an ICD-O-3 to SNOMED mapping for both histology and behavior. But what about other things like laterality, grade, and stage? What tables do those go in? Maybe what I should do is put together a list of some of the key variables, and get the thoughts of some more experienced people on where to place them in CDM v5, and how to do the mapping. As Andrew says, getting this right will be important because a lot of people work with SEER data. Plus, we can all point to NCI as an organization that is supportive of the CDM.

Andrew:

Shame on you for your ignorance! :)

See below.

From: Andrew Williams [mailto:notifications@mail128-137.atl41.mandrillapp.com] On Behalf Of Andrew Williams

Sent: Wednesday, November 26, 2014 11:16 AM

To: reich@ohdsi.org

Subject: [OHDSI Forums] [Implementers] Philosophical question on oncology ETL

Andrew

November 26

Forgive me if this betrays my ignorance of the CDM, but it’s not clear to me which standard concepts in the condition domain would capture some variables that are often needed for research on cancer care and outcomes that relies on cancer registry and EHR data.

Very good. We like *use cases*, rather than the sometimes exhibited attitude “I want all data, maybe I might need them some time”.

Am I right that ICD-O-3 is not a CDM 5 vocabulary?

Not yet. We are planning on it.

In addition to ICD-O site and morphology codes, codes that distinguish “primary site current” from “primary site original” and “morphology current” from “morphology original” are often needed in the research I’ve been a part of. Staging often needs to be represented in various ways: SEER, AJCC, Collaborative staging, separate T, N and M (pathology and clinical) and various other values related to AJCC staging. Other characteristics – tumor size, tumor sequence, morphology, behavior, histologic grading, laterality (for paired organs), extent of disease are also needed.

Would ICD-O cover all you need?

I don’t know the cost in computational efficiency of adding/extending existing tables to capture these. But NCI is one of the largest sponsors of health research including observational data-based research. The richness of the routinely collected data characterizing cancers and related treatments reflects the intensity and sophistication of oncologic care. Excluding key oncology concepts might exact a high cost on the value and fundability of OMOP CDM-based cancer research.

Understood. Again, as pointed out in a previous email: We need not only to import those concepts, but also link them up to the condition hierarchy, so folks can find the stuff by navigating.

Again, I apologize if it is obvious that these values can be captured without extending the CDM. If they can’t, a strategy of making the minimum modifications needed to fully capture NAACCR variable definitions would balance the computational efficiency and research opportunity costs.

No problems. This is why we are doing this.

C

Mark,

Here was my plan from my time working on the TCGA data set:

  • Histological Behavior -> condition occurrence
      • ICD-O-3 8046/3: Non-small cell carcinoma
          • SNOMED “Non-small cell carcinoma”, concept id = 4028533
  • Histological Site -> Specimen anatomic_site_source_value and anatomic_site_concept_id
      • C079: Parotid Gland
          • SNOMED “Parotid gland”, concept id 4166063
      • Specimen is associated with diagnosis via fact_relationship
      • In reality histology is derived from path notes.
  • Cancer Grade and Staging -> Observation (these were the CDMv4 SNOMED concepts)
      • AJCC: concept 4120174
      • Gleason: concept 4157602
      • Weiss scale: MISSING
      • Karnofsky: concept 4169154
      • Allred: MISSING
      • ECOG: concept 4167763
      • Breslow: concept 4299314
  • Laterality
      • This is a problem
      • I had proposed an anatomic site subdivision concept id for this, but it didn’t make it into the final specimen model
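
The plan above can be written down as lookup tables, which makes the gaps easy to list. This is only a sketch: the concept ids are copied from the list, while the dictionary and function names are hypothetical and the tables cover just the two example codes.

```python
from typing import Dict, List, Optional, Tuple

# Source code -> SNOMED concept id, taken from the examples in the list above.
HISTOLOGY_TO_CONCEPT: Dict[str, int] = {"8046/3": 4028533}  # Non-small cell carcinoma
SITE_TO_CONCEPT: Dict[str, int] = {"C079": 4166063}         # Parotid gland

# Grade and staging scales -> concept id; None marks the ones listed as MISSING.
GRADE_STAGE_CONCEPTS: Dict[str, Optional[int]] = {
    "AJCC": 4120174,
    "Gleason": 4157602,
    "Weiss scale": None,
    "Karnofsky": 4169154,
    "Allred": None,
    "ECOG": 4167763,
    "Breslow": 4299314,
}


def missing_concepts(mapping: Dict[str, Optional[int]]) -> List[str]:
    """Return the names that still have no concept to map to."""
    return sorted(name for name, concept_id in mapping.items() if concept_id is None)


def plan(histology_code: str, site_code: str) -> List[Tuple[str, str, int]]:
    """Return (table, field, concept_id) triples for one tumor record."""
    return [
        ("condition_occurrence", "condition_concept_id",
         HISTOLOGY_TO_CONCEPT[histology_code]),
        ("specimen", "anatomic_site_concept_id", SITE_TO_CONCEPT[site_code]),
    ]
```

Calling `missing_concepts(GRADE_STAGE_CONCEPTS)` lists the Allred and Weiss scales, the two for which a concept would have to be requested. Laterality is left out because, as noted, it has no home in the model yet.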

Bill

Glad to hear that ICD-O will be added. Unfortunately adding it will only cover a couple of the things I listed. The NAACCR standard contains the rest.

Andrew

Andrew:

So, wait. What are you missing? And what data are you looking at? The TCGA data?

Best,

C

Christian,

I’ve been working on the TCGA data set for a while. There are many data elements that do not appear to map to any available concepts in the v4 vocabularies. I need to revisit this mapping with the V5 vocabularies.

Where do I obtain the v5 vocabulary data load files?

Thanks,
Bill