
Oncology: ICDO3 codes missing mappings to schema names

About 35% of the ICDO3 codes do not have a relationship mapped to a schema name. Could we augment the vocabulary so that a user building a cohort of breast cancer patients can query on either the schema name (e.g. breast) or the ICDO3 code (e.g. concept_code LIKE '%C50%')? At a minimum, perhaps the mappings for the major cancer types can be filled out.

8050/2-C50 -> Breast
8343/3-C50.3 -> Breast

SELECT
    c.concept_id,
    c.concept_code,
    c.vocabulary_id,
    cr.relationship_id,
    c2.concept_id AS concept_id2,
    c2.concept_code AS concept_code2,
    c2.vocabulary_id AS vocabulary_id2
FROM concept c
LEFT JOIN concept_relationship cr
    ON cr.concept_id_1 = c.concept_id
    AND cr.relationship_id = 'ICDO to Proc Schema' -- similar issue for 'ICDO to Schema'
LEFT JOIN concept c2
    ON c2.concept_id = cr.concept_id_2
WHERE c.vocabulary_id = 'ICDO3'
    AND (c2.concept_code IS NULL OR c2.concept_code = 'All Other Sites');
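
A side note on the WHERE clause: in SQL, AND binds more tightly than OR, so whether the OR in a filter like this is parenthesized changes which rows the vocabulary condition applies to. A minimal sketch using Python's sqlite3 and a made-up three-row table (the table `t`, its columns, and its rows are illustrative assumptions, not OMOP data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (vocabulary_id TEXT, schema_code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("ICDO3", None),               # unmapped ICDO3 row
        ("ICDO3", "Breast"),           # mapped ICDO3 row
        ("OTHER", "All Other Sites"),  # row from a different vocabulary
    ],
)

# Evaluated as (vocabulary AND IS NULL) OR (code = 'All Other Sites'),
# so the vocabulary filter does not apply to the second branch.
unparenthesized = conn.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE vocabulary_id = 'ICDO3' AND schema_code IS NULL "
    "OR schema_code = 'All Other Sites'"
).fetchone()[0]

# The vocabulary filter applies to both branches of the OR.
parenthesized = conn.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE vocabulary_id = 'ICDO3' "
    "AND (schema_code IS NULL OR schema_code = 'All Other Sites')"
).fetchone()[0]

print(unparenthesized, parenthesized)
```

In this toy table the unparenthesized form also picks up the row from the other vocabulary, while the parenthesized form does not.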

I slightly modified your query, look:
SELECT
    c.concept_id,
    c.concept_code,
    c.vocabulary_id,
    cr.relationship_id,
    c2.concept_id AS concept_id2,
    c2.concept_code AS concept_code2,
    c2.vocabulary_id AS vocabulary_id2
FROM concept c
LEFT JOIN concept_relationship cr ON cr.concept_id_1 = c.concept_id AND cr.relationship_id = 'ICDO to Schema' -- was 'ICDO to Proc Schema' in the original query
LEFT JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE c.vocabulary_id = 'ICDO3' AND (c2.concept_code IS NULL OR c2.concept_code = 'All Other Sites')
AND c.concept_code LIKE '%/_-C__._' -- 9137/3-C49.0-like pattern
;
It returns only 26 concepts.
The filter c.concept_code LIKE '%/_-C__._' is important. Schemas are connected to ICDO3 conditions defined by both histology and topography. When the topography is given only as a two-digit code (e.g. C50), it is a classification-level code that we would not normally see in patient data.
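
The effect of such a filter can be sketched with SQLite's LIKE, where `_` matches exactly one character and `%` matches any run of characters. The pattern `%/_-C__._` used here is an assumed reconstruction of the filter (it matches 9137/3-C49.0-style codes), and the one-column table is a stand-in for the real concept table; the sample codes are the ones quoted in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the OMOP concept table (only the column we need).
conn.execute("CREATE TABLE concept (concept_code TEXT)")
conn.executemany(
    "INSERT INTO concept VALUES (?)",
    [("8050/2-C50",), ("8343/3-C50.3",), ("9137/3-C49.0",)],
)

# Keep only codes whose topography has a subsite after the dot (C__._).
specific_codes = [
    row[0]
    for row in conn.execute(
        "SELECT concept_code FROM concept "
        "WHERE concept_code LIKE '%/_-C__._' "
        "ORDER BY concept_code"
    )
]
print(specific_codes)
```

The two-digit topography code 8050/2-C50 is filtered out, while the two codes with a subsite are kept.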
If you're building a cohort of breast cancer patients, it is better to pick all descendants of the SNOMED concept Primary malignant neoplasm of breast.
That way you capture not only the ICDO3 concepts but also the SNOMED concepts that other source concepts are mapped to.
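
As a rough sketch of that approach: concept_ancestor stores one row per ancestor/descendant pair, so a single join returns every descendant of a parent concept. The concept_ids, the child concept names, and the trimmed-down table definitions below are invented for illustration; only the parent concept name comes from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables with invented ids and hypothetical child concepts.
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, vocabulary_id TEXT);
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
INSERT INTO concept VALUES
  (1, 'Primary malignant neoplasm of breast', 'SNOMED'),
  (2, 'Hypothetical breast child A', 'SNOMED'),
  (3, 'Hypothetical breast child B', 'ICDO3'),
  (4, 'Hypothetical unrelated concept', 'SNOMED');
INSERT INTO concept_ancestor VALUES (1, 1), (1, 2), (1, 3), (4, 4);
""")

# All descendants (including the parent itself) of the SNOMED parent.
descendant_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT d.concept_id
        FROM concept a
        JOIN concept_ancestor ca ON ca.ancestor_concept_id = a.concept_id
        JOIN concept d ON d.concept_id = ca.descendant_concept_id
        WHERE a.concept_name = 'Primary malignant neoplasm of breast'
          AND a.vocabulary_id = 'SNOMED'
        ORDER BY d.concept_id
        """
    )
]
print(descendant_ids)
```

In a cohort query, condition_occurrence.condition_concept_id would then be filtered to the returned ids.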

The 'ICDO to Proc Schema' relationship is indeed missing for many ICDO3 concepts, but that shouldn't be a problem unless you're converting NAACCR data.

Thanks, querying on the top-level SNOMED concept is a good suggestion. For the oncology extension, though, tumor attributes such as stage, TNM, and grade are linked only to the ICDO3 concepts via the condition_occurrence_id.

Also, if my understanding is correct, the schema name is needed to populate MEASUREMENT.value_source_value, for example, gallbladder@3605@3B would indicate gallbladder cancer with pathological stage IIIB. Is this how others are populating the stage?
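
To make the format concrete, here is a minimal sketch that splits such a value on "@". The function name and the labels schema/item/value for the three parts are my own assumptions, not official NAACCR or OMOP terminology.

```python
from typing import Tuple


def split_source_value(value_source_value: str) -> Tuple[str, str, str]:
    """Split a 'schema@item@value' string into its three parts.

    Raises ValueError if the string does not contain exactly two '@'.
    """
    schema, item, value = value_source_value.split("@")
    return schema, item, value


parts = split_source_value("gallbladder@3605@3B")
print(parts)
```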

The SNOMED query only yields about 1148 of the 2289 ICDO3 concepts for breast cancer.

SELECT DISTINCT
    c.concept_id,
    c.concept_name,
    c.vocabulary_id,
    cr.relationship_id,
    c2.concept_id AS concept_id2,
    c2.concept_code AS concept_code2,
    c2.vocabulary_id AS vocabulary_id2
FROM concept c
LEFT JOIN concept_relationship cr
    ON cr.concept_id_1 = c.concept_id
    AND cr.relationship_id = 'Finding site of'
LEFT JOIN concept c2
    ON c2.concept_id = cr.concept_id_2
WHERE c.concept_name = 'Breast structure'
    AND c.vocabulary_id = 'SNOMED'
    AND c2.vocabulary_id = 'ICDO3';

versus

select count(distinct concept_code)
from concept
where vocabulary_id = 'ICDO3' and concept_code like '%C50%';

The tumor attributes can be connected to any condition_occurrence entry represented by a SNOMED or ICDO3 concept.

Are you converting NAACCR data, or are you trying to precoordinate some other ontologies into such a term?

When I was talking about the SNOMED query, I meant concept_ancestor usage.

Or probably your query is another case, right?

Only standard ICDO concepts have SNOMED attributes; non-standard ones are just mapped to SNOMED. We don't need attributes on non-standard concepts because they aren't used in the data.
Also, the attribute can be not only 'Breast structure' itself but any of its descendants.
So this is a more correct comparison:

select count(distinct concept_code)
from concept
where vocabulary_id = 'ICDO3' and concept_code like '%C50%'
and standard_concept = 'S'
;
-- versus
SELECT DISTINCT
/*
c.concept_id,
c.concept_name,
c.vocabulary_id,
cr.relationship_id,
*/
c2.concept_id AS concept_id2, c2.standard_concept,
c2.concept_code AS concept_code2,
c2.vocabulary_id AS vocabulary_id2
FROM concept c
JOIN concept_ancestor an ON an.ancestor_concept_id = c.concept_id
JOIN concept_relationship cr ON cr.concept_id_1 = an.descendant_concept_id AND cr.relationship_id = 'Finding site of'
JOIN concept c2 ON cr.concept_id_2 = c2.concept_id
WHERE c.concept_name = 'Breast structure' AND c.vocabulary_id = 'SNOMED' AND c2.vocabulary_id = 'ICDO3';

The counts look great, thanks so much Dmytry.

Yes, the tumor attributes can be connected to other SNOMED/ICDO3 concepts but I’m only connecting to the rows in the condition_occurrence which actually represent the real start date of the cancer. The other rows may come from the billing tables where diagnosis codes are attached to every event.

No, I’m not converting NAACCR data but am converting EMR data from the Epic Beacon module and following the example from an OHDSI tutorial.

If the real start date of the cancer is associated with an ICDO code that is mapped to SNOMED, you end up connecting cancer modifiers to SNOMED concepts (to be precise, to CONDITION_OCCURRENCE rows whose condition_concept_id is filled with a SNOMED concept).

The Oncology vocabulary is under development, so it might be confusing.
In the case of gallbladder@3605@3B, you have to populate MEASUREMENT.measurement_concept_id with the concept Derived SEER Pathological Stage 3B, while gallbladder cancer is reflected in CONDITION_OCCURRENCE.condition_concept_id.

I see your interest in OMOPing Oncology, so I encourage you to participate in the Oncology vocabulary and CDM working group.
NAACCR values like gallbladder@3605@3B should be mapped to the Cancer Modifier vocabulary I referred to above. This is one of the WG's topics.
See working groups info here
OMOP CDM Oncology WG – CDM/Vocabulary Subgroup Meeting: Thursday, April 15 at 1 pm ET: (Meeting Link)
