International Classification of Diseases for Oncology (ICD-O)

@mgurley:

Where is that thing? Not in the SNOMED distribution file SnomedCT_InternationalRF2_Production_20170131T120000.zip.

You tossed them into the spreadsheet, which tries to be smart and reads the ICD-O histology codes as dates (the year 8001 etc.). I am staying away from Excel and its derivatives these days because they are always trying to do that (usually turning long identifiers into scientific notation) and I don’t notice until much later.
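One way to sidestep that kind of coercion is to read the file with a tool that never guesses types. A minimal sketch using Python's standard `csv` module follows; the column names and sample rows are assumptions made up for illustration, and the real CSV layout may differ.

```python
import csv
import io
import re

# Hypothetical sample; the real file's column names may differ.
SAMPLE_CSV = """histology_code,description
8000/3,"Neoplasm, malignant"
8001/1,"Tumor cells, uncertain whether benign or malignant"
"""

# ICD-O-3 morphology codes: four digits, a slash, and a behavior digit.
HISTOLOGY_PATTERN = re.compile(r"^\d{4}/\d$")


def read_histology_codes(text):
    """Return the histology codes as plain strings, exactly as written."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["histology_code"] for row in reader]


def looks_coerced(code):
    """True if a value no longer matches the ICD-O-3 morphology pattern."""
    return HISTOLOGY_PATTERN.match(code) is None


codes = read_histology_codes(SAMPLE_CSV)
print(codes)
print([c for c in codes if looks_coerced(c)])
```

The `csv` module returns every field as a string, so a code such as 8001/1 is never reinterpreted; the pattern check is just a cheap way to spot values that some other tool already mangled.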

Absolutely. Let’s bring it up with Jim Case. From time to time we are submitting bad concepts or relationships to a website he is running, so he knows us.

Interesting. We should definitely talk to them.

That would be good.

@Christian_Reich
The data for the ‘curr_simplemaprefset_f’ table is located in the ‘Full/Refset/Map/der2_sRefset_SimpleMapFull_US1000124_20170301.txt’ file within the unzipped SNOMED download. Open this file, look for some ICD-O-3 codes, and you will find the gold.

I used this project as my DDL compiler/data loader: https://github.com/rorydavidson/SNOMED-CT-Database. If you look at the project’s data-loading scripts, you will see the file referenced above being loaded into the ‘curr_simplemaprefset_f’ table. I am not sure how official this project’s table names are; I am a SNOMED newbie.
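For anyone who wants to peek at the file without loading a database, here is a rough sketch of reading it directly. It assumes the standard RF2 simple map layout (tab-delimited, with a header row); the sample rows are invented, and 446608001 is the ICD-O simple map refset ID that comes up later in this thread.

```python
import csv
import io

# RF2 simple map refset files are tab-delimited with a header row.
# The rows below are invented for illustration, not real release content.
SAMPLE_RF2 = (
    "id\teffectiveTime\tactive\tmoduleId\trefsetId\treferencedComponentId\tmapTarget\n"
    "uuid-1\t20170131\t1\t900000000000207008\t446608001\t1000001\t8000/3\n"
    "uuid-2\t20170131\t1\t900000000000207008\t900000000000498005\t1000002\tX0000\n"
)

ICD_O_REFSET_ID = "446608001"  # ICD-O simple map reference set


def icd_o_rows(text):
    """Yield rows of the ICD-O simple map refset as dictionaries of strings."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["refsetId"] == ICD_O_REFSET_ID:
            yield row


rows = list(icd_o_rows(SAMPLE_RF2))
print([(r["referencedComponentId"], r["mapTarget"]) for r in rows])
```

To run it against the real download, the same reader could be pointed at an open file handle for the path above instead of the in-memory sample.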

As for the spreadsheet, it is a CSV file that you should be able to download from Google Drive and open with a clean text editor to inspect, undisturbed by any “smart” coercions. I only see the date coercion in the Google Drive preview. I guess Google is not perfect.

I would be happy to propose my “great” ideas to Jim. Joking aside, I think the world could use a pre-coordinated ICD-O-3 site/histology pairing. It would make the job of people fitting ICD-O-3 data into data models like the OHDSI/OMOP CDM much easier.

@mgurley:

Great. You are making life so much easier. I found the file.

Come to think of it: this is non-trivial, because they mustn’t create these pairs if there already is an existing precoordinated concept from the two axes. If the links from all SNOMED concepts to their two dimensions were reliable, it would work well. But I have my doubts. We’ll look into this.

@Christian_Reich Looks like the CSV download from the WHO website that I used to map to SNOMED is for ICD-O-3, not ICD-O-3.1. Does not look like WHO provides a CSV download of ICD-O-3.1. Very annoying. WHO does provide a document that spells out in Appendix 7 the differences between ICD-O-3 and ICD-O-3.1. See here:

http://apps.who.int/iris/bitstream/10665/96612/1/9789241548496_eng.pdf

Querying my local copy of the NCI Metathesaurus suggests that it contains ICD-O-3.1. So I will recalculate the ICD-O-3.1 to SNOMED map counts using ICD-O-3.1 from the NCI Metathesaurus.

@mgurley:

Oh man. Sounds like fun. The WHO is notorious for not providing proper digital distribution files, but this PDF nonsense instead. Same thing with ATC, same thing with ICD-10. Thank God for UMLS and NCI, here.

@Christian_Reich @rimma @ihuerga

Using histologies for ICD-O-3.1 from the NCI Metathesaurus, the number of histologies unmapped to SNOMED is now zero. There is one inactive ICD-O-3.1 code that can’t be mapped: 8240/1. I updated the CSV files:

https://drive.google.com/open?id=0Bzcc8twUxevJT2dNMm16eTJwZUk
https://drive.google.com/open?id=0Bzcc8twUxevJbGdzYTVNSS1Wa00

Mapping retired ICD-O-3 or even ICD-O-2 codes as well would set a higher bar.

@Christian_Reich @rimma @Vojtech_Huser @ihuerga

I have learned a little more about SNOMED CT. My previous ICD-O-3.1 to SNOMED mapping efforts were not taking into account expired mappings. @Vojtech_Huser helped me figure that out. I will post the results of my direct mappings later this week. It looks like there are two possible SNOMED CT maps/refsets to use for histology within SNOMED CT and one for sites. The results will be different from what I previously reported.
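For anyone else who trips over this: an RF2 Full file keeps every version of a row, so an old active version can sit next to a later one that retires it. A minimal sketch of reducing such rows to the current active set is below; the row shape and values are invented for illustration.

```python
from collections import namedtuple

# Minimal stand-in for a row of an RF2 Full file; the values are invented.
Row = namedtuple("Row", "id effective_time active target")

ROWS = [
    Row("map-1", "20020131", "1", "8000/3"),
    Row("map-1", "20150731", "0", "8000/3"),  # later version retires map-1
    Row("map-2", "20020131", "1", "8001/1"),
]


def current_active(rows):
    """Keep the latest version of each id, then drop the inactive ones."""
    latest = {}
    for row in rows:
        seen = latest.get(row.id)
        # effectiveTime is YYYYMMDD, so string comparison orders it correctly.
        if seen is None or row.effective_time > seen.effective_time:
            latest[row.id] = row
    return [row for row in latest.values() if row.active == "1"]


print(current_active(ROWS))
```

Filtering on `active = 1` alone would still return the 2002 version of `map-1`, which is the trap described above.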

Also, I discovered that topography and morphology are under the larger SNOMED CT axis ‘Body Structure (body structure)’ and that SNOMED CT does indeed precoordinate topography/morphology or site/histology pairings into SNOMED codes in the sub-axis ‘Disease (disorder)’ under the ‘Clinical Finding’ top-level SNOMED CT axis. That makes sense to me. I will post results later this week on how well the SNOMED CT precoordinations cover the SEER site/histology validation list pairings. You can see the precoordination in this example of ‘Anaplastic astrocytoma of brain (disorder)’ in the SNOMED CT browser (click the Diagram tab):

http://browser.ihtsdotools.org/?perspective=full&conceptId1=277461004&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

@Christian_Reich @rimma

Here are my latest ICD-O-3 to SNOMED mapping results:

I found that SNOMED has 3 axes of interest:

  1. Morphologically abnormal structure (morphologic abnormality) axis, which is mapped via a SNOMED refset to ICD-O-3 morphology codes.
  2. Anatomical structure (body structure) axis, which is mapped via a SNOMED refset to ICD-O-3 site codes.
  3. Disorder axis (which can be a pre-coordinated combination of a ‘Finding Site’ attribute relationship and an ‘Associated Morphology’ attribute relationship).

The disorder axis is exactly what we are looking for, I believe. To see this visually, do the following:

  1. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=8551000119100&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  2. Click the ‘Diagram’ tab

  3. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=7712004&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  4. Click the ‘Refsets’ tab

  5. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=3898006&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  6. Click the ‘Refsets’ tab

So far I have:

  1. Created the list of combinations of ICD-O-3 site/histology via the SEER site/histology validation list.

https://drive.google.com/open?id=0Bzcc8twUxevJVUxBaGFnanozcTA

This represents 49,831 site/histology combinations, though it covers only SEER-reportable combinations. Benign non-primary CNS neoplasms are not covered.

  2. Mapped each of the ICD-O-3 Site/Morphology axes to SNOMED codes via the SNOMED refsets. For the histology axis, there are two possible refsets:

ICD-O simple map reference set (foundation metadata concept) 446608001

Histologies:
– 13 unmapped
– 854 mapped to one
– 198 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJNnJMMzItN05lVlU

Sites:
– 43 unmapped
– 4 mapped to one
– 283 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJMGR1Qml1cElqaG8

CTV3 simple map reference set (foundation metadata concept) 900000000000498005

Histologies:
– 5 unmapped
– 1060 mapped to one
– 0 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJSm1LVDlZaU5xNGc

  3. Found all the SNOMED disorders that have a pre-coordinated relationship via the ‘Finding Site’ attribute relationship and an ‘Associated Morphology’ attribute relationship.

69,824 disorders
https://drive.google.com/open?id=0Bzcc8twUxevJMmxXTndzT2d3aUk

  4. Matched the SEER site/histology pairings, once mapped to SNOMED codes, against the pre-coordinated SNOMED disorder codes.

https://drive.google.com/open?id=0Bzcc8twUxevJMHV4dWpvTmR6SnM

1,924 mappings from ICD-O-3 site/histology pairings to pre-coordinated SNOMED disorder codes
973 distinct ICD-O-3 site/histology pairings

46 mapped to one
927 mapped to more than one

So the final upshot is that 973 out of 49,831 pairings can be mapped, or about 2%, and 927 of those 973 map to more than one code. Not a very impressive result.
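The matching step can be sketched roughly as follows. This is not the code actually used for the numbers above, just an illustration of the idea: expand each ICD-O site and histology to its SNOMED codes, then keep the disorders modelled with one code from each side. All SNOMED identifiers, and the choice of example ICD-O codes, are made up for the sketch.

```python
# All SNOMED identifiers below are invented for illustration.
SITE_MAP = {"C71.0": {"s1", "s2"}}    # ICD-O site -> SNOMED body structures
HISTOLOGY_MAP = {"9401/3": {"m1"}}    # ICD-O histology -> SNOMED morphologies
DISORDERS = [                         # (disorder, morphology, finding site)
    ("d1", "m1", "s1"),
    ("d2", "m1", "s2"),
    ("d3", "m9", "s1"),
]
PAIRINGS = [("C71.0", "9401/3"), ("C71.0", "8000/3")]


def match_pairings(pairings, site_map, histology_map, disorders):
    """Map each site/histology pairing to the disorders modelled with both axes."""
    matches = {}
    for site, histology in pairings:
        sites = site_map.get(site, set())
        morphologies = histology_map.get(histology, set())
        found = {d for d, m, s in disorders if m in morphologies and s in sites}
        if found:
            matches[(site, histology)] = found
    return matches


matches = match_pairings(PAIRINGS, SITE_MAP, HISTOLOGY_MAP, DISORDERS)
coverage = len(matches) / len(PAIRINGS)
print(matches, f"{coverage:.1%}")

# The figures reported in this post: 973 of 49,831 pairings, 927 of them ambiguous.
print(f"{973 / 49831:.2%}", f"{927 / 973:.2%}")
# prints 1.95% 95.27%
```

In the toy data the first pairing lands on two disorders because the site maps to two body structures, which is the same effect that makes most of the 973 matched pairings ambiguous.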

Here is some of the SQL I used for anyone interested:

SELECT DISTINCT d.conceptid
, r.destinationid AS histology_destinationid
, r2.destinationid AS site_destinationid
FROM curr_description_f d
-- "Associated morphology (attribute)"
JOIN curr_relationship_f r ON d.conceptid = r.sourceid AND r.active = '1' AND r.typeid = '116676008'
-- "Finding site (attribute)", in the same relationship group as the morphology
JOIN curr_relationship_f r2 ON d.conceptid = r2.sourceid AND r2.active = '1' AND r2.typeid = '363698007' AND r.relationshipgroup = r2.relationshipgroup
WHERE d.typeid = '900000000000003001' -- "Fully specified name"
AND d.active = '1'
--AND r.destinationid = '21964009'
--AND r2.destinationid = '57171008'
--AND d.conceptid = '188502002'
-- skip morphology rows superseded by a later version
AND NOT EXISTS(
SELECT 1
FROM curr_relationship_f r3
WHERE r.moduleid = r3.moduleid
AND r.sourceid = r3.sourceid
--AND r.destinationid = r3.destinationid
AND r.relationshipgroup = r3.relationshipgroup
AND r.typeid = r3.typeid
AND r.characteristictypeid = r3.characteristictypeid
AND r.modifierid = r3.modifierid
--AND r3.active = '0'
AND r3.effectivetime > r.effectivetime
)
-- skip finding-site rows superseded by a later version
AND NOT EXISTS(
SELECT 1
FROM curr_relationship_f r4
WHERE r2.moduleid = r4.moduleid
AND r2.sourceid = r4.sourceid
--AND r2.destinationid = r4.destinationid
AND r2.relationshipgroup = r4.relationshipgroup
AND r2.typeid = r4.typeid
AND r2.characteristictypeid = r4.characteristictypeid
AND r2.modifierid = r4.modifierid
--AND r4.active = '0'
AND r4.effectivetime > r2.effectivetime
)
ORDER BY d.conceptid
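The join on relationshipgroup in the query above is what keeps a site paired with the right morphology when a disorder has more than one group. A small sketch of the same idea in Python, using invented relationship rows (only the two attribute type IDs come from the query):

```python
from collections import defaultdict

ASSOCIATED_MORPHOLOGY = "116676008"
FINDING_SITE = "363698007"

# (sourceId, typeId, destinationId, relationshipGroup); ids are invented.
RELATIONSHIPS = [
    ("d1", ASSOCIATED_MORPHOLOGY, "m1", 1),
    ("d1", FINDING_SITE, "s1", 1),
    ("d1", ASSOCIATED_MORPHOLOGY, "m2", 2),
    ("d1", FINDING_SITE, "s2", 2),
]


def site_morphology_pairs(relationships):
    """Pair morphology and site only when they share a concept and group."""
    groups = defaultdict(lambda: {"m": set(), "s": set()})
    for source, type_id, destination, group in relationships:
        if type_id == ASSOCIATED_MORPHOLOGY:
            groups[(source, group)]["m"].add(destination)
        elif type_id == FINDING_SITE:
            groups[(source, group)]["s"].add(destination)
    return sorted(
        (source, m, s)
        for (source, _group), axes in groups.items()
        for m in axes["m"]
        for s in axes["s"]
    )


print(site_morphology_pairs(RELATIONSHIPS))
# prints [('d1', 'm1', 's1'), ('d1', 'm2', 's2')]
```

Without the group condition, the same rows would also yield the spurious pairs (m1, s2) and (m2, s1).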

SELECT curr_simplemaprefset_f.*
FROM curr_simplemaprefset_f
WHERE curr_simplemaprefset_f.refsetid = '?' -- placeholder: the refset ID
AND curr_simplemaprefset_f.maptarget = '?' -- placeholder: the ICD-O code
AND curr_simplemaprefset_f.active = '1'
-- skip maps that a later row has inactivated
AND (NOT EXISTS
(SELECT 1
FROM curr_simplemaprefset_f AS snomed_maps
WHERE snomed_maps.moduleid = curr_simplemaprefset_f.moduleid
AND snomed_maps.refsetid = curr_simplemaprefset_f.refsetid
AND snomed_maps.referencedcomponentid = curr_simplemaprefset_f.referencedcomponentid
AND snomed_maps.maptarget = curr_simplemaprefset_f.maptarget
AND snomed_maps.effectivetime > curr_simplemaprefset_f.effectivetime
AND snomed_maps.active = '0'))
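To try the map lookup without a full SNOMED load, here is a self-contained sketch that runs the same pattern against an in-memory SQLite table. SQLite, the column types, and the rows are assumptions chosen only to make the example runnable; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE curr_simplemaprefset_f (
        id TEXT, effectivetime TEXT, active TEXT, moduleid TEXT,
        refsetid TEXT, referencedcomponentid TEXT, maptarget TEXT)"""
)
# Invented rows: concept 1000001 was mapped and later retired; 1000002 is current.
conn.executemany(
    "INSERT INTO curr_simplemaprefset_f VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("a", "20020131", "1", "mod", "446608001", "1000001", "8000/3"),
        ("a", "20150731", "0", "mod", "446608001", "1000001", "8000/3"),
        ("b", "20020131", "1", "mod", "446608001", "1000002", "8000/3"),
    ],
)

QUERY = """
SELECT m.referencedcomponentid
FROM curr_simplemaprefset_f AS m
WHERE m.refsetid = ? AND m.maptarget = ? AND m.active = '1'
  AND NOT EXISTS (
    SELECT 1 FROM curr_simplemaprefset_f AS later
    WHERE later.moduleid = m.moduleid
      AND later.refsetid = m.refsetid
      AND later.referencedcomponentid = m.referencedcomponentid
      AND later.maptarget = m.maptarget
      AND later.effectivetime > m.effectivetime
      AND later.active = '0')
ORDER BY m.referencedcomponentid
"""

# Bound parameters stand in for the two placeholders in the query.
current = [row[0] for row in conn.execute(QUERY, ("446608001", "8000/3"))]
print(current)
# prints ['1000002']
```

The retired map for concept 1000001 drops out because a later row with the same target marks it inactive, while the untouched map for 1000002 survives.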

@rimma @Christian_Reich

I submitted the following question to SEER about the SEER Site/Histology validation list:

I have a question about the ICD-O-3 SEER Site/Histology Validation List available on the following page:

https://seer.cancer.gov/icd-o-3/

There is a disclaimer on the page that states:

"The Site/Histology List is not intended to be used for case finding or to determine reportability."

If it is not intended for either of these uses, what use is it intended for?

Also, how are the pairings published in this list established?  Are they manually curated by Pathologists?  Or based on what has been ever reported?  Or something else?

I received the following response:

The site/histology list is based on ICD-O-3 and WHO Classifications of Tumors (aka: blue books).

The site/histology validation list is used in registry data collection as well as editing software.

Some site/histologies combinations are common and do not require additional review.

There are some combinations that are not common which need to be verified before they can be added to the database.

There are also impossible combinations that will not clear edits even with review and cannot be overridden.

The site/histology validation list is updated when needed to reflect changes in ICD-O-3, WHO updates or reported cases. ICD-9CM and ICD-10CM codes are used for casefinding and reportable tumors are found in either the SEER Program and Staging Manuals or FORDS manuals. Reportable tumors differ by diagnosis year.

@rimma @Christian_Reich

Here is the response I received from SNOMED regarding mapping ICD-O-3 axes to SNOMED codes and ICD-O-3 site/histology pairings to precoordinated SNOMED clinical finding disorders/diseases. The question was basically a rehash of my prior forum post.

Thank you for your mail with the subject "Mapping ICD-O-3 Site/Histology Parings to pre-coordinated SNOMED codes", within the SNOMED International customer support system.

First, we would like to explain the approach for the mappings between SNOMED CT and ICD-O. 

All morphological codes in ICD-O version 3.1 publication have been included as morphologic abnormalities in SNOMED CT.
The directional maps from SNOMED CT to ICD-O are provided in the Simple Map refset.
They are one-to-one or many-to-one maps due to the granularity differences between two systems.
For example, some are synonyms in ICD-O codes but they are subconcepts in SNOMED CT.
Furthermore, there are also morphologies in SNOMED CT that are not covered by ICD-O codes.

All topographical codes in ICD-O version 3.1 publication have been used as target of mapping for body structure concepts in SNOMED CT.
The topographical codes from ICD-O are not sufficient for SNOMED CT concept modeling for disorders.
Currently, 22,826 body structure concepts are mapped to 287 ICD-O codes.
These maps are incomplete because there are potentially additional 3,000 body structures to be mapped to ICD-O codes.

Please let us know if we have missed any ICD-O codes in version 3.1. We have also planned to include ICD-O version 3.2 when we receive the finalised release from WHO. 

We haven't taken the approach to pre-coordinate disorders in SNOMED CT by combination of ICD-O morphologies and topographical codes.
The combination can be fully supported by post-coordination based on SNOMED CT concept model.
Currently, disorders are added when we receive customer requests.
These are the key reasons that disorders do not have full coverage for all ICD-O combinations.   

Regarding what have been covered by disorders, 4,481 disorders have been modeled by morphologies mapped to ICD-O codes with SNOMED CT body structures that are subconcepts of ICD-O topographical codes.
If we ignore body structures, there are about 5,000 disorders that are modelled by ICD-O morphologies.
It is about 10% of your estimated possible combinations by ICD-O sites and morphologies. 

We would like to understand your requirement for pre-coordination for all combinations. 
In particular, the explanations for reasons and applications from professional bodies and user groups would help us to prioritise the requirement for future content authoring.

@rimma @Christian_Reich

I received some more responses from SNOMED.

  • Confirmed which SNOMED “refset” to use for mapping histology/site: ICD-O simple map reference set (foundation metadata concept) 446608001

  • Confirmed SNOMED is actually mapping to ICD-O-3.2 via ICD-11. ICD-O-3.2 is an unpublished update to ICD-O-3.1. ICD-11 is still in draft status as well.

  • Confirmed that SNOMED is more fine-grained than ICD-O-3 on both the histology and site axes, hence the prevalence of ICD-O-3 sites/histologies mapped to multiple SNOMED codes.

  • SNOMED’s explanation of the mismatch:
    – Some terms are treated as synonyms in ICD-O, which are distinct subconcepts in SNOMED CT.
    – SNOMED includes morphologies that are not currently in the ICD-O to support clinical needs.
    – There are significant details in representation of body structures in SNOMED CT which cannot be found in ICD-O.
    – SNOMED CT content is updated and released twice a year.

So the current state of the axis to axis mapping is the following:

Histologies:
– 13 unmapped (mostly retired codes in ICD-O-3.2)
– 854 mapped to one
– 198 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJNnJMMzItN05lVlU

Sites:
– 43 unmapped (pending mapping to be done in SNOMED)
– 4 mapped to one
– 283 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJMGR1Qml1cElqaG8
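The three buckets reported above (unmapped, mapped to one, mapped to more than one) fall out of inverting the SNOMED-to-ICD-O map and counting targets per ICD-O code. A minimal sketch, with invented concept IDs and a tiny code list standing in for the real inputs:

```python
from collections import defaultdict

# Invented (SNOMED concept id, ICD-O code) pairs; real maps come from the refset.
SNOMED_TO_ICDO = [
    ("1000001", "8000/3"),
    ("1000002", "8010/3"),
    ("1000003", "8010/3"),
]
ALL_ICDO_CODES = ["8000/3", "8010/3", "8240/1"]


def bucket_counts(pairs, icdo_codes):
    """Count ICD-O codes with zero, one, or several SNOMED concepts."""
    targets = defaultdict(set)
    for concept_id, icdo_code in pairs:
        targets[icdo_code].add(concept_id)
    counts = {"unmapped": 0, "one": 0, "many": 0}
    for code in icdo_codes:
        n = len(targets.get(code, ()))
        key = "unmapped" if n == 0 else "one" if n == 1 else "many"
        counts[key] += 1
    return counts


print(bucket_counts(SNOMED_TO_ICDO, ALL_ICDO_CODES))
# prints {'unmapped': 1, 'one': 1, 'many': 1}
```

Because the refset maps many SNOMED concepts to one ICD-O code, reading it in the ICD-O-to-SNOMED direction naturally produces the "mapped to more than one" bucket.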

Here is a Github repository that contains the code I have written to perform mappings from ICD-O to SNOMED:

I would especially like some help from anybody who can tackle querying SNOMED to determine the most appropriate code to pick when an ICD-O code maps to multiple SNOMED codes.

Great.
We’ll try to go through this next week.
