International Classification of Diseases for Oncology (ICD-O)

Sorry for the naïve question, but I am having trouble understanding where to draw the line of where things go.

For cardiology, we have heart failure under condition, but the ejection fraction, which quantifies the heart failure, is in measurement.

For ID, we have infection under condition, and often the organism, but only rarely the sensitivity (other than methicillin resistant, etc.). Sensitivity usually goes with the micro data.

If the histology worsens over time, or if the metastatic location grows over time, is that analogous to ejection fraction?

George

That’s in the NCI Metathesaurus?

And then there is:
3. Traverse a similar relationship within SNOMED.

We should evaluate all 3.

Understood. And SNOMED works that way too. We need to figure out where the former takes over, where the latter is not detailed enough. Which is what you are saying below.

Will do.

Agreed.

This is a big subject for a new Forum posting, Iker. Called “Pandora’s Box” :smile: But yes, we will have to go there and debate. Probably for a face-to-face meeting with a couple of folks who are interested in this.

Ejection fraction is a way to diagnose congestive heart failure and to establish its severity. But the disease is still fully characterized by calling it “congestive heart failure”. In oncology, it is different. The histology/morphology is itself a disease characterization. E.g. for skin cancer, it makes a big difference whether it is derived from the basal cells or the melanocytes. It affects progression, prognosis and treatment. The anatomical site is also a characterizing factor, because it too defines progression, prognosis and treatment. And finally, so is the stage. These are not observations. They are part of the Condition and causal to everything that follows. We are trying to figure out how to incorporate the level of detail that is necessary to describe these Conditions properly.

Infections are characterized by cause (which bug), morphology (granulomatous or ordinary purulent), and anatomical site. But SNOMED gives us all we need there in terms of pre-coordinated concepts. Sensitivity, like ejection fraction, is a measurement. So are genetic aberrations in cancers (though they are increasingly becoming part of the Condition definition as well).

…then the Condition changes. It’s no longer the same problem. Treatment options change with it.

My 2 cents.

Hello, I created a script to create pairings of site/histology from the SEER site/histology validation list. Here are the results in CSV format:

https://drive.google.com/file/d/0Bzcc8twUxevJVUxBaGFnanozcTA/view?usp=sharing

The SEER site/histology validation list is published here:

https://seer.cancer.gov/icd-o-3/.

It comes to 49,831 pairings. It looks like this would be the minimum number of pairings that need to be mapped to a SNOMED code. However, looking at the results, the pairings appear to be restricted to ‘reportable’ pairings: ‘8000/0’ is only paired with primary CNS sites, I believe because only benign primary CNS tumors are reportable. So the number of pairings will increase if benign tumors are expanded beyond primary CNS sites.
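For anyone who wants to reproduce the pairing step, here is a minimal sketch of the idea. It assumes, hypothetically, that each validation-list entry gives a comma-separated list of site codes or ranges (such as `C000-C001,C009`) together with the histology codes valid for those sites. The real layout of the SEER file may differ, and the function names are illustrative only.

```python
from itertools import product


def expand_sites(spec):
    """Expand 'C000-C002,C005' into ['C000', 'C001', 'C002', 'C005']."""
    sites = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            # Site codes are 'C' plus three digits, so a range is numeric.
            for n in range(int(start[1:]), int(end[1:]) + 1):
                sites.append(f"C{n:03d}")
        else:
            sites.append(part)
    return sites


def pair_sites_with_histologies(site_spec, histologies):
    """Return every (site, histology) pairing for one validation-list entry."""
    return list(product(expand_sites(site_spec), histologies))


# Hypothetical entry: three site codes and two histology/behavior codes.
pairs = pair_sites_with_histologies("C000-C001,C009", ["8000/3", "8070/3"])
print(len(pairs))
print(pairs[0])
```

Running this over every entry and de-duplicating the result would give the full list of pairings.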

I have the NCI Metathesaurus installed locally. Does anybody already have SQL that takes a site/histology pairing and queries the NCI Metathesaurus table structures (which seem to mirror the UMLS) to yield a SNOMED code?

@mgurley:

This is great. From an ICD-O-3 perspective, this is the total space we need to cover. We now need to map it to SNOMED. We also need to map it to ICD10, to make sure their mapping to SNOMED doesn’t lead us awry, and the cycle closes nicely.

I haven’t installed the NCI Metathesaurus yet. The UMLS doesn’t have the links to ICD-O-3, only between SNOMED and the NCI code. @ihuerga has that piece, and he is also the one who pulled the initial mapping for us.

@rimma @Christian_Reich @ihuerga
I did a little research on SNOMED and ICD-O-3. As Christian mentioned, SNOMED, like ICD-O-3, is multi-axial. Two of the top-level SNOMED axes are Topography and Morphology. SNOMED already has a mapping to all the codes in both ICD-O-3 axes. I downloaded the latest versions of SNOMED and ICD-O-3 and put them in the same database. I then queried the SNOMED mapping table, ‘curr_simplemaprefset_f’, for how many times each ICD-O-3 code is mapped to a SNOMED code. Here are the results:

https://drive.google.com/open?id=0Bzcc8twUxevJT2dNMm16eTJwZUk
https://drive.google.com/open?id=0Bzcc8twUxevJbGdzYTVNSS1Wa00

All ICD-O-3 codes are covered, but sometimes more than once. In other words, some ICD-O-3 codes map to multiple SNOMED codes. It appears there is no map between ICD-O-3 site/histology pairings and SNOMED codes, but perhaps we could ask SNOMED to create them, starting with combinations based on the SEER site/histology validation list. We would need to look elsewhere for the same pairings for non-CNS sites and benign histologies.
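A minimal sketch of the kind of count described above, run against an in-memory SQLite database with toy rows. The `icdo_codes` table and its `code` column are hypothetical, the map table carries only a subset of the real columns, and the SNOMED concept ids are made up. The refset id 446608001 is the ICD-O simple map reference set named later in this thread. This simplified version also ignores the effective-time history of each map row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE icdo_codes (code TEXT);  -- hypothetical ICD-O-3 code list
CREATE TABLE curr_simplemaprefset_f (
    refsetid TEXT, referencedcomponentid TEXT, maptarget TEXT, active TEXT
);
""")
conn.executemany("INSERT INTO icdo_codes VALUES (?)",
                 [("8000/3",), ("8070/3",), ("8140/3",)])
# Toy map rows: (refset, SNOMED concept id, ICD-O code, active flag)
conn.executemany(
    "INSERT INTO curr_simplemaprefset_f VALUES (?, ?, ?, ?)",
    [
        ("446608001", "1001", "8000/3", "1"),
        ("446608001", "1002", "8000/3", "1"),
        ("446608001", "1003", "8070/3", "1"),
        ("446608001", "1004", "8140/3", "0"),  # inactive, so not counted
    ],
)

# LEFT JOIN keeps ICD-O codes that have no active map, with a count of 0.
counts = dict(conn.execute(
    """
    SELECT c.code, COUNT(DISTINCT m.referencedcomponentid) AS snomed_code_map_count
    FROM icdo_codes c
    LEFT JOIN curr_simplemaprefset_f m
           ON m.maptarget = c.code
          AND m.refsetid = ?
          AND m.active = '1'
    GROUP BY c.code
    """,
    ("446608001",),
).fetchall())
print(counts)
```

Codes with a count above 1 are the ones that map to multiple SNOMED codes.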
There is a SNOMED confluence project page describing SNOMED work related to ICD-O-3. See here:

https://confluence.ihtsdotools.org/display/CDR/ICD-O

It might be worth reaching out to this group to get their thoughts. Also, SNOMED seems to be flexible about tilting toward both pre- and post-coordination, and it has a process for submitting change requests. I would advocate cutting out the middle man of the NCI Thesaurus/Metathesaurus and asking SNOMED to create a pre-coordinated concept for each valid site/histology pairing. While we are waiting for this, I would simply stick with @rimma’s original draft of creating an OMOP/OHDSI placeholder vocabulary in the interim.

I have a file that contains all the ICD-O-3 codes to SNOMED codes as well but sharing it might not be compatible with the SNOMED license. But can share it with those interested.

@rimma @Christian_Reich @ihuerga
I made some mistakes in my SQL to calculate the counts of ICD-O-3 codes to SNOMED codes. I replaced the CSV files with new ones:

https://drive.google.com/open?id=0Bzcc8twUxevJT2dNMm16eTJwZUk
https://drive.google.com/open?id=0Bzcc8twUxevJbGdzYTVNSS1Wa00

The correction in my SQL revealed some ICD-O-3 codes that have no mapping to a SNOMED code. In the CSV files, these are the entries with snomed_code_map_count = 0. After the correction there are 6 histology ICD-O-3 codes and 111 site ICD-O-3 codes without a mapping to a SNOMED code. However, 70 of the unmappable ICD-O-3 site codes are the top-level 3-character site categories, like ‘C00’=‘lip’ or ‘C09’=‘tonsil’. I don’t believe these higher-level category codes are used in the wild, so the problem boils down to 41 bottom-level, 5-character ICD-O-3 site codes that cannot be mapped to a SNOMED code.
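A small sketch of the split between category codes and bottom-level codes, assuming the site codes are written with a dot (3 characters for a category such as `C00`, 5 characters for a subsite such as `C00.2`). The list of codes is made up for illustration and is not taken from the CSV files.

```python
def is_category_code(code):
    """True for 3-character category codes such as 'C00' (no subsite digit)."""
    return len(code) == 3


# Illustrative list only, not the actual unmapped codes from the CSV files.
unmapped_sites = ["C00", "C09", "C00.2", "C42.0"]

leaf_codes = [c for c in unmapped_sites if not is_category_code(c)]
print(leaf_codes)

# The same split applied to the reported totals: 111 unmapped, 70 categories.
print(111 - 70)
```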

One other thing I need to verify is which version of ICD-O-3 is available from the WHO site for download. As that is what I used. From here:

http://apps.who.int/classifications/apps/icd/ClassificationDownload/DLArea/Download.aspx

If anybody wants an updated ICD-O-3 to SNOMED code mapping file, let me know.

Also, if anybody has a contact at SNOMED to help discuss the current state/future plans of the SNOMED to ICD-O-3 code mapping, please forward me their contact information.

@mgurley:

Where is that thing? Not in the SNOMED distribution file SnomedCT_InternationalRF2_Production_20170131T120000.zip.

You tossed them into the spreadsheet, which tries to be smart and recognizes the ICD-O histology codes as dates (the year 8001 etc.). I am staying away from Excel and its derivatives these days because it is always trying to do that (usually creating scientific notation out of long identifiers) and I don’t notice until much later.

Absolutely. Let’s bring it up with Jim Case. From time to time we are submitting bad concepts or relationships to a website he is running, so he knows us.

Interesting. We should definitely talk to them.

That would be good.

@Christian_Reich
The data for the ‘curr_simplemaprefset_f’ table is located in the
‘Full/Refset/Map/der2_sRefset_SimpleMapFull_US1000124_20170301.txt’ file within the unzipped SNOMED download file. Open this file and look for some ICD-O-3 codes and you will find the gold.

I used this project as my DDL compiler/data loader: https://github.com/rorydavidson/SNOMED-CT-Database. If you look at the scripts that load data in this project, you will see the file referenced above being loaded into the ‘curr_simplemaprefset_f’ table. I am not sure how official the table names from this project are; I am a SNOMED newbie.
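If you would rather peek at the file without loading a database, here is a minimal sketch that reads the RF2 simple map layout (tab-delimited, with a header row) and keeps only the ICD-O refset rows. The sample rows are invented: real files use UUIDs in the `id` column and real concept ids. The refset id 446608001 is the ICD-O simple map reference set named later in this thread.

```python
import csv
import io

# Toy stand-in for the RF2 simple map file; values are illustrative only.
SAMPLE = (
    "id\teffectiveTime\tactive\tmoduleId\trefsetId\treferencedComponentId\tmapTarget\n"
    "a1\t20170131\t1\t900000000000207008\t446608001\t1001\t8000/3\n"
    "a2\t20170131\t1\t900000000000207008\t900000000000498005\t1002\tXa0Ab\n"
    "a3\t20170131\t1\t900000000000207008\t446608001\t1003\tC50.9\n"
)

ICDO_REFSET = "446608001"  # ICD-O simple map reference set


def read_icdo_maps(handle):
    """Return the rows of a simple map file that belong to the ICD-O refset."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row["refsetId"] == ICDO_REFSET]


maps = read_icdo_maps(io.StringIO(SAMPLE))
for row in maps:
    print(row["referencedComponentId"], "->", row["mapTarget"])
```

For the real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.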

As for the spreadsheet, it is a CSV format that you should be able to download from Google Drive and open with a clean text editor to inspect undisturbed by any “smart” coercions. I only see the date coercion upon preview in Google Drive. I guess Google is not perfect.
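Another way to sidestep the coercion is to read the CSV programmatically. A minimal sketch with Python's standard `csv` module, which leaves every field as a plain string; the `icdo_code` column name is an assumption about the file layout.

```python
import csv
import io

# Toy stand-in for the downloaded CSV; the column names are assumptions.
SAMPLE = "icdo_code,snomed_code_map_count\n8001/0,1\n8010/3,2\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Every field stays a string, so '8001/0' is never reinterpreted as a date.
print(rows[0]["icdo_code"])
```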

I would be happy to propose my “great” ideas to Jim. Joking aside, I think the world could use a pre-coordinated ICD-O-3 site/histology pairing. It would make the job of people fitting ICD-O-3 data into data models like the OHDSI/OMOP CDM much easier.

@mgurley:

Great. You are making life so much easier. I found the file.

Come to think of it: this is non-trivial, because they mustn’t create these pairs if there already is an existing precoordinated concept from the 2 axes. If the links from all SNOMED concepts to their two dimensions were reliable, it would work well. But I have my doubts. We’ll look into this.

@Christian_Reich It looks like the CSV download from the WHO website that I used to map to SNOMED is for ICD-O-3, not ICD-O-3.1. It does not look like WHO provides a CSV download of ICD-O-3.1. Very annoying. WHO does provide a document that spells out, in Appendix 7, the differences between ICD-O-3 and ICD-O-3.1. See here:

http://apps.who.int/iris/bitstream/10665/96612/1/9789241548496_eng.pdf

From querying my local copy of the NCI Metathesaurus, it looks like it contains ICD-O-3.1. So I will recalculate the ICD-O-3.1 to SNOMED map counts using ICD-O-3.1 from the NCI Metathesaurus.

@mgurley:

Oh man. Sounds like fun. The WHO is notorious for not providing proper digital distribution files, but this PDF nonsense instead. Same thing with ATC, same thing with ICD-10. Thank God for UMLS and NCI, here.

@Christian_Reich @rimma @ihuerga

Using histologies for ICD-O-3.1 from the NCI Metathesaurus, the number of histologies unmapped to SNOMED is now zero, apart from one inactive ICD-O-3.1 code that can’t be mapped: 8240/1. I updated the CSV files:

https://drive.google.com/open?id=0Bzcc8twUxevJT2dNMm16eTJwZUk
https://drive.google.com/open?id=0Bzcc8twUxevJbGdzYTVNSS1Wa00

Mapping retired ICD-O-3 codes, or even ICD-O-2 codes, would set a higher bar.

@Christian_Reich @rimma @Vojtech_Huser @ihuerga

I have learned a little more about SNOMED CT. My previous ICD-O-3.1 to SNOMED mapping efforts were not taking into account expired mappings. @Vojtech_Huser helped me figure that out. I will post the results of my direct mappings later this week. It looks like there are two possible SNOMED CT maps/refsets to use for histology within SNOMED CT and one for sites. The results will be different from what I previously reported.
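To illustrate what taking expired mappings into account means in a Full release, here is a minimal sketch: keep only the most recent version of each refset member, then drop the ones whose latest state is inactive. It keys on the member `id`, which is an assumption about how versions are tied together, and the rows are toy data.

```python
def current_active(rows):
    """Keep the latest version of each refset member, then drop inactive ones."""
    latest = {}
    for row in rows:
        seen = latest.get(row["id"])
        # effectiveTime is YYYYMMDD, so string comparison orders it correctly.
        if seen is None or row["effectiveTime"] > seen["effectiveTime"]:
            latest[row["id"]] = row
    return [r for r in latest.values() if r["active"] == "1"]


rows = [
    # Member a1 was mapped in 2002 and the map was retired in 2015.
    {"id": "a1", "effectiveTime": "20020131", "active": "1", "mapTarget": "8000/3"},
    {"id": "a1", "effectiveTime": "20150131", "active": "0", "mapTarget": "8000/3"},
    # Member a2 was mapped in 2002 and never retired.
    {"id": "a2", "effectiveTime": "20020131", "active": "1", "mapTarget": "8070/3"},
]
print([r["id"] for r in current_active(rows)])
```

Filtering on `active = '1'` alone would still pick up the 2002 row of `a1`, which is how expired mappings slip into the counts.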

Also, I discovered that topography and morphology are under the larger SNOMED CT axis ‘Body Structure (body structure)’ and that SNOMED CT does indeed precoordinate topography/morphology or site/histology pairings into SNOMED codes in the sub-axis ‘Disease (disorder)’ under the ‘Clinical Finding’ top-level SNOMED CT axis. Which makes sense to me. I will post results later this week on how well the SNOMED CT precoordinations cover the SEER site/histology validation list pairings. You can see the precoordination in this example of ‘Anaplastic astrocytoma of brain (disorder)’ in the SNOMED CT browser (click the Diagram tab):

http://browser.ihtsdotools.org/?perspective=full&conceptId1=277461004&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

@Christian_Reich @rimma

Here are my latest ICD-O-3 to SNOMED mapping results:

I found that SNOMED has 3 axes of interest

  1. Morphologically abnormal structure (morphologic abnormality) axis, which is mapped via a SNOMED refset to ICD-O-3 morphology codes.
  2. Anatomical structure (body structure) axis, which is mapped via a SNOMED refset to ICD-O-3 site codes.
  3. Disorder axis (which can be a pre-coordinated combination of a ‘Finding Site’ attribute relationship and an ‘Associated Morphology’ attribute relationship).

Which is exactly what we are looking for, I believe. To see this visually, do the following:

  1. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=8551000119100&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  2. Click ‘Diagram’ tab

  3. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=7712004&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  4. Click the ‘Refsets’ tab

  5. Go to http://browser.ihtsdotools.org/?perspective=full&conceptId1=3898006&edition=us-edition&release=v20170301&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007

  6. Click the ‘Refsets’ tab

So far I have:

  1. Created the list of combinations of ICD-O-3 site/histology via the SEER site/histology validation list.

https://drive.google.com/open?id=0Bzcc8twUxevJVUxBaGFnanozcTA

This represents 49,831 site/histology combinations, though it only covers SEER-reportable combinations. Benign neoplasms outside the primary CNS sites are not covered.

  2. Mapped each of the ICD-O-3 Site/Morphology axes to SNOMED codes via the SNOMED refsets. For the histology axis, there are two possible refsets:

ICD-O simple map reference set (foundation metadata concept) 446608001

Histologies:
– 13 unmapped
– 854 mapped to one
– 198 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJNnJMMzItN05lVlU

Sites:
– 43 unmapped
– 4 mapped to one
– 283 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJMGR1Qml1cElqaG8

CTV3 simple map reference set (foundation metadata concept) 900000000000498005

Histologies:
– 5 unmapped
– 1060 mapped to one
– 0 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJSm1LVDlZaU5xNGc

  3. Found all the SNOMED disorders that have a pre-coordinated relationship via the ‘Finding Site’ attribute relationship and an ‘Associated Morphology’ attribute relationship.

69,824 disorders
https://drive.google.com/open?id=0Bzcc8twUxevJMmxXTndzT2d3aUk

  4. Matched the SEER site/histology pairings, once mapped to SNOMED codes, against the pre-coordinated SNOMED disorder codes.

https://drive.google.com/open?id=0Bzcc8twUxevJMHV4dWpvTmR6SnM

1,924 mappings from ICD-O-3 site/histology pairings to SNOMED codes
973 distinct ICD-O-3 site/histology pairings

46 mapped to one
927 mapped to more than one

So the final upshot is that 973 out of 49,831 pairings can be mapped, or about 2%. And 927 of the 973 map to more than one SNOMED code. Not a very impressive result.
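The matching step can be sketched roughly as follows, assuming the axis maps and the disorder list have already been loaded into dictionaries. The ICD-O codes are used only as example keys; every SNOMED-side identifier and every mapping below is a made-up placeholder, not real content.

```python
from itertools import product

# Toy inputs; the SNOMED-side names are placeholders, not real concept ids.
site_to_snomed = {"C71.9": {"S_brain"}, "C50.9": {"S_breast"}}
hist_to_snomed = {"9401/3": {"M_anaplastic_astro"}, "8500/3": {"M_ductal"}}
# (finding site, associated morphology) -> pre-coordinated disorder concepts
disorders = {("S_brain", "M_anaplastic_astro"): {"D_1"}}

pairings = [("C71.9", "9401/3"), ("C50.9", "8500/3")]


def match_pairing(site, hist):
    """Return disorders whose site and morphology both map from the pairing."""
    found = set()
    for s, m in product(site_to_snomed.get(site, ()), hist_to_snomed.get(hist, ())):
        found |= disorders.get((s, m), set())
    return found


matched = {p: match_pairing(*p) for p in pairings}
covered = [p for p, d in matched.items() if d]
coverage = len(covered) / len(pairings)
print(covered, f"{coverage:.0%}")
```

With the numbers reported above, the same ratio is 973 / 49,831, or roughly 1.95%.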

Here is some of the SQL I used for anyone interested:

-- Disorders with an 'Associated morphology' and a 'Finding site' in the same
-- relationship group, keeping only the latest version of each relationship.
SELECT distinct d.conceptid
, r.destinationid AS histology_destinationid
, r2.destinationid AS site_destinationid
FROM curr_description_f d
join curr_relationship_f r on d.conceptid = r.sourceid and r.active = '1' and r.typeid = '116676008' -- Associated morphology (attribute)
join curr_relationship_f r2 on d.conceptid = r2.sourceid and r2.active = '1' and r2.typeid = '363698007' and r.relationshipgroup = r2.relationshipgroup -- Finding site (attribute)
where d.typeid = '900000000000003001' -- Fully specified name
and d.active = '1'
--and r.destinationid = '21964009'
--and r2.destinationid = '57171008'
--and d.conceptid = '188502002'
and not exists(
select 1
from curr_relationship_f r3
where r.moduleid = r3.moduleid
and r.sourceid = r3.sourceid
--and r.destinationid = r3.destinationid
and r.relationshipgroup = r3.relationshipgroup
and r.typeid = r3.typeid
and r.characteristictypeid = r3.characteristictypeid
and r.modifierid = r3.modifierid
--and r3.active = '0'
and r3.effectivetime > r.effectivetime
)
and not exists(
select 1
from curr_relationship_f r4
where r2.moduleid = r4.moduleid
and r2.sourceid = r4.sourceid
--and r2.destinationid = r4.destinationid
and r2.relationshipgroup = r4.relationshipgroup
and r2.typeid = r4.typeid
and r2.characteristictypeid = r4.characteristictypeid
and r2.modifierid = r4.modifierid
--and r4.active = '0'
and r4.effectivetime > r2.effectivetime
)
order by d.conceptid;

-- Active maps for one refset and one ICD-O code that were not inactivated later.
-- Replace each '?' with the refset id and the ICD-O code of interest.
SELECT curr_simplemaprefset_f.*
FROM curr_simplemaprefset_f
WHERE curr_simplemaprefset_f.refsetid = '?'
AND curr_simplemaprefset_f.maptarget = '?'
AND curr_simplemaprefset_f.active = '1'
AND (NOT EXISTS
(SELECT 1
FROM curr_simplemaprefset_f AS snomed_maps
WHERE snomed_maps.moduleid = curr_simplemaprefset_f.moduleid
AND snomed_maps.refsetid = curr_simplemaprefset_f.refsetid
AND snomed_maps.referencedcomponentid = curr_simplemaprefset_f.referencedcomponentid
AND snomed_maps.maptarget = curr_simplemaprefset_f.maptarget
AND snomed_maps.effectivetime > curr_simplemaprefset_f.effectivetime
AND snomed_maps.active = '0'));
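For anyone who wants to try the second query without a full SNOMED load, here is a minimal sketch that runs an equivalent query against an in-memory SQLite table with toy rows. The table holds only the columns the query touches, the module id `m` and concept ids are placeholders, and the `?` markers are passed as bound parameters rather than edited into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE curr_simplemaprefset_f (effectivetime TEXT, active TEXT, "
    "moduleid TEXT, refsetid TEXT, referencedcomponentid TEXT, maptarget TEXT)"
)
conn.executemany(
    "INSERT INTO curr_simplemaprefset_f VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Concept 1001: mapped in 2002, map retired in 2015, so it is excluded.
        ("20020131", "1", "m", "446608001", "1001", "8000/3"),
        ("20150131", "0", "m", "446608001", "1001", "8000/3"),
        # Concept 1002: mapped in 2002 and never retired, so it is returned.
        ("20020131", "1", "m", "446608001", "1002", "8000/3"),
    ],
)

QUERY = """
SELECT m.referencedcomponentid
FROM curr_simplemaprefset_f AS m
WHERE m.refsetid = ?
  AND m.maptarget = ?
  AND m.active = '1'
  AND NOT EXISTS (
        SELECT 1
        FROM curr_simplemaprefset_f AS later
        WHERE later.moduleid = m.moduleid
          AND later.refsetid = m.refsetid
          AND later.referencedcomponentid = m.referencedcomponentid
          AND later.maptarget = m.maptarget
          AND later.effectivetime > m.effectivetime
          AND later.active = '0')
"""
result = [r[0] for r in conn.execute(QUERY, ("446608001", "8000/3"))]
print(result)
```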

@rimma @Christian_Reich

I submitted the following question to SEER about the SEER Site/Histology validation list:

I have a question about the ICD-O-3 SEER Site/Histology Validation List available on the following page:

https://seer.cancer.gov/icd-o-3/

There is a disclaimer on the page that states:

"The Site/Histology List is not intended to be used for case finding or to determine reportability."

If it is not intended for either of these uses, what use is it intended for?

Also, how are the pairings published in this list established?  Are they manually curated by Pathologists?  Or based on what has been ever reported?  Or something else?

I received the following response:

The site/histology list is based on ICD-O-3 and WHO Classifications of Tumors (aka: blue books).

The site/histology validation list is used in registry data collection as well as editing software.

Some site/histologies combinations are common and do not require additional review.

There are some combinations that are not common which need to be verified before they can be added to the database.

There are also impossible combinations that will not clear edits even with review and cannot be overridden.

The site/histology validation list is updated when needed to reflect changes in ICD-O-3, WHO updates or reported cases. ICD-9CM and ICD-10CM codes are used for casefinding and reportable tumors are found in either the SEER Program and Staging Manuals or FORDS manuals. Reportable tumors differ by diagnosis year.

@rimma @Christian_Reich

Here is the response I received from SNOMED regarding mapping ICD-O-3 axes to SNOMED codes and ICD-O-3 site/histology pairings to precoordinated SNOMED clinical finding disorders/diseases. The question was basically a rehash of my prior forum post.

Thank you for your mail with the subject "Mapping ICD-O-3 Site/Histology Parings to pre-coordinated SNOMED codes", within the SNOMED International customer support system.

First, we would like to explain the approach for the mappings between SNOMED CT and ICD-O. 

All morphological codes in ICD-O version 3.1 publication have been included as morphologic abnormalities in SNOMED CT.
The directional maps from SNOMED CT to ICD-O are provided in the Simple Map refset.
They are one-to-one or many-to-one maps due to the granularity differences between two systems.
For example, some are synonyms in ICD-O codes but they are subconcepts in SNOMED CT.
Furthermore, there are also morphologies in SNOMED CT that are not covered by ICD-O codes.

All topographical codes in ICD-O version 3.1 publication have been used as target of mapping for body structure concepts in SNOMED CT.
The topographical codes from ICD-O are not sufficient for SNOMED CT concept modeling for disorders.
Currently, 22,826 body structure concepts are mapped to 287 ICD-O codes.
These maps are incomplete because there are potentially additional 3,000 body structures to be mapped to ICD-O codes.

Please let us know if we have missed any ICD-O codes in version 3.1. We have also planned to include ICD-O version 3.2 when we receive the finalised release from WHO. 

We haven't taken the approach to pre-coordinate disorders in SNOMED CT by combination of ICD-O morphologies and topographical codes.
The combination can be fully supported by post-coordination based on SNOMED CT concept model.
Currently, disorders are added when we receive customer requests.
These are the key reasons that disorders do not have full coverage for all ICD-O combinations.   

Regarding what have been covered by disorders, 4,481 disorders have been modeled by morphologies mapped to ICD-O codes with SNOMED CT body structures that are subconcepts of ICD-O topographical codes.
If we ignore body structures, there are about 5,000 disorders that are modelled by ICD-O morphologies.
It is about 10% of your estimated possible combinations by ICD-O sites and morphologies. 

We would like to understand your requirement for pre-coordination for all combinations. 
In particular, the explanations for reasons and applications from professional bodies and user groups would help us to prioritise the requirement for future content authoring.

@rimma @Christian_Reich

I received some more responses from SNOMED.

  • Confirmed which SNOMED “refset” to use for mapping histology/site: ICD-O simple map reference set (foundation metadata concept) 446608001

  • Confirmed SNOMED is actually mapping to ICD-O-3.2 via ICD-11. ICD-O-3.2 is an unpublished update to ICD-O-3.1. ICD-11 is still in draft status as well.

  • Confirmed that SNOMED is more fine-grained than ICD-O-3 on both the histology and site axes, hence the prevalence of ICD-O-3 sites/histologies mapped to multiple SNOMED codes.

  • SNOMED’s explanation of the mismatch:
    – Some terms are treated as synonyms in ICD-O, which are distinct subconcepts in SNOMED CT.
    – SNOMED includes morphologies that are not currently in the ICD-O to support clinical needs.
    – There are significant details in representation of body structures in SNOMED CT which cannot be found in ICD-O.
    – SNOMED CT content are updated and released twice a year.

So the current state of the axis to axis mapping is the following:

Histologies:
– 13 unmapped (mostly retired codes in ICD-O-3.2)
– 854 mapped to one
– 198 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJNnJMMzItN05lVlU

Sites:
– 43 unmapped (pending mapping to be done in SNOMED)
– 4 mapped to one
– 283 mapped to more than one

https://drive.google.com/open?id=0Bzcc8twUxevJMGR1Qml1cElqaG8

Here is a Github repository that contains the code I have written to perform mappings from ICD-O to SNOMED:

I would especially like some help from anybody who can help tackle querying SNOMED to determine the most appropriate code to pick when an ICD-O code maps to multiple SNOMED codes.
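As a starting point for discussion, here is one possible heuristic, not something this thread has settled on: among the candidate SNOMED codes for one ICD-O code, prefer the candidate that is an is-a ancestor of all the others, on the reasoning that the extra candidates are often finer-grained subconcepts. The sketch assumes relationship rows are available as (source, type, destination) tuples; the concept names are placeholders.

```python
from collections import defaultdict

IS_A = "116680003"  # SNOMED CT 'Is a (attribute)' relationship type


def build_parents(relationships):
    """Map each concept to its direct is-a parents from (source, type, dest) rows."""
    parents = defaultdict(set)
    for source, type_id, destination in relationships:
        if type_id == IS_A:
            parents[source].add(destination)
    return parents


def ancestors(concept, parents):
    """All transitive is-a ancestors of a concept."""
    seen, stack = set(), [concept]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def most_general(candidates, parents):
    """Candidates that are an ancestor of (or equal to) every other candidate."""
    return [
        c for c in candidates
        if all(c == o or c in ancestors(o, parents) for o in candidates)
    ]


# Toy hierarchy: B and C are children of A; the last row is not an is-a link.
rels = [("B", IS_A, "A"), ("C", IS_A, "A"), ("A", IS_A, "ROOT"),
        ("B", "363698007", "X")]
parents = build_parents(rels)
print(most_general(["A", "B", "C"], parents))
print(most_general(["B", "C"], parents))
```

When no candidate subsumes the rest (the second call), the list comes back empty and the choice still needs manual review or a different rule.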

Great.
We’ll try to go through this next week.
