We have been working on mapping SEER data. It is challenging for several reasons. For now, I will post what we have done with oncology data to date as a tab-delimited file on Dropbox: https://www.dropbox.com/s/2gl5az5duq7z72q/Final%20SEER%20variable%20with%20codes.txt?dl=0
The main challenge is that the oncology vocabularies combine several concepts in a single code. For example, histology codes (e.g., the codes for small cell lung cancer) also include behavior (e.g., “malignant”). You can separate the two, but then you can’t use the ICD-O text descriptors, which are specific to clinically relevant combinations of histology and behavior. Another issue is that SEER provides “recoded” values that are useful for grouping cases into commonly used cancer categories. This grouping is generally by site, but non-solid tumors need to be handled differently.
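To make the histology/behavior problem concrete, here is a rough Python sketch of splitting an ICD-O-3 morphology code such as 8041/3 into its two parts. The function and table names (split_morphology, describe, DESCRIPTORS) are made up for illustration and are not part of our mapping file. The descriptor table is a one-row excerpt, not the full ICD-O-3 list.

```python
from typing import NamedTuple, Optional

# ICD-O-3 behavior digits (the digit after the slash).
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "carcinoma in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic site",
}

# Illustrative one-row excerpt: descriptors are keyed by the COMBINED code.
DESCRIPTORS = {
    "8041/3": "Small cell carcinoma, NOS",
}


class Morphology(NamedTuple):
    histology: str       # four-digit histology, e.g. "8041"
    behavior: str        # one-digit behavior, e.g. "3"
    behavior_label: str  # text for the behavior digit


def split_morphology(code: str) -> Morphology:
    """Split a code like '8041/3' into histology and behavior parts."""
    histology, sep, behavior = code.strip().partition("/")
    valid = (
        sep == "/"
        and len(histology) == 4
        and histology.isdigit()
        and behavior in BEHAVIOR
    )
    if not valid:
        raise ValueError(f"not a histology/behavior code: {code!r}")
    return Morphology(histology, behavior, BEHAVIOR[behavior])


def describe(histology: str, behavior: Optional[str] = None) -> Optional[str]:
    """Look up the text descriptor; it exists only for the combined code."""
    if behavior is None:
        return None  # histology alone has no descriptor in this table
    return DESCRIPTORS.get(f"{histology}/{behavior}")


parts = split_morphology("8041/3")
print(parts)
print(describe(parts.histology, parts.behavior))
print(describe(parts.histology))
```

The last call returns None, which is the point: once histology is stored on its own, the descriptor text can only be recovered by joining it back to the behavior digit.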
LOINC is very good for most oncology vocabulary work. It needs to be updated in a few places, but it works. I would encourage us to use it for everything rather than mixing vocabularies, or at least to do as much as possible in LOINC.
In the end, I think we are going to need a separate “oncology” vocabulary.