Potentially missing NAACCR value


#1

Hello, I’ve been unable to find, either in Athena or by SQL query, the NAACCR value with concept_code = 1296@03-89, which should be a response to the NAACCR variable with concept_id = 35918370. Notice the absence of a response value for ‘Three or more regional lymph nodes removed’ under the parent variable’s relations.

Am I correct in assuming this concept is missing? If so, is this best dealt with by creating an issue in the OHDSI/CommonDataModel project? And lastly does anyone know what the timeline is generally like for small issues like this to be resolved in patch versions?


(Dmytry Dymshyts) #2

Hi @cliffroared
Thanks for noticing.
Somehow we don’t have this value in our data source. Can you please provide the source you are using?
You can address issues to the Oncology workgroup repo:
https://github.com/OHDSI/OncologyWG/issues


#3

The description of values greater than 2 and less than 90 can be inferred from the entries in the column’s description in the NAACCR data dictionary.

Thank you for pointing me to the right project to reach out to.


(Dmytry Dymshyts) #4

Ah, I see. We used the SEER API, which also has these ugly entries:

02 Two regional lymph nodes examined
90 90 or more regional lymph nodes examined

So, there’s no straightforward way to determine the 03-89 range.
Actually, we’ll add checks for “…” in concept codes.
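
A check of this kind could be sketched roughly as follows. This is only an illustration, not the workgroup's actual check: the helper name `has_elided_code` is hypothetical, the sample codes are illustrative, and it assumes concept codes follow the `variable@value` pattern seen in this thread.

```python
import re

# Matches a Unicode ellipsis or a run of two or more dots in the value part.
ELLIPSIS_RE = re.compile(r"…|\.{2,}")

def has_elided_code(concept_code: str) -> bool:
    """Return True when the value part of a 'variable@value' code contains an ellipsis."""
    # partition() splits on the first "@"; value is "" if there is no "@".
    _, _, value = concept_code.partition("@")
    return bool(ELLIPSIS_RE.search(value))

# Illustrative sample only; not taken from the vocabulary tables.
codes = ["1296@02", "1296@90", "560@.."]
flagged = [c for c in codes if has_elided_code(c)]
print(flagged)
```

Flagged codes would still need manual review against the data dictionary, since the ellipsis only signals that a range was elided, not what the range is.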


#5

While this problem is being tackled, I looked at some other entries which use ellipses in the data dictionary (which I am guessing has some close similarities to the SEER API, seeing as the presence of ellipses correlates so closely with slight OMOP concept mistakes).

  • Concept 35918591 is similarly missing a response for values 03-89
  • Concept 35918412 does have a value corresponding to the range from 03-59, but the concept code is incorrectly set as 560@..
  • Concept 35918783 seems to have entirely incorrect responses, as none have the correct name, the ‘Unknown age’ option is missing, and the final bucket is incorrectly from 000-120

Lastly, are concept codes supposed to be zero padded in suffix length to ensure proper sorting? E.g. 123@000, 123@001, …, 123@456, instead of 123@1, 123@2, etc.?
I have seen other concepts with single digit suffixes set to invalid and replaced by valid concepts that are identical in all fields except for the addition of proper zero padding, but I wasn’t sure if this is intended to be practiced throughout the concept table as a whole. If this is the practice, then all three of the above concepts (35918591, 35918412, 35918783) and the original problem concept (35918370) are all incorrectly zero padded among their responses.
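
To survey how suffixes are padded, something like the sketch below could be used. It only reports variables whose numeric suffixes have mixed lengths so they can be reviewed by hand; it does not decide which form is correct. The function name `suffix_widths` and the sample codes are hypothetical.

```python
from collections import defaultdict

def suffix_widths(concept_codes):
    """Map each variable prefix to the set of lengths of its purely numeric suffixes."""
    widths = defaultdict(set)
    for code in concept_codes:
        variable, sep, value = code.partition("@")
        # Skip codes without "@" and non-numeric values such as ranges.
        if sep and value.isdigit():
            widths[variable].add(len(value))
    return dict(widths)

# Hypothetical sample mirroring the 123@001 vs. 123@1 example above.
sample = ["123@001", "123@002", "123@3", "456@01", "456@02"]
mixed = {v: sorted(w) for v, w in suffix_widths(sample).items() if len(w) > 1}
print(mixed)  # {'123': [1, 3]}
```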


(Rimma Belenkaya) #6

Hello @cliffroared,

It looks like you are actively adapting NAACCR for your data conversion. There is a workgroup spearheading this effort. If you are interested in joining reach out to or @mgurley.


(Michael Gurley) #7

@cliffroared

I responded to your github issue that these issues are mostly caused by these NAACCR variables not being encoded as numeric values:

As for leading zeros, NAACCR uses them inconsistently for NAACCR values, so however a value is encoded in the NAACCR data dictionary is how we need to represent it in the concept_code column. We have had to fix some, which is why you see the invalid concepts. The values in the cases you raise (35918591, 35918412, 35918783) should not have been entered in the first place. But consider a categorical value like ‘Hormone therapy administered as first course therapy.’ with concept_code = ‘1400@01’. See the following:

http://athena.ohdsi.org/search-terms/terms/716996

Since the NAACCR data dictionary has a leading zero for this NAACCR value, and that is how the categorical value will show up in real data, we need to keep the leading zero:

http://datadictionary.naaccr.org/default.aspx?c=10#1400
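
As a small illustration of why the value has to be carried as text rather than as a number, here is a sketch using the ‘1400@01’ example. The helper `make_concept_code` is hypothetical and not part of any OHDSI tooling.

```python
def make_concept_code(item_number: str, value: str) -> str:
    """Join a NAACCR item number and value exactly as given, keeping any leading zeros."""
    return f"{item_number}@{value}"

raw_value = "01"  # as written in the data dictionary and in source data

kept = make_concept_code("1400", raw_value)
# A numeric round trip strips the leading zero and yields a different code.
lost = make_concept_code("1400", str(int(raw_value)))

print(kept)  # 1400@01
print(lost)  # 1400@1
```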

