Regarding regional Node concepts in the Cancer Modifier Vocabulary

First of all, thank you for publishing the Oncology OnRamp documentation. It has been very helpful for reviewing and refining our oncology ETL and mapping strategy in OMOP CDM. We are currently going through the cancer-related domains step by step, and today I wanted to start with the “Nodes” section. I will likely post additional observations and questions regarding other oncology domains later as well.

While reviewing the current Cancer Modifier “Nodes” concepts, I noticed a potential limitation related to real-world surgical pathology workflows.

The current vocabulary mainly provides highly granular station- or level-specific lymph node concepts (e.g. 2R, 4L, 10R, Axillary Level I, Axillary Level II). These work well when nodal metastasis is documented at an exact anatomical level.

However, in real-world pathology workflows, lymph nodes are frequently submitted and reported as grouped nodal packets rather than as individually separated stations or levels. Common examples include:

  • “LN #2,#4: 3/12 positive”
  • “Paratracheal lymph nodes: metastatic carcinoma”
  • “Intrapulmonary lymph nodes: 1/6 positive”
  • “Axillary lymph nodes: 2/15 positive”

In these situations, exact decomposition into individual stations or nodal levels is often not possible. Mapping the same metastatic count to multiple level-specific concepts can introduce double counting and distort downstream analyses such as nodal burden calculation or staging reconstruction.
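To make the double-counting concern concrete, here is a minimal sketch in Python. The labels are plain strings rather than real concept_ids, and the choice of "2R" and "4R" as the two target levels is purely an assumption for illustration, since the report itself does not give laterality.

```python
from dataclasses import dataclass

@dataclass
class NodeFinding:
    site: str       # node label (illustrative string, not a real concept_id)
    positive: int   # number of metastatic nodes
    examined: int   # number of nodes examined

# One grouped packet as reported: "LN #2,#4: 3/12 positive"
PACKET_POSITIVE, PACKET_EXAMINED = 3, 12

# Forced mapping: the packet counts are copied onto each level-specific label
# (2R and 4R are assumed here only for the sake of the example).
forced = [NodeFinding(site, PACKET_POSITIVE, PACKET_EXAMINED) for site in ("2R", "4R")]

# Grouped mapping: one record kept at the packet level.
grouped = [NodeFinding("Paratracheal lymph nodes", PACKET_POSITIVE, PACKET_EXAMINED)]

def nodal_burden(findings):
    """Sum positive and examined node counts across all records."""
    return (sum(f.positive for f in findings), sum(f.examined for f in findings))

print(nodal_burden(forced))   # (6, 24): the packet is counted twice
print(nodal_burden(grouped))  # (3, 12): matches the pathology report
```

With the forced mapping, any downstream sum over node records reports twice as many positive nodes as the pathologist actually found.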

Previously, broader SNOMED concepts such as:

  • Structure of paratracheal lymph node
  • Structure of intrapulmonary lymph node
  • Axillary lymph node structure

were sometimes used because they better reflected actual specimen grouping in surgical pathology workflows.

Of course, under the current guidance, these cases could technically be represented using measurement_concept_id = 0 and a SNOMED concept in measurement_source_concept_id. However, this significantly reduces downstream usability because many OHDSI tools and analyses primarily rely on standard concepts in measurement_concept_id.
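As a rough illustration of the usability problem, the sketch below filters records by a standard concept set, which is roughly how many cohort-style queries behave. The numeric IDs are placeholders invented for this example and do not correspond to real Cancer Modifier or SNOMED concept_ids.

```python
# Placeholder IDs only: 1001 and 2001 are NOT real concept_ids.
records = [
    {   # level-specific finding mapped to a standard concept
        "label": "Axillary Level I",
        "measurement_concept_id": 1001,
        "measurement_source_concept_id": 1001,
    },
    {   # grouped packet: no standard target, SNOMED kept only as source
        "label": "Axillary lymph node structure",
        "measurement_concept_id": 0,
        "measurement_source_concept_id": 2001,
    },
]

# A hypothetical "lymph node findings" concept set built from standard concepts.
node_concept_set = {1001}

def select_by_standard_concept(rows, concept_set):
    """Keep rows whose measurement_concept_id is in the concept set."""
    return [r for r in rows if r["measurement_concept_id"] in concept_set]

selected = select_by_standard_concept(records, node_concept_set)
print([r["label"] for r in selected])  # ['Axillary Level I']
```

The grouped axillary packet is silently dropped unless the analyst also knows to query measurement_source_concept_id.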

In breast cancer specifically, several clinically important regional nodal groups are currently missing or only partially represented in the Cancer Modifier vocabulary. For example:

  • Axillary Level I and II are available, but Axillary Level III is missing
  • There is currently no broader “Axillary lymph nodes” concept for grouped axillary nodal packets
  • Internal mammary lymph nodes (IMN) are missing or incompletely represented
  • Supraclavicular lymph nodes are missing or incompletely represented

All of these are clinically important regional nodal groups in breast cancer staging and pathology workflows.

I was wondering whether it might make sense to introduce intermediate regional nodal basin concepts and hierarchical relationships into the Cancer Modifier vocabulary.

Possible examples could look something like this:

  • Superior mediastinal lymph nodes
    └ Paratracheal lymph nodes
      └ 2R Upper paratracheal
      └ 2L Upper paratracheal
      └ 4R Lower paratracheal
      └ 4L Lower paratracheal
    └ 3A Prevascular lymph nodes
    └ 3P Retrotracheal lymph nodes
  • Intrapulmonary lymph nodes
    └ 10R/L Hilar
    └ 11R/L Interlobar
    └ 12R/L Lobar
    └ 13R/L Segmental
    └ 14R/L Subsegmental
  • Axillary lymph nodes
    └ Axillary Level I
    └ Axillary Level II
    └ Axillary Level III
    └ Intramammary lymph nodes
  • Internal mammary lymph nodes
  • Supraclavicular lymph nodes

I think this could better align the vocabulary with real-world pathology workflows while preserving analytic usability and avoiding forced arbitrary assignment to overly granular nodal concepts.
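As a sketch of how such a hierarchy could be used analytically, the Python below encodes part of the proposed tree as a parent-to-children map and rolls findings up to any ancestor, similar in spirit to what the concept_ancestor table does for other vocabularies. The dictionary and function names are hypothetical, and the labels are strings rather than real concept_ids.

```python
# Part of the proposed hierarchy, as parent -> children (illustrative labels only).
PROPOSED_CHILDREN = {
    "Superior mediastinal lymph nodes": [
        "Paratracheal lymph nodes",
        "3A Prevascular lymph nodes",
        "3P Retrotracheal lymph nodes",
    ],
    "Paratracheal lymph nodes": [
        "2R Upper paratracheal",
        "2L Upper paratracheal",
        "4R Lower paratracheal",
        "4L Lower paratracheal",
    ],
    "Axillary lymph nodes": [
        "Axillary Level I",
        "Axillary Level II",
        "Axillary Level III",
        "Intramammary lymph nodes",
    ],
}

def descendants(concept, children=PROPOSED_CHILDREN):
    """Return the concept itself plus every concept below it."""
    found = {concept}
    for child in children.get(concept, []):
        found |= descendants(child, children)
    return found

def positive_nodes_under(ancestor, findings):
    """Sum positive node counts for findings at or below the ancestor."""
    scope = descendants(ancestor)
    return sum(positive for site, positive in findings if site in scope)

findings = [
    ("Axillary lymph nodes", 2),   # grouped packet: "Axillary lymph nodes: 2/15 positive"
    ("4R Lower paratracheal", 1),  # station-specific finding
]

print(positive_nodes_under("Axillary lymph nodes", findings))              # 2
print(positive_nodes_under("Superior mediastinal lymph nodes", findings))  # 1
```

Grouped packets and station-specific findings can then coexist: each record sits at the level the report supports, and queries pick the level of aggregation they need.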

Thank you!

@sumnemo:

The question is, usability for what? What is the use case? Lymph nodes feature very little in the typical oncology use cases we are collecting. Why not? Because their value for prognosis or treatment, as per AJCC or guidelines, is limited.

Of course, if you are a surgeon performing radical tumor surgery on an individual patient, you need to know exactly whether LN level #2 is positive, and which of the 15 axillary nodes. But we don’t do that. We treat populations, and there lymph nodes show up as either “affected” or “not affected”, and maybe “regional” or “distant”.

But if you are thinking of a use case where this is pressing, please, please bring it on so we can add it to the use case list.

Thank you for the feedback. I agree that a stronger use case is needed.

One use case that comes to mind is retrospective restaging studies using contemporary staging systems applied to historical pathology reports. Increasingly, researchers are using NLP-extracted pathology data rather than relying solely on registry-derived stage variables, especially when newer staging systems become available.

For example, the AJCC 9th edition introduces additional N-category refinements in lung cancer (e.g., N2a vs N2b, i.e. single-station versus multiple-station N2 involvement), which depend on the distribution of nodal metastases across N2 stations. However, many historical pathology reports describe findings as grouped nodal packets such as:

“LN #2,4: one metastasis in seven lymph nodes”

In such cases, the pathology finding can be interpreted as involvement of the paratracheal nodal basin, but there is currently no corresponding Cancer Modifier concept. As a result, these findings may be difficult to represent in a standardized manner and may be excluded from large-scale restaging studies.
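To illustrate why grouped packets can still be informative for restaging, here is a rough Python sketch that bounds the number of involved N2 stations from packet-level data. It rests on several simplifying assumptions that are mine, not part of any staging manual: N2a is read as single-station and N2b as multiple-station N2 involvement, every listed station is an ipsilateral N2 station, and packets do not share stations. The function names are hypothetical.

```python
def n2_station_bounds(packets):
    """Return (min, max) number of involved stations.

    packets: list of (stations, positive_nodes) tuples, where stations is a
    tuple of station labels submitted together. Packets are assumed disjoint.
    """
    # Any packet with a positive node involves at least one station.
    low = sum(1 for stations, positive in packets if positive > 0)
    # A packet cannot involve more stations than it has positive nodes
    # or than it contains stations.
    high = sum(min(positive, len(stations)) for stations, positive in packets)
    return low, high

def n2_subcategory(packets):
    """Classify as N2a / N2b when the bounds allow it, else 'indeterminate'."""
    low, high = n2_station_bounds(packets)
    if high == 0:
        return None             # no positive N2 nodes in these packets
    if high == 1:
        return "N2a"            # at most one station can be involved
    if low >= 2:
        return "N2b"            # at least two stations are involved
    return "indeterminate"      # grouping hides single vs multiple stations

# "LN #2,4: one metastasis in seven lymph nodes"
print(n2_subcategory([(("2", "4"), 1)]))               # N2a
# "LN #2,#4: 3/12 positive"
print(n2_subcategory([(("2", "4"), 3)]))               # indeterminate
# Same packet plus a separately submitted positive station 7 node
print(n2_subcategory([(("2", "4"), 1), (("7",), 1)]))  # N2b
```

Under these assumptions, a packet with a single positive node can still be classified, while a packet with several positive nodes stays indeterminate rather than being forced onto one station. Either way, the record needs a packet-level concept to be stored at all.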

A similar issue exists in breast cancer. Axillary Level III, internal mammary nodes, and other regional nodal groups have staging implications, but some are currently missing or incompletely represented. In addition, pathology reports often describe “axillary lymph nodes” as a grouped packet rather than separating Level I and II nodes.

From my perspective, the value of these concepts is less about supporting individual surgical decision-making and more about enabling population-level restaging and outcome studies using historical pathology data and evolving staging systems.