Today I wanted to ask about cocktail-type biomarkers commonly used in pathology workflows and whether corresponding concepts already exist in the OMOP Genomic Vocabulary.
Overall, most biomarkers and HGNC-based gene concepts in the OMOP Genomic Vocabulary mapped very well, and we were able to perform the majority of our oncology ETL and biomarker mappings without major issues.
However, we encountered some difficulty when mapping biomarker panels that are commonly reported as combined “cocktail” immune stains in surgical pathology workflows.
Specific examples include:
3050376 | Cytokeratin 5/6 Ag [Presence] in Tissue by Immune stain | LOINC
3040360 | Cytokeratin AE1/AE3 Ag [Presence] in Tissue by Immune stain | LOINC
In addition, we also noticed that some widely used individual immunomarkers such as:
3027870 | CD3 Ag [Presence] in Tissue by Immune stain | LOINC
do not appear to have corresponding OMOP Genomic concepts either.
These concepts exist in LOINC and are widely used in pathology reporting, but we could not identify corresponding 1:1 concepts within the OMOP Genomic Vocabulary.
Are there existing OMOP Genomic concepts appropriate for mapping these biomarkers?
If not, I would like to ask whether it might make sense to introduce concepts such as:
Cytokeratin 5/6 protein expression measurement
Cytokeratin AE1/AE3 protein expression measurement
CD3 protein expression measurement
or a broader framework for immunomarker concepts commonly used in oncology pathology workflows.
Yeah. This is something we need to figure out. The problem is twofold:
Immunohistochemistry is a measurement of protein expression, and we need to be on top of all those tissue stains as they are becoming common. LOINC has the same problem.
More importantly, the antibodies used in those stains are not specific for an individual protein sequence, but raised against whatever is exposed on the cell surface.
CD3 is a perfect example: It is part of the T-cell receptor complex and consists of four different membrane proteins: gamma, delta, epsilon and zeta. But zeta is usually referred to by its synonym CD247. Alpha and beta chains also exist, but they are not called CD3, just T-cell receptor. And then the T-cell receptor also has its own optional gamma and delta chains, which are not part of CD3. Having fun yet?
We do have OMOP Genomic concepts for each of these genes (alpha, beta, gamma, delta, epsilon, zeta), but not for their protein expression, and certainly not for the full complex together. We need to figure that out.
The cytokeratins, used as markers for malignancy, have the same problem: Cytokeratin 5/6 also covers two different genes (KRT5 and KRT6A, but not KRT6B or KRT6C), and we don’t have protein expression concepts for either of them, or for both together.
Cytokeratin AE1 and AE3 are not antigens but monoclonal antibodies active against a whole slew of keratin proteins: cytokeratins 1-8, 10, 14-16 and 19 (but not 17 or 18). Now what?
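To make the one-to-many problem concrete, here is a minimal Python sketch that records, for each of the three stains in this thread, the targets named above. Everything in it is illustrative: the `StainTarget` class, the `STAINS` dictionary and the `is_one_to_one` helper are hypothetical names, not part of any OMOP vocabulary or tool. The CD3 gene symbols (CD3G, CD3D, CD3E, CD247) are the HGNC symbols for the gamma, delta, epsilon and zeta chains; the AE1/AE3 entries are kept as keratin numbers as listed above rather than resolved to genes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StainTarget:
    """Hypothetical record: one immune stain and what it can react with."""
    loinc_concept_id: int      # OMOP concept_id of the LOINC stain concept quoted in this thread
    label: str
    targets: Tuple[str, ...]   # gene symbols or keratin numbers, as described in the thread


# Keratins recognized by the AE1/AE3 cocktail per the list above:
# 1-8, 10, 14-16 and 19 (not 17 or 18).
AE1_AE3_KERATINS = list(range(1, 9)) + [10] + list(range(14, 17)) + [19]

STAINS = {
    "CD3": StainTarget(
        3027870,
        "CD3 Ag [Presence] in Tissue by Immune stain",
        ("CD3G", "CD3D", "CD3E", "CD247"),
    ),
    "CK5/6": StainTarget(
        3050376,
        "Cytokeratin 5/6 Ag [Presence] in Tissue by Immune stain",
        ("KRT5", "KRT6A"),
    ),
    "CK AE1/AE3": StainTarget(
        3040360,
        "Cytokeratin AE1/AE3 Ag [Presence] in Tissue by Immune stain",
        tuple(f"keratin {n}" for n in AE1_AE3_KERATINS),
    ),
}


def is_one_to_one(stain: StainTarget) -> bool:
    """A stain can map 1:1 to a single-gene concept only if it has exactly one target."""
    return len(stain.targets) == 1


for name, stain in STAINS.items():
    print(f"{name}: {len(stain.targets)} targets, 1:1 mappable = {is_one_to_one(stain)}")
```

None of the three stains has a single target, which is why a plain 1:1 mapping to an existing gene concept does not work for any of them.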
But before we sharpen the pencil, we need to know: What is the use case? Do we really need these? For confirming the diagnosis? Validating the test? Is there a use case?
Thank you very much for the detailed explanation. Your comments helped clarify the distinction between molecular entities and operational pathology assay entities.
I now better understand that markers such as CK AE1/AE3 are fundamentally antibody cocktail-based assays rather than discrete genomic or protein entities, and therefore may fit more naturally within the scope of LOINC assay concepts rather than OMOP Genomic concepts.
On the other hand, markers such as CD3 or CK5/6 seem somewhat more nuanced because, although they are also operational pathology immunostains, they are frequently used as clinically meaningful immunophenotypic biomarkers in oncology workflows.
That said, I agree that before introducing additional OMOP Genomic concepts, it would be important to better define the actual analytical and clinical use cases. I’ll report back on whether there are sufficiently strong use cases, such as cohort characterization, immunophenotypic stratification, or biomarker-based analytics, that would justify ontology-level representation of these combined immunomarkers.
Thank you again for the thoughtful feedback. It was very helpful for understanding the ontology boundary and modeling considerations around pathology immunostains.