Why is phenotyping difficult in OHDSI? Blame the concept set expression and vocabulary

Vojtech_Huser · September 19, 2025, 4:01pm

I fully agree that completeness is much harder than accuracy.
I like the idea of somehow giving the lateral relationship a name.

From perspective 1, there are 2 types.

Lateral-definitional: part of defining a term (e.g., LOINC components of codes, or a SNOMED condition if it is not primitive). Think of the relationships from component terms to composite (non-primitive) terms.
Lateral-non-definitional: originating outside the definition.

Note, however, that many lateral-definitional relationships do make their way into the hierarchy. At least in SNOMED, the computer-assisted placement of terms under the correct parent concepts in fact converts a large portion of lateral relationships into parent-child (is-a) relationships.

The “opposite”/companion/complement of a lateral relationship is the hierarchical (= parent-child, = is-a) relationship.

The SNOMED free-use term subset gives away the terms and titles but NOT the fancy definitions (relationships to component terms). So some terminologies have IP (intellectual property) that comes into play. Whatever agreement OHDSI may have with SNOMED Int., if we open the box of importing lateral-definitional relationships - oh boy…

Also note that those require groupings of relationships, and that is something the current vocab tables can’t handle. (See the link at the bottom; later I may add more examples of grouping, and ideally of how they make or do NOT make their way into the is-a hierarchy.)
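To make the grouping point concrete, here is a minimal Python sketch. Everything in it is made up for illustration: "Procedure X" and its attribute values are hypothetical, not real SNOMED content, and `flatten` only mimics a table that stores (concept 1, relationship, concept 2) rows with no group column.

```python
from collections import defaultdict
from typing import NamedTuple


class GroupedRel(NamedTuple):
    source: str      # composite (non-primitive) term
    group: int       # attribute group number
    attribute: str   # relationship type
    target: str      # component term


# Hypothetical composite concept with two attribute groups.
grouped = [
    GroupedRel("Procedure X", 1, "Method", "Excision"),
    GroupedRel("Procedure X", 1, "Procedure site", "Appendix"),
    GroupedRel("Procedure X", 2, "Method", "Biopsy"),
    GroupedRel("Procedure X", 2, "Procedure site", "Liver"),
]


def flatten(rels):
    """Drop the group number, keeping only (source, attribute, target) rows."""
    return sorted({(r.source, r.attribute, r.target) for r in rels})


def pairings(rels):
    """Recover which method goes with which site, using the group number."""
    by_group = defaultdict(dict)
    for r in rels:
        by_group[(r.source, r.group)][r.attribute] = r.target
    return sorted((g["Method"], g["Procedure site"]) for g in by_group.values())


flat = flatten(grouped)
true_pairs = pairings(grouped)

# From the flat rows alone, every method could belong with every site.
methods = [t for _, a, t in flat if a == "Method"]
sites = [t for _, a, t in flat if a == "Procedure site"]
possible_pairs = sorted((m, s) for m in methods for s in sites)

print(len(true_pairs), "true pairings vs", len(possible_pairs), "candidates after flattening")
```

Once the group number is gone, the flat rows no longer say whether the excision was of the appendix or of the liver. That is the information a table without a grouping column cannot carry.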

From perspective 2: creator of relationships

authored by SDO (e.g., LOINC gives us their hierarchy and groupings, without IP restrictions)
authored by community (not given by SDO during terminology download)

Community can be OHDSI (e.g., RxNormExtension) but it could also be some non-OHDSI community (e.g., Wikidata).

Also consider the mapping holder/owner and the submission of issues to that holder entity (where we import it from).

We don’t want a version disconnect. If a SNOMED term is wrong, don’t tell the Athena team; submit a ticket to the SNOMED holder.

If we want improvement in relationships, we can dream of SDO doing it using taxpayer money (SNOMED model; annual contribution) or be realistic and accept that if we want progress, it may have to come from community.
Long-term funding for infrastructure in medicine is unpopular with many funders.

If we don’t adopt any imperfect/community mapping into Athena, this does not advance humanity further. We may import them even if we know they can be inaccurate or incomplete, and encourage folks to submit additions/corrections to the holder entity (not to OHDSI).

In a way we are doing it today as well. We say: “use our Dx mappings but also triple check all mappings” - if your rigor is regulatory grade…

We have community contribution (of type add concept) for OHDSI, but we may need a community mini-contribution (of type downvote an incorrect relationship [and tell the holder!!]).

This whole problem is solved if the concept set definition is done in R or Python or code (with a smart way to use it later in the GUI). That way any external and smart relationship can be incorporated without acrobatics in Athena or the GUI, and without introducing exploding complexity in the Athena layer.
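A rough Python sketch of what a concept set defined in code could look like. This is only an assumption about the shape, not an existing OHDSI API: `resolve_concept_set`, the integer concept ids, and the relationship name `hypothetical:associated-with` are all invented for illustration. The `ancestor_rows` are (ancestor, descendant) pairs in the spirit of a concept_ancestor table, and `lateral_rows` could come from any external source.

```python
def resolve_concept_set(seeds, ancestor_rows, lateral_rows, follow=()):
    """Resolve a concept set from seed concepts.

    seeds         -- set of seed concept ids
    ancestor_rows -- iterable of (ancestor_id, descendant_id) pairs
    lateral_rows  -- iterable of (concept_id_1, relationship_id, concept_id_2)
    follow        -- relationship ids to traverse one step laterally
    """
    resolved = set(seeds)
    # Hierarchical (is-a) expansion: add all descendants of the seeds.
    resolved |= {d for a, d in ancestor_rows if a in seeds}
    # Lateral expansion: one hop along the chosen relationship types only.
    lateral_hits = {
        c2 for c1, rel, c2 in lateral_rows if rel in follow and c1 in resolved
    }
    return resolved | lateral_hits


# Toy data: concept 1 has descendants 2 and 3; concept 2 has a lateral link to 5.
ancestor_rows = [(1, 1), (1, 2), (1, 3), (4, 4)]
lateral_rows = [
    (2, "hypothetical:associated-with", 5),  # e.g. a community-authored link
    (3, "hypothetical:other", 6),
]

hierarchy_only = resolve_concept_set({1}, ancestor_rows, lateral_rows)
with_lateral = resolve_concept_set(
    {1}, ancestor_rows, lateral_rows, follow={"hypothetical:associated-with"}
)
print(sorted(hierarchy_only), sorted(with_lateral))
```

In this sketch the lateral relationships never have to be loaded into Athena. The code decides which sources and which relationship types to trust, and the GUI would only need to display the resolved result.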

We would first have to define the concept set as a dynamic component of the phenotype (currently it is a static snapshot copy). I am surprised we have survived this long resisting the demand for it.
That is the culprit to solve for the whole challenge of advanced-researcher phenotype definition.
Well, TAB is on it a bit with the schema definitions, and maybe Atlas 3 has that as a starting point too.

ADDITIONAL INFO
Grouping problem link (for hardcore terminologists only)
Expressions With Attribute Groups | SNOMED International Documents

here is the promised example of grouping
(putting angle brackets on link forces expansion to be off, I finally googled it)

https://browser.ihtsdotools.org/?perspective=full&conceptId1=1260293004&edition=MAIN/2025-09-01&release=&languages=en&latestRedirect=false

as svg (SVGs are not supported; the above picture is not rendered well by the forum, Discourse)

Also note that stated vs inferred (in SNOMED) matters. Use inferred, of course, but flipping between the two lets you see the computational placement of defined terms under computer-only managed parents (because humans are less perfect at this).