Duplicate concept codes within a vocabulary and all are valid

Viktor_Bek · October 14, 2024, 1:53pm

In the SMQ vocabulary, there are 106 pairs of concepts that share the same concept code and are both standard (standard_concept value is C) and valid (valid_end_date is 20991231 and invalid_reason is null). Within each pair, only the concept IDs and the concept names differ, and the names differ only in the suffix, which is either ‘narrow’ or ‘broad’.

From reading the documentation, my understanding was that concept IDs are to be unique over all available vocabularies, and that concept codes should be unique over a single vocabulary given that we are looking only for valid concepts. I understand that concept codes may be re-used or re-purposed if the previous concept was invalidated, but what I encountered is not the case…
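For anyone who wants to run the same check on their own vocabulary download, here is a minimal sketch of the idea. It assumes a simplified CONCEPT table loaded into an in-memory SQLite database; the rows, the `DEMO` vocabulary, and the helper name `find_duplicate_codes` are made up for illustration and are not real OMOP data.

```python
import sqlite3

# Toy CONCEPT rows (a subset of the real columns). All values are
# hypothetical placeholders, not actual OMOP concepts.
ROWS = [
    (1, "Example query (narrow)", "SMQ", "X100", "C", "20991231", None),
    (2, "Example query (broad)", "SMQ", "X100", "C", "20991231", None),
    (3, "Retired term", "DEMO", "A1", None, "20200101", "D"),
    (4, "Replacement term", "DEMO", "A1", "S", "20991231", None),
    (5, "Unrelated term", "DEMO", "B2", "S", "20991231", None),
]


def find_duplicate_codes(conn):
    """Return (vocabulary_id, concept_code, count) for every code that is
    shared by more than one valid (invalid_reason IS NULL) concept."""
    sql = """
        SELECT vocabulary_id, concept_code, COUNT(*) AS n
        FROM concept
        WHERE invalid_reason IS NULL
        GROUP BY vocabulary_id, concept_code
        HAVING COUNT(*) > 1
        ORDER BY vocabulary_id, concept_code
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE concept (
           concept_id INTEGER PRIMARY KEY,
           concept_name TEXT,
           vocabulary_id TEXT,
           concept_code TEXT,
           standard_concept TEXT,
           valid_end_date TEXT,
           invalid_reason TEXT)"""
)
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

duplicates = find_duplicate_codes(conn)
print(duplicates)  # [('SMQ', 'X100', 2)]
```

The re-used code `A1` in the toy `DEMO` vocabulary is not reported, because only one of its two concepts is still valid. The narrow/broad pair is reported because both concepts are valid at the same time.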

Can someone please shed some light on this situation, and perhaps explain what we should do if we encounter more of these situations in the vocabulary collection? How can we uniquely identify a concept if the vocabulary_id / concept_code pair is not unique for valid concepts? We might not always get the concept_name, and even if we do, it might not match any name in the database, because it is a string and is prone to changes and alterations…

We are in the process of parsing some source files and mapping them to vocabularies. Our aim is to read the vocabulary ID and concept code and map them to the corresponding concept (concept ID) in the database. This may not be entirely possible if we encounter more situations like this. Luckily for us, our source files do not reference the SMQ vocabulary, but others might.
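The kind of lookup described here could be sketched roughly as follows. This is only an illustration of the idea, not our actual pipeline: the function names `build_index` and `resolve` and the sample records are hypothetical. The point is that the lookup reports an ambiguous code instead of silently picking one of the candidates.

```python
from collections import defaultdict

# Hypothetical concept records; values are placeholders, not real OMOP data.
CONCEPTS = [
    {"concept_id": 1, "vocabulary_id": "SMQ", "concept_code": "X100", "invalid_reason": None},
    {"concept_id": 2, "vocabulary_id": "SMQ", "concept_code": "X100", "invalid_reason": None},
    {"concept_id": 3, "vocabulary_id": "DEMO", "concept_code": "A1", "invalid_reason": "D"},
    {"concept_id": 4, "vocabulary_id": "DEMO", "concept_code": "A1", "invalid_reason": None},
]


def build_index(concepts):
    """Index valid concepts by (vocabulary_id, concept_code)."""
    index = defaultdict(list)
    for concept in concepts:
        if concept["invalid_reason"] is None:  # keep valid concepts only
            key = (concept["vocabulary_id"], concept["concept_code"])
            index[key].append(concept["concept_id"])
    return index


def resolve(index, vocabulary_id, concept_code):
    """Return (status, concept_ids), where status is 'unique',
    'ambiguous' or 'missing'."""
    ids = index.get((vocabulary_id, concept_code), [])
    if len(ids) == 1:
        return "unique", ids
    if ids:
        return "ambiguous", ids
    return "missing", []


index = build_index(CONCEPTS)
print(resolve(index, "DEMO", "A1"))   # ('unique', [4])
print(resolve(index, "SMQ", "X100"))  # ('ambiguous', [1, 2])
print(resolve(index, "DEMO", "ZZ"))   # ('missing', [])
```

Ambiguous results would then need a manual decision or an extra discriminator, such as the ‘narrow’ or ‘broad’ suffix in the concept name.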

Christian_Reich · October 14, 2024, 4:48pm

Hi @Viktor_Bek:

The SMQ vocabulary really is a neglected stepchild. It was introduced at the beginning of OMOP (around 2010) and never updated. It really should go away, for the simple reason that SMQs aren’t ordinary classification concepts (having descendants connected through “or”); they have complex logic, à la phenotype definitions. The vocabulary system doesn’t allow for such complexity.

We just never got around to killing them, since the urgency was close to zero.

What’s your use case? How do you want to use them?

Maria_Rogozhkina · October 14, 2024, 7:38pm

Hello @Viktor_Bek,

The field standard_concept can be populated with the following values:

‘S’: stands for standard concepts
‘C’: stands for classification concepts
Null: stands for non-standard concepts

Classification concepts are hierarchical terms for real concepts that define semantically useful groups, such as chemical structures for drugs (e.g. ATC).

SMQ was created to define cohorts from MedDRA codes. Cohorts can be broad (less specific and more sensitive) and narrow (more specific and less sensitive). The job has been done by SMQ authors, and we transferred the result to OMOP in this format. If you want to read more about broad and narrow terms in SMQ, please follow this link.

As @Christian_Reich mentioned, we do not support the SMQ vocabulary. Please find more information about supported vocabularies here.

You are correct that the concept_id is a unique identifier for all concepts, and within one vocabulary, concept_code is unique as well. There are only two exceptions in OMOP: SMQ (as mentioned) and DRG, where the source contains code reuse. That’s why you should not be worried about mapping quality and the uniqueness of codes in the context of all other vocabularies.