
Concept_synonym usage in concept searching


(Alexander Davydov) #1

In 2015, a rule was introduced that every concept should have at least one synonym, so we simply use concept_name as a placeholder for concepts that have no real synonyms.

Currently, only 1,463 concepts have no synonyms, and they are mostly in the Metadata, Visit, Payer, and Cost domains.

The problem is that this approach has not been applied consistently. The current picture is:

  • 7,693,299 concepts have only one synonym that is an imputed concept_name;
  • 1,073,940 concepts have more than one synonym, one of which is imputed or matches the concept_name;
  • 239,314 concepts have one or more synonyms, but none of them is an imputed concept_name.
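To make the grouping above concrete, here is a minimal Python sketch that puts a concept into one of these buckets. It assumes that a synonym counts as an imputed copy when it is exactly equal to concept_name; the bucket labels, concept IDs, and names are hypothetical, and this is not the query used to produce the counts.

```python
from collections import Counter


def classify(concept_name: str, synonyms: list[str]) -> str:
    """Put a concept into one of the groups listed above.

    Assumption: a synonym counts as an imputed name copy when it is exactly
    equal to concept_name (no case or whitespace normalisation).
    """
    if not synonyms:
        return "no_synonyms"              # 1,463 concepts in the post
    has_name_copy = concept_name in synonyms
    if len(synonyms) == 1 and has_name_copy:
        return "only_imputed_name"        # 7,693,299 concepts in the post
    if has_name_copy:
        return "name_copy_plus_others"    # 1,073,940 concepts in the post
    return "real_synonyms_only"           # 239,314 concepts in the post


# Toy concepts (made-up IDs): concept_id -> (concept_name, synonym names)
concepts = {
    1: ("Type 2 diabetes mellitus", ["Type 2 diabetes mellitus"]),
    2: ("Hypertensive disorder", ["Hypertensive disorder", "High blood pressure"]),
    3: ("Aspirin", ["Acetylsalicylic acid"]),
    4: ("Inpatient Visit", []),
}

counts = Counter(classify(name, syns) for name, syns in concepts.values())
print(counts)
```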

Because of this inconsistency, search results in Athena and other tools may be skewed: an additional match within the synonyms increases the matching score even though the synonym is only an imputed copy of the concept_name.
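A toy scorer illustrates the effect. This is a hypothetical, deliberately naive scoring rule (one point per name string that contains the query), not how Athena or any other OHDSI tool actually ranks results; it only shows how an imputed synonym doubles the score of a concept that matches no better than its neighbour.

```python
def naive_score(query: str, concept_name: str, synonyms: list[str]) -> int:
    """Hypothetical scorer: one point for every name string containing the query."""
    q = query.lower()
    return sum(q in name.lower() for name in [concept_name, *synonyms])


# Both concepts match "diabetes" equally well on their primary name.
with_imputed = ("Type 2 diabetes mellitus", ["Type 2 diabetes mellitus"])  # imputed copy stored
without = ("Type 1 diabetes mellitus", [])                                  # no synonym rows

print(naive_score("diabetes", *with_imputed))  # 2
print(naive_score("diabetes", *without))       # 1
```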

Here is what we propose to implement:

  • Do not impute synonyms. Leave a concept without synonyms if the source does not provide any.
  • Do not allow synonyms that match the concept_name, unless they are in a national (non-English) language, i.e. language_concept_id <> 4180186 (English language).
  • Drop existing synonyms that violate these rules.
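The cleanup rule can be sketched in Python with the standard sqlite3 module. The tables are trimmed to the columns the rule needs (concept_id, concept_name, concept_synonym_name, language_concept_id); the rows, the helper name drop_name_copies, and the non-English language ID are hypothetical, and matching is exact string equality, which real vocabulary data may need to relax.

```python
import sqlite3

ENGLISH = 4180186         # English language_concept_id, as quoted in the proposal
OTHER_LANGUAGE = 9999999  # hypothetical placeholder for a non-English language


def drop_name_copies(conn: sqlite3.Connection) -> int:
    """Delete English synonyms that merely repeat their concept's concept_name.

    Non-English rows are kept even when the string is identical, per the
    proposal. Returns the number of deleted rows.
    """
    cur = conn.execute(
        """
        DELETE FROM concept_synonym
        WHERE language_concept_id = ?
          AND EXISTS (
              SELECT 1 FROM concept c
              WHERE c.concept_id = concept_synonym.concept_id
                AND c.concept_name = concept_synonym.concept_synonym_name
          )
        """,
        (ENGLISH,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT NOT NULL);
    CREATE TABLE concept_synonym (
        concept_id INTEGER NOT NULL,
        concept_synonym_name TEXT NOT NULL,
        language_concept_id INTEGER NOT NULL
    );
    """
)
conn.executemany("INSERT INTO concept VALUES (?, ?)", [(1, "Aspirin"), (2, "Paracetamol")])
conn.executemany(
    "INSERT INTO concept_synonym VALUES (?, ?, ?)",
    [
        (1, "Aspirin", ENGLISH),               # imputed copy -> dropped
        (1, "Acetylsalicylic acid", ENGLISH),  # real synonym -> kept
        (2, "Paracetamol", ENGLISH),           # imputed copy -> dropped
        (2, "Paracetamol", OTHER_LANGUAGE),    # same string, non-English -> kept
    ],
)

deleted = drop_name_copies(conn)
remaining = conn.execute(
    "SELECT concept_id, concept_synonym_name, language_concept_id "
    "FROM concept_synonym ORDER BY concept_id, language_concept_id"
).fetchall()
print(deleted, remaining)
```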

Once these rules are implemented, the only way to string-search concepts will be to search both tables (concept + concept_synonym); currently, there may be a false confidence that searching concept_synonym alone is enough.
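Searching both tables can be sketched the same way. The helpers below are hypothetical, the schema is trimmed to the columns the queries touch, and SQLite's LIKE stands in for whatever matching a real tool uses. The point is that, after the cleanup, a concept with no synonym rows is only reachable through concept.concept_name.

```python
import sqlite3


def search_both_tables(conn: sqlite3.Connection, text: str) -> list[int]:
    """concept_ids whose concept_name OR any synonym contains `text`."""
    # Note: % and _ inside `text` act as LIKE wildcards; not escaped in this sketch.
    pattern = f"%{text}%"
    rows = conn.execute(
        """
        SELECT concept_id FROM concept WHERE concept_name LIKE ?
        UNION
        SELECT concept_id FROM concept_synonym WHERE concept_synonym_name LIKE ?
        ORDER BY concept_id
        """,
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


def search_synonyms_only(conn: sqlite3.Connection, text: str) -> list[int]:
    """The shortcut the post warns about: looks at concept_synonym alone."""
    rows = conn.execute(
        "SELECT DISTINCT concept_id FROM concept_synonym "
        "WHERE concept_synonym_name LIKE ? ORDER BY concept_id",
        (f"%{text}%",),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT NOT NULL);
    CREATE TABLE concept_synonym (concept_id INTEGER NOT NULL, concept_synonym_name TEXT NOT NULL);
    INSERT INTO concept VALUES (1, 'Type 2 diabetes mellitus'), (2, 'Hypertensive disorder');
    -- After the cleanup, concept 1 has no synonym rows at all.
    INSERT INTO concept_synonym VALUES (2, 'High blood pressure');
    """
)

print(search_both_tables(conn, "diabetes"))    # [1]
print(search_synonyms_only(conn, "diabetes"))  # []
```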
We would like to hear from the community how this could affect the string searches you use, especially within OHDSI tools (Athena, Atlas, USAGI).

Tagging @Chris_Knoll @anthonysena @Yaroslav @schuemie @MaximMoinat @acumarav @Christian_Reich @Dymshyts


(Polina Talapova) #2

What bothers me most is that all our automated mapping scripts rely on the existence of both tables (concept + concept_synonym). That means we would need to rewrite them all :exploding_head:.


Is there a way to simply stop storing lone concept names (for concepts without real synonyms) in the concept_synonym table?


I like all these ideas, but I do not understand why we cannot keep the concept_synonym table alive.


(Alexander Davydov) #3

No :blush: Both tables stay alive; concept_synonym just becomes cleaner.

Sure, the primary names live in concept_name, and the secondary names and translations live in concept_synonym_name.

