Concept_synonym usage in concept searching

Alexdavv · July 29, 2020, 12:11pm

In 2015, a rule was introduced that every concept should have at least one synonym, so we simply use the concept_name as a placeholder for concepts that lack real synonyms.

Currently, only 1,463 concepts have no synonyms at all, and they’re mostly in the Metadata/Visit/Payer/Cost domains.

The problem is that the approach has not been applied consistently. The current picture is:

7,693,299 concepts have only one synonym that is an imputed concept_name;
1,073,940 concepts have >1 synonym, where one of them is imputed or matches the concept_name;
239,314 concepts have one or more synonyms, but none of them is an imputed concept_name.
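
For readers who want to reproduce this kind of breakdown on their own vocabulary copy, here is a minimal sketch of the classification. It assumes the data has already been loaded into plain Python structures, and it treats a synonym as "imputed" when it is an exact string match of the concept_name; the function name and category labels are hypothetical, not part of any OHDSI tool.

```python
from collections import Counter, defaultdict


def classify_concepts(concepts, synonyms):
    """Count concepts per synonym pattern.

    concepts: dict mapping concept_id -> concept_name
    synonyms: iterable of (concept_id, concept_synonym_name) pairs
    Assumption: a synonym counts as imputed when it equals concept_name exactly.
    """
    by_concept = defaultdict(list)
    for concept_id, synonym_name in synonyms:
        by_concept[concept_id].append(synonym_name)

    counts = Counter()
    for concept_id, name in concepts.items():
        syns = by_concept.get(concept_id, [])
        has_name_copy = any(s == name for s in syns)
        if not syns:
            counts["no_synonyms"] += 1
        elif has_name_copy and len(syns) == 1:
            counts["only_imputed_name"] += 1
        elif has_name_copy:
            counts["imputed_name_plus_others"] += 1
        else:
            counts["real_synonyms_only"] += 1
    return counts
```

The four counters correspond to the concepts without synonyms plus the three groups listed above.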

Because of that, the results of the search algorithms used in Athena or other tools may be affected: an additional match among the synonyms increases the matching score even though the synonym is merely an imputed copy of the concept_name.
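
To make the score inflation concrete, here is a toy illustration. The scoring function is purely hypothetical (it just counts how many stored names contain the query) and is not how Athena or any other OHDSI tool actually ranks results; it only shows why an imputed copy of the name can look like extra evidence.

```python
def toy_match_score(query, concept_name, synonym_names):
    """Hypothetical score: number of stored names containing the query.

    This is NOT Athena's algorithm, only a stand-in for any scheme
    that rewards additional matches among synonyms.
    """
    q = query.lower()
    names = [concept_name, *synonym_names]
    return sum(q in n.lower() for n in names)


# Concept whose only "synonym" is an imputed copy of its name.
imputed = toy_match_score("fever", "Fever", ["Fever"])
# Same concept with a genuine synonym instead.
genuine = toy_match_score("fever", "Fever", ["Pyrexia"])
```

Under this toy scheme the concept with the imputed copy scores higher than the one with a real synonym, although it carries no additional information.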

Here is the proposal to be implemented:

Do not impute synonyms. Leave concepts without synonyms if the source doesn’t provide any.
Do not allow synonyms that match the concept_name, unless they’re in a national language (language_concept_id <> ‘4180186’, where 4180186 is the English language).
Drop the existing synonyms according to these rules.
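
The rules above can be sketched as a simple filter. This is only an illustration under stated assumptions: synonyms are held as in-memory tuples, "match" means exact string equality, and the helper names are hypothetical. The actual cleanup would be done in the vocabulary build process, not in a script like this.

```python
ENGLISH_LANGUAGE_CONCEPT_ID = 4180186  # English language, as given in the proposal


def keep_synonym(concept_name, synonym_name, language_concept_id):
    """Return True if the synonym survives the proposed rules."""
    if synonym_name != concept_name:
        # A genuinely different name is always kept.
        return True
    # A name-identical synonym is kept only when it is not English.
    return language_concept_id != ENGLISH_LANGUAGE_CONCEPT_ID


def clean_synonyms(concepts, synonyms):
    """concepts: dict concept_id -> concept_name
    synonyms: list of (concept_id, concept_synonym_name, language_concept_id)
    """
    return [
        (cid, syn, lang)
        for cid, syn, lang in synonyms
        if keep_synonym(concepts.get(cid), syn, lang)
    ]
```

With this filter, a concept whose only synonym was an imputed English copy of its name ends up with no rows in concept_synonym at all, which is exactly the case the first rule allows.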

Once this is implemented, the only reliable way to string-search concepts will be to search both tables (concept + concept_synonym); currently, there may be false confidence that searching concept_synonym alone is enough.
We’d like to hear from the community how this could affect the string searches you use, especially within OHDSI tools (Athena, Atlas, USAGI).
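
As a rough sketch of what searching both tables looks like, the example below uses an in-memory SQLite database with a cut-down version of the two tables. The demo rows, the helper names, and the choice of SQLite with a plain LIKE match are assumptions for illustration; real tools use their own database and search technology.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for concept and concept_synonym."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
        CREATE TABLE concept_synonym (
            concept_id INTEGER,
            concept_synonym_name TEXT,
            language_concept_id INTEGER
        );
        INSERT INTO concept VALUES (1, 'Fever'), (2, 'Cough');
        -- After the cleanup, 'Cough' has no synonym rows at all.
        INSERT INTO concept_synonym VALUES (1, 'Pyrexia', 4180186);
        """
    )
    return conn


def search_concepts(conn, term):
    """Return concept_ids whose name OR any synonym contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT concept_id FROM concept WHERE concept_name LIKE ?
        UNION
        SELECT concept_id FROM concept_synonym WHERE concept_synonym_name LIKE ?
        ORDER BY concept_id
        """,
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

A search restricted to concept_synonym would miss 'Cough' in this demo, while the UNION over both tables still finds it.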

Tagging @Chris_Knoll @anthonysena @Yaroslav @schuemie @MaximMoinat @acumarav @Christian_Reich @Dymshyts

Polina_Talapova · July 30, 2020, 7:35am

What bothers me most about this is that all our scripts for automated mapping depend on the existence of 2 tables (concept + concept_synonym). That means we would need to rewrite them all.

In 2015, a rule was introduced that every concept should have at least one synonym, so we simply use the concept_name as a placeholder for concepts that lack real synonyms.

Is there any way not to store single concept names without synonyms in the concept_synonym table?

Do not impute synonyms. Leave concepts without synonyms if the source doesn’t provide any.

Do not allow synonyms that match the concept_name, unless they’re in a national language (language_concept_id <> ‘4180186’, where 4180186 is the English language).

Drop the existing synonyms according to these rules.

I like all these ideas but I do not understand why we cannot keep the concept_synonym table alive.

Alexdavv · July 30, 2020, 8:45pm

No, we keep both tables alive. Concept_synonym just becomes cleaner.

Sure, the primary names live in concept_name. The secondary names and translations live in concept_synonym_name.

Christian_Reich · January 6, 2021, 11:35am

Agreed. I don’t think any script or tool (Athena, Atlas) should change. But we should ask. @Chris_Knoll, @Konstantin_Yaroshove?

Do we have a Github issue for this?

Konstantin_Yaroshove · January 6, 2021, 11:52am

As far as I can see, ATLAS/WebAPI do not use the concept_synonym table. At the same time, I do not see any issues for the ATHENA logic. But the table is widely used across many OHDSI components where my knowledge is limited.

I would involve more people in this discussion.

Alexdavv · January 6, 2021, 12:17pm

Not yet.

tagging @acumarav @Yaroslav

Christian_Reich · January 6, 2021, 1:20pm

What I was thinking is this (correct me if I am wrong):

We have a search technology and a UI for searching in Athena. It runs off the ProdV5 tables. Athena as such is fine (well, it has issues, but they have nothing to do with this).

Atlas uses SQL and another UI. It is slower, and it is confusing because it looks similar to Athena but behaves slightly differently. So, I am thinking of transplanting the fast Athena search to Atlas for the Vocabulary search functionality. Of course, it will run on the local vocabulary tables.

However, we may find that Atlas does things better/more correctly/more intuitively than Athena. If that is the case we should also consider improving Athena.

Bottom line: create one optimal tech stack/functionality/UI for vocab searching and browsing, and deploy to both tools. Don’t have two parallel and almost identical solutions.

Alexdavv · January 20, 2021, 11:12am

The plan is to implement this vocabulary fix in one of the next releases.

@MaximMoinat @schuemie @anthonysena @Chris_Knoll @wivern
Please let us know if this can affect the search logic in Atlas or USAGI.

MaximMoinat · January 25, 2021, 8:46am

Hi Alex. I don’t expect it to affect the search logic of USAGI, which only adds synonyms to the index if they are not equal to the concept name.

I like the proposal by the way!

Alexdavv · February 17, 2021, 12:10pm

This will be implemented within the next vocabulary release.