OHDSI Home | Forums | Wiki | Github

Source Concept Code Multiply when Joining with Concept Table

Hi,

Need some insights on this, we have loaded the Vocabulary in our database and we have source data of claims, when we try to join concept table to our Dx table, for one code the concept code are multiplying.

Tables Vocabularies

Procedure CPT4, HCPCS, NULL, RxNorm, CIEL, DPD, OXMIS, BDPM, ICD10PCS, Multum
Condition Multum, RxNorm, DPD, OPCS4, DRG, ICD10, CIEL, ICD9CM, ICD10CM, ICD10PCS, BDPM, HES
Specialty, ATC, OXMIS, SMQ
Drug NDC
Surgical ICD9Proc

Converting the source data using above vocabularies will result in data multiplication because a given source code lies in multiple vocabularies. There will be a significant increase in number of records from source to target as Visit Occurrence table consists of data from multiple tables (Procedure, Diagnosis, Surgical and Drug). These tables would have visit occurrence id from Visit table and there will be multiple matches leading to exponential increase in final record count.

Can anyone help with this? what should be the ethical way to avoid multiplication of the records?

@Ajinkya_Patale:

Take a look at the tables. It’s not an entire vocabulary that is in different domain. It’s individual source concepts that can branch out to more than one target concept, each of which might be living in different domains. And this is totally correct, if you look at the examples.

Not at all. This is true for a very small minority of source concepts. Most of them map to one target concept in a single domain. Your increase is way less than one percent.

The approach I generally take: when you have source data from a known
vocabulary (which is one of the many in the OHDSI vocabularies), then you
make a query specific to pull those out. Ex:

select *
from concept
where concept_code = ‘410.01’
and vocabulary_id = ‘ICD9CM’
;

The concept_code should be unique within a given vocabulary, even if it is
duplicated across vocabularies. That’s why you need to specify the source
that it came from.

One you have that CONCEPT_ID, you can find its ‘Maps to’ relationship in
the CONCEPT_RELATIONSHIP table to its standard CONCEPT_ID analogue. >99%
of the time, that’s a 1-to-1 mapping, and in the <1% of the time when one
source concept maps to multiple standard concepts, we would recommend using
that information and creating rows in the appropriate domain tables for
each mapped concept.

t