OHDSI Home | Forums | Wiki | Github

Concept code data type limitations

Hi Friends,

My source data has a lot of custom coding in the procedure and drug domains. Currently, I create a custom concept_id > 2 billion for these codes. I use the source text string as the concept_name and concept_code for the custom concept_id. When the custom concept is mapped to the domain table, the concept_code maps to the *_source_value. The data type for concept_code and *_source_value is VARCHAR(50). I have quite a few codes that violate this limit (examples below). How do others solve this problem in an efficient manner on a large scale? My EHR data has thousands of custom drugs and well over 100 procedures.

  • phenylephrine (PF) 0.5 mg/5 mL (100 mcg/mL) in 0.9% NaCl IV syringe

  • Hysterectomy total abdominal bilateral sapling oophorectomy

  • Hysterectomy total abdominal bilateral sapling oophorectomy exp
    laparotomy/total abdominal hysterectomy/bilateral sapling
    oophorectomy/lymph node dissection/debulking

Drugs: map them to RxNorm/RxE. In this case will be 2 records: 5 ml phenylephrine 0.1 mg/ml injectable solution (injection) and sodium chloride 9 mg/ml injectable solution .
Procedures: if there isn’t any standard concept that represents the full meaning, I’d suggest mapping to 2 concepts again.
Hysterectomy total abdominal bilateral sapling oophorectomy will become total abdominal hysterectomy and bilateral sapling oophorectomy (under the same date, visit_id, etc.). You can also use modifiers for procedures (i.e. left/right).

Is the desire to retain the full text description in the source name?

You can also use modifiers for procedures (i.e. left/right).

Can someone clarify how left and right and bilateral modifiers are used in procedures?

What are allowed modifiers (I am asking because I want to create Heel rule checking CDM conformance).

Just getting back from vacation…

Yes, we would like to retain the full text string as seen in the source string. The goal is to provide the exact event when it is necessary to map to less granular concepts.

@MPhilofsky

The problem statement seems to be that you have local values that are considered ‘source code’ that are longer than varchar(50). Although you could assign 2 billion + concept_id for these, the long source code value gets truncated.

Some options - not necessarily in a recommended order

  1. Change the DDL of the omop concept table. Increase varchar(50) to varchar(255). Simplest, but you may have performance issues and most impact to OMOP vocabulary tables.
  2. Use your source_code as concept_name, and populate source_code with random characters. You should have the same functionality, without changing DD, no impact to OMOP vocabulary tables - but you will have problems if > varchar(255)
  3. Creatively use this table with some change to DDL for source_code, source_code_description etc. Probably most complex, but least impact to OMOP vocabulary tables

Thanks, @Gowtham_Rao, for the suggestions!

Currently, we are “breaking the rules” and not limiting to varchar50. We haven’t noticed a performance issue. But I wanted to check with the community to see if others have the same problem with their source data and if they had found a best practice.

In my exploratory implementations for veterinary medicine and emergency (pre-hospital care) I changed just about all the source fields to VARCHAR(255) and in a few other tables to TEXT to support the source data. No real performance issues so far on 400K patient set.

Thank you for sharing your experience, @mkwong!

t