Concept code data type limitations

MPhilofsky · September 1, 2018, 1:21am

Hi Friends,

My source data has a lot of custom coding in the procedure and drug domains. Currently, I create a custom concept_id > 2 billion for these codes. I use the source text string as the concept_name and concept_code for the custom concept_id. When the custom concept is mapped to the domain table, the concept_code maps to the *_source_value. The data type for concept_code and *_source_value is VARCHAR(50). I have quite a few codes that violate this limit (examples below). How do others solve this problem in an efficient manner on a large scale? My EHR data has thousands of custom drugs and well over 100 procedures.

phenylephrine (PF) 0.5 mg/5 mL (100 mcg/mL) in 0.9% NaCl IV syringe
Hysterectomy total abdominal bilateral sapling oophorectomy
Hysterectomy total abdominal bilateral sapling oophorectomy exp
laparotomy/total abdominal hysterectomy/bilateral sapling
oophorectomy/lymph node dissection/debulking

aostropolets · September 1, 2018, 3:27am

Drugs: map them to RxNorm/RxE. In this case will be 2 records: 5 ml phenylephrine 0.1 mg/ml injectable solution (injection) and sodium chloride 9 mg/ml injectable solution .
Procedures: if there isn’t any standard concept that represents the full meaning, I’d suggest mapping to 2 concepts again.
Hysterectomy total abdominal bilateral sapling oophorectomy will become total abdominal hysterectomy and bilateral sapling oophorectomy (under the same date, visit_id, etc.). You can also use modifiers for procedures (i.e. left/right).

Gowtham_Rao · September 1, 2018, 4:19am

Is the desire to retain the full text description in the source name?

Vojtech_Huser · September 5, 2018, 3:22pm

You can also use modifiers for procedures (i.e. left/right).

Can someone clarify how left and right and bilateral modifiers are used in procedures?

What are allowed modifiers (I am asking because I want to create Heel rule checking CDM conformance).

MPhilofsky · September 10, 2018, 1:40pm

Just getting back from vacation…

Yes, we would like to retain the full text string as seen in the source string. The goal is to provide the exact event when it is necessary to map to less granular concepts.

Gowtham_Rao · September 10, 2018, 2:12pm

@MPhilofsky

The problem statement seems to be that you have local values that are considered ‘source code’ that are longer than varchar(50). Although you could assign 2 billion + concept_id for these, the long source code value gets truncated.

Some options - not necessarily in a recommended order

Change the DDL of the omop concept table. Increase varchar(50) to varchar(255). Simplest, but you may have performance issues and most impact to OMOP vocabulary tables.
Use your source_code as concept_name, and populate source_code with random characters. You should have the same functionality, without changing DD, no impact to OMOP vocabulary tables - but you will have problems if > varchar(255)
Creatively use this table with some change to DDL for source_code, source_code_description etc. Probably most complex, but least impact to OMOP vocabulary tables

MPhilofsky · September 10, 2018, 2:50pm

Thanks, @Gowtham_Rao, for the suggestions!

Currently, we are “breaking the rules” and not limiting to varchar50. We haven’t noticed a performance issue. But I wanted to check with the community to see if others have the same problem with their source data and if they had found a best practice.

mkwong · September 10, 2018, 3:54pm

In my exploratory implementations for veterinary medicine and emergency (pre-hospital care) I changed just about all the source fields to VARCHAR(255) and in a few other tables to TEXT to support the source data. No real performance issues so far on 400K patient set.

MPhilofsky · September 18, 2018, 2:16pm

Thank you for sharing your experience, @mkwong!