Representing interface terminology to users

Hi all,

One issue that we’ve run into in rolling out our OMOP instance to the research community is that of interface vs. reference terminology. We’ve been using the SOURCE_VALUE columns to store the reference terminology codes from the source systems - condition_source_value gets the ICD-9/10 code, measurement_source_value gets the LOINC code, procedure_source_value gets the CPT code, etc.

However, we’re finding more and more that users want (or need!) to build queries based on interface terminology, rather than reference terminology. Take platelets, for example. There are multiple LOINC codes for platelet counts - without a clear correspondence between the interface terminology and the reference terminology, users have no way to know which one corresponds to the platelet values they see on a day-to-day basis (for example, the one that returns from a standard auto differential) and which one corresponds to an arcane measurement that’s barely ever used in clinical practice. They could theoretically run queries to see which are ordered more than others, but that requires extra work and still doesn’t establish a one-to-one correspondence.

Procedures raise the same issue for us - we use non-standard CPT codes for many common procedures (including auto differentials), which don’t get mapped to CONCEPTs. We can insert the non-standard codes in procedure_source_value, but without the interface mapping, again, the user has no way to draw a conceptual linkage between the “AUTO DIFFERENTIAL” they see in a patient’s chart and the “01006.123” they see in procedure_source_value.

One way we’ve explored addressing this is creating views called “MEASUREMENT_LOOKUP” and “PROCEDURE_LOOKUP” that handle this mapping - allowing them to see which LOINC codes or CPT codes correspond to which interface terminology entries. This is nonstandard, though, and we’d love to be able to handle this solely within the confines of the CDM standard tables.
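For anyone who wants to picture the lookup view described above, here is a minimal sketch using an in-memory SQLite database. It is not our production definition: the `interface_term_map` table, its columns, and the sample rows are hypothetical, and the code strings and the second concept_id are illustrative placeholders. Only `measurement`, its columns, and concept_id 3024929 come from the CDM and this thread.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced version of the CDM MEASUREMENT table (only the columns used here).
CREATE TABLE measurement (
    measurement_id           INTEGER PRIMARY KEY,
    measurement_concept_id   INTEGER,
    measurement_source_value TEXT
);

-- Hypothetical site-specific table: EHR display name -> reference code.
-- This table is an assumption and is NOT part of the CDM.
CREATE TABLE interface_term_map (
    interface_name TEXT,
    source_code    TEXT
);

-- Illustrative placeholder rows.
INSERT INTO measurement VALUES
    (1, 3024929, '777-3'),
    (2, 3024929, '777-3'),
    (3, 9999999, 'OTHER-PLT');
INSERT INTO interface_term_map VALUES ('PLATELET', '777-3');

-- The lookup view: which concept_id each interface term lands on,
-- and how many records back that mapping.
CREATE VIEW measurement_lookup AS
SELECT map.interface_name,
       map.source_code,
       m.measurement_concept_id,
       COUNT(*) AS n_records
FROM interface_term_map AS map
JOIN measurement AS m
  ON m.measurement_source_value = map.source_code
GROUP BY map.interface_name, map.source_code, m.measurement_concept_id;
""")

rows = conn.execute("SELECT * FROM measurement_lookup").fetchall()
for row in rows:
    print(row)
```

A user who only knows the front-end name can filter the view on `interface_name` and read off the concept_id to use in a cohort definition.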

Has anyone else confronted this issue? What are some of the techniques you’ve applied in trying to navigate the gap between the interface terminology researchers and clinicians deal with in the EHR frontend and the standardized reference terminologies we use to populate the CDM tables?


This is an interesting thought. We have a slightly different terminology: we call your interface terminology Standard Concepts, and your reference terminology Source Concepts. The latter are mapped to the former. But for this to work the way you envision, there must be only one Standard Concept for a given medical entity, and all the Sources should be mapped to it. That is the case only partially:

  • Drugs are very unique. RxNorm/RxNorm Extension have every drug product or higher level concept only once, with very few exceptions. Source codes are mapped into this world.
  • Conditions are curated by SNOMED. They are generally doing a good job, but it’s work in progress, and as they add Concepts the deduplication can lag behind. But still, the situation is pretty good.
  • Procedures are pretty bad, because we made the different Source Concepts (reference terminologies: CPT4, HCPCS, ICD9Proc, ICD10PCS, SNOMED) Standard Concepts (interface terminology), so they can be used in the data. There is currently a project underway to fix that domain. Any interest in helping out?
  • Measurements are mostly based on LOINC (well deduped) plus SNOMED, creating some overlap. Situation needs cleaning, but is tolerable.
  • Devices and Observations are the Wild West.
  • Small terminologies like Place of Service and Specialty are small enough, so the THEMIS Focus Groups can just clean them up where there is ambiguity.
  • Metadata terminologies (e.g. Type Concepts) are in good shape.

I’ve been thinking and doing something similar regarding the use of SOURCE_VALUE. First use case is to squirrel away debugging information, then storing structured data to be able to go back to the source in terms of string search and semi-structured string searches. In my case I use a ..: sort of format - easy to parse locally and into Java name:value sort of use. I’m leaning toward XML myself for structured object data. All internal use only during development and sorting out ETLs and interfaces.

I suggest adding site-specific fields rather than packing a number of different strings into the source columns. Same advantage of not interfering with OHDSI tools, but it saves the hassle of having to parse strings and allows creating columns with meaningful names.

What’s that? Why aren’t the Source Concepts giving you what you need? They have vocabulary, code and name.
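To make the point about Source Concepts concrete, here is a rough sketch of reading vocabulary, code and name through `procedure_source_concept_id`. It assumes the site has registered its local code as a concept; the concept_id 2000000001, the vocabulary_id `LOCAL_CPT`, and the sample rows are hypothetical, and the tables are cut down to the columns used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced CONCEPT table (subset of the real columns).
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    vocabulary_id TEXT,
    concept_code  TEXT
);

-- Reduced PROCEDURE_OCCURRENCE table.
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id     INTEGER PRIMARY KEY,
    procedure_concept_id        INTEGER,
    procedure_source_concept_id INTEGER,
    procedure_source_value      TEXT
);

-- Hypothetical locally defined source concept for the non-standard code.
INSERT INTO concept VALUES
    (2000000001, 'AUTO DIFFERENTIAL', 'LOCAL_CPT', '01006.123');

-- The record is unmapped (concept_id 0) but still points at its source concept.
INSERT INTO procedure_occurrence VALUES
    (1, 0, 2000000001, '01006.123');
""")

# Resolve the raw source value to its vocabulary, code and readable name.
rows = conn.execute("""
    SELECT po.procedure_source_value,
           c.vocabulary_id,
           c.concept_code,
           c.concept_name
    FROM procedure_occurrence AS po
    JOIN concept AS c
      ON c.concept_id = po.procedure_source_concept_id
""").fetchall()
print(rows)
```

The join only helps when the source concept row actually exists, so codes that never made it into CONCEPT would still need to be added locally first.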

Yup, a pipe-delimiter was what first struck me as a potential option based on your reply. We view the source_value columns as freebies as well - just a question of what to stick in there. I don’t necessarily view this as only useful for these last-resort queries, though. Rather, it’s a way for users who don’t have access to the source data model to ensure that they’re querying the concepts they think they’re querying. Take the platelet example - the transition from EHR-specific components to LOINC codes is not transparent to most of our users. Plus, not all of our users have front-end EHR access, so it’s not like they can just find a patient in the front-end and see what LOINC code corresponds to a value. This would let them see that what shows up in the front end as PLATELET maps to CONCEPT_ID 3024929 and not any of the other potentially plausible concept_ids. Thanks for your input - we may end up going with this.
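A minimal sketch of what the pipe-delimited option could look like, in case it helps others weigh it. The field order (interface name, then code), the function names, and the 50-character limit are assumptions on my part; check the column width in your own DDL.

```python
from typing import Tuple

# Assumption: the source_value column is a 50-character varchar; verify locally.
MAX_SOURCE_VALUE_LEN = 50


def pack_source_value(interface_name: str, code: str, sep: str = "|") -> str:
    """Combine an interface term and a reference code into one source_value."""
    if sep in interface_name or sep in code:
        raise ValueError("separator must not appear inside either field")
    packed = f"{interface_name}{sep}{code}"
    if len(packed) > MAX_SOURCE_VALUE_LEN:
        raise ValueError("packed value would not fit in the source_value column")
    return packed


def parse_source_value(value: str, sep: str = "|") -> Tuple[str, str]:
    """Split a packed source_value back into (interface_name, code).

    Values without a separator are treated as code-only rows, so data
    loaded before the convention was adopted still parses.
    """
    if sep not in value:
        return "", value
    interface_name, _, code = value.partition(sep)
    return interface_name, code


packed = pack_source_value("PLATELET", "777-3")
print(packed)
print(parse_source_value(packed))
print(parse_source_value("777-3"))
```

The trade-off raised earlier in the thread still applies: every consumer has to know the format and split the string, which a dedicated site-specific column would avoid.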