OHDSI Home | Forums | Wiki | Github

Value dependent mapping using STCM?

Is there an established best practice when the mapping is dependent on both the source code and value?

For example, we have a source term “Alcohol user?” where the value is either “yes” or “no”. If we ignore the value the best interpretation we can gather is that the question was asked, which isn’t all that useful. However, if the combination of term and value are considered we can map to either:

If this isn’t a solved problem, my initial thought is to add a column to source_to_concept_map for source_value, which is left NULL unless the mapping is value dependent.

Something along the lines of:

....
INNER JOIN 
(
	SELECT *
	FROM source_to_concept_map 
	WHERE source_vocabulary_id = 'xyz vocab'
) stcm
ON source_table.code = stcm.source_code
AND 
(
	stcm.source_value IS NULL
	OR
	source_table.value = stcm.value
)

Yes, those are tabs.

Any feedback on how others have tackled this would be appreciated.

@rtmill, I am only working on a prototype right now, but hopefully I can help, and maybe others who have much more experience than me can jump in.

I used the observation_concept_id to classify the observation topic, and in the value_as_concept_id, I put the result.
In your example, this would translate into:

observation_concept_id=4074035 ("Current drinker")
value_as_concept_id=4188539 ("Yes")
or 
value_as_concept_id=4188540 ("No")

This will need to be populated using a CASE statement in your ETL rather than a straight lookup on the source_to_concept_map table.

I haven’t tested this in analysis scripts yet, so can’t comment on whether this really works in the end.

P.S. I am in the tab camp.

Friends:

This is an issue that was brought up when @ColinOrr2006 or @ColinOrr started the Survey proposal and debate. That’s where these question and answer games ought to go. I agree (and said it somewhere there) that for the yes-answers we need a record in the “proper” table. But we also realize this is a big mapping job, and currently there is no plan to get such a thing done.

With respect to alcohol, drugs etc. there is a THEMIS group 1 working on the conventions how to store this. We should re-invent the wheel too often.

Interesting discussion, we are having similar issues. To react to @ssrobertson, I do not think this should be captured by a yes/no in the observation table.

The wiki description of the observation table states (point 4):

• It is recommended for observations that are suggestive statements of positive assertion should have a value of “Yes” (concept_id=4188539), recorded, even though the null value is the equivalent.

In my view, the concept ‘Current drinker’ is suggestive and should always be answered with ‘yes’. For example one might naively search for every drinker by querying only at the observation_concept_id ‘Current drinker’ and not look at the value.

So in this particular case, I would go for two distinct concepts: ‘Current drinker’ and ‘Non - drinker’ (4022664). But indeed, as @Christian_Reich pointed out, lets wait and see what is decided in the Themis working group.

1 Like

@rtmill,

We tackled this pre-Themis. However, many questions remain and the collective “we” need to give input and clarify conventions with the Themis WG. Some questions I have re: the lifestyle/social history domain:

  1. There are multiple, standard concepts for alcohol drinker. Which ones should we use?
  2. If you have more granular data (start and quit dates, amounts consumed, type consumed, etc.), where should it go? And how do you capture all the data in the CDM?
  3. There are multiple classes and multiple domains for other lifestyle/social history concepts. How do we reconcile?

You should join the WG :slight_smile: More points of view will make the conventions more robust!

Below is our current implementation:

We create a concept in the Concept table, then create a relationship to a standard concept in the Concept Relationship table.

Our EHR data does not have a source “code” attached to lifestyle/social history data elements. So, I took the field “alcohol use” and then the individual values “yes”, “no” and concatenated the two. The result “AlcoholUse:Yes” is our concept_name and our concept_code. Our data in the Concept table looks like this:

The record for the mapping resides in the Concept Relationship table:

concept_id = 45766930 = Admits alcohol use

@Christian_Reich Games or productive conversations? :smile:

The example of alcohol use may have been too specific. Our source data contains a variety of questions that can be mapped when the value is considered:

  • “Are you safe at home?”
    • Yes - > 44805176 ‘Feels safe in own home’
    • No → 44805166 ‘Does not feel safe in own home’

Additionally, this key-value structure is not limited to survey questions:

  • “IV Type”
    • Single Lumen → 4106030 ‘Single lumen catheter’
    • Peripheral IV → 4219637 ‘Peripheral intravenous catheter’

@MPhilofsky I like your approach of concatenating the concept and value. Two questions arise from this:

  1. For custom mapping, when do you create a custom vocabulary that lives in the concept table vs. use the source_to_concept_map table. I have been using STCM thus far as these mappings are thought to be
    institution specific
  2. Is that custom vocabulary (UCHealth_CDW) comprised entirely of key-value, concatenated concept_codes? If not, how is a concept flagged as being value-dependent or not? Efficiency is a large concern here given the volume of observations (hundreds of millions) that need to be mapped. Unnecessary concatenation or string search (e.g. contains ‘:’) could slow things down quite a bit for us

Well, the way the surveys go is that you define in the vocabulary permissible values for any question. Could be “yes” and “no”, or your more defined answers. Either way.

That, indeed, is not a survey. Not even a question and answer. You can just drop “IV type” and use “Single lumen catheter” or “Peripheral intravenous catheter” as device_concept_id.

@rtmill,

  1. We only use the Concept and Concept Relationship table. We do not use the STCM table. The creation of concepts is institution specific. We have source data from two different Epic systems and there is less than <10% overlap in concepts. And most of that overlap is in the Units domain.
  2. No, our custom vocabulary only has both key-value, concatenated concept codes and various other custom codes. Most of the key-value, concatenated concepts come from the social history/lifestyle domain. We create the concatenated values to enable quick and easy discovery of the exact representation from the source.

Does it matter if the custom concept is a bogus “NDC” code, “custom” code, or value dependent, concatenated code? They almost all map to standard concepts. And, yes, mapping is a huge effort that should only be undertaken for a good business case :slight_smile:

Melanie

I think dropping the concept to which the value represents would spell trouble. This highlights the issue that some representations require considering both the concept and value together to adequately map the meaning of the observation.

To stick with the same example:

We’ve been using STCM for custom, institution specific mappings for ease of use (direct output of Usagi, can avoid loading in custom vocabularies after each vocabulary update), though I don’t think there’s a huge difference between the two.

I figured other folks were running into this same issue but it sounds like we should just come up with an institution specific solution, which would probably be adding a column to the STCM table.

Thanks for that example. It’s a classic: Mapping a source code to two target concepts, when there really is only one source. That’s a typical pitfall:

  • You can map one thing to many concepts. But only when it consists of a pre-coordinated set of meanings. Like “Diabetes and diabetic retinopathy”. These are two Conditions. Then you can map to two Concepts.
  • If your source only denotes one semantic meaning you cannot do that. Even if you cannot find a target concepts covering all attributes you got. In your case, source code 123 is now both a peripheral intravenous catheter and a single lumen catheter. What you want is a single “single lumen peripheral intravenous catheter”.

In this particular case, this is a device. We don’t have a standardized vocabulary system for devices, where there is no duplication. So, anything that comes in gets to be a standard Concept. So, grab a concept_id > 2 billion and plug it in the way it is.

But even if you want to map it to something already there: It’s not a EAV model. There is no entity attribute value here. A “single lumen peripheral intravenous catheter” is by its nature an IV device. No need to say that in the data.

I think my example was confusing. My point was that for some observations we can only derive the intended meaning by considering both the observation (iv type) and the value (single lumen) together. When considered independently the meaning is incomplete (e.g. single lumen what?).

That’s a good way to put it. I’d argue in these cases the distinct pairs of observation and value represent distinct semantic meanings. The intent isn’t to map from a single observation to multiple concepts but to map from distinct combinations of observation+value into single concepts.

My example was intended to be about separate observations, not attributes of the same observation. We’d be generating false data if we said every single lumen IV was also peripheral and visa versa.

I won’t belabor the issue as we have a solution for our needs which doesn’t seem extensible as it can only consider two fields. For source data where the meaning is dependent on a qualifier (e.g. obs: family history + value: diabetes + qualifier: paternal) this strategy wouldn’t be sufficient.

The real question is, if I also advocate adding an optional ‘source_value’ column to the CONCEPT_RELATIONSHIP table, how long will my forum account be suspended for?

Totally understood. The source data are crazy. That’s why we need to do the work during ETL to clean that up. There really is no reason to cut it into two components: IV type and single lumen. The device is a single “single lumen IV catheter”. And that’s what we need in the OMOP CDM.

Exactly.

I don’t think there really are two fields. For family history we are allowing EAV for two reasons:

  • The record is today, but the value is in the past
  • Potentially, the value is the entire Condition space, which means we have to duplicate all Conditions to also appear in Family history of

Why would that help you? Now I am confused.

The data is already cut up in two pieces, I’m trying to combine them. By ‘do the work during the ETL’ do you mean outside of using the STCM? The effort and complications of formatting each enumeration into an adequate single-field representation before its mapped seems avoidable.

I’m sure the data could be preformatted before joining with the STCM with some nested case statements or something similar but the structured table (STCM) approach is personally more appealing in terms of maintenance and interoperability.

You’re right, it doesn’t help us. The thought was that other folks that have custom vocabularies in the concept table could follow a similar approach. I’ll retract that hypothetical

Oh. Yes, you can use the STCM for that no problem. Except, if you want to combine two things into one, it doesn’t really help you. But you’ll figure that out. :smile:

t