OHDSI Home | Forums | Wiki | Github

OPCS4 to SNOMED mapping issues

I’ve noticed the concept relationship mappings for ‘OPCS4 - SNOMED’ have a lot of issues. The most common of which I’m seeing is that the OPCS4 concepts map to SNOMED concepts that have a much more vague meaning, causing a significant loss in detail. It’s a little odd because a good portion of the OPCS4 concepts have an equivalent SNOMED concept, often with the same concept name.

Some examples:
Format: (OPCS4 concept | SNOMED concept that its mapped to | SNOMED concept by joining on concept name)

I came across this after Usagi suggested a few terms get mapped to OPCS4 (correctly as they are standard concepts) but do we not need to leverage the mappings. Do other institutions use these relationships? Is this an issue folks plan to address? If the answer to the last two questions is no perhaps we should consider changing the status of the relationships to invalid?

Lastly, it seems odd to have concepts in different vocabularies with the same concept name, domain, and meaning and have them both be standard.

If that happens, I’d say it is wrong. The idea around the ‘standard’ concept is that there is the primary way to represent the clinical idea in data. It would be very strange to have 2 different concept IDs with the exact same name to both be standard. One of them should take the standard position, and all other concepts should map over to it.

Hi @rtmill, @Chris_Knoll

Thanks for noticing. We already fixed it. Please check the Athena now. The release was done just after you posted.
There should be a joke about magical vocabulary team, who fixes the whole vocabulary less in 12 hours.:smile:

@aostropolets, can you please point out where the ongoing and planned vocabulary work is posted?

1 Like

I’m starting to think Hogwarts has a data science track. Impressive stuff, thanks.

I’d suggest a smoke-test for this is to do a query looking for standard concepts that have the same name and same domain:

select concept_name, domain_id, count(*) dupes
FROM concept
where standard_concept = 'S' and invalid_reason is null
group by concept_name, domain_id
HAVING COUNT(*) > 1

If any records come back, then you’ve found multiple concepts with the same name in the same domain that are standard, and that should probably be investigated. I’ve run this on our latest download (which we labeled ‘Vocabulary_20180609’) and found 11727 occurrences. Here are the top 20:

concept_name 	domain_id	dupes
Influenza Virus Vaccine, Inactivated A-California-07-2009 X-181 (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Victoria-210-2009 X-187 (H3N2) (A-Perth-16-2009) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008	Drug	19
Influenza Virus Vaccine, Inactivated A-Brisbane-59-2007, IVR-148 (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Uruguay-716-2007, NYMC X-175C (H3N2) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008 strain 0.03	Drug	17
Not treatment functions	Provider Specialty	16
Nelaton catheter	Device	16
Ensure Plus Liquid (Abbott Nutrition)	Device	15
Ascorbic Acid 90 MG / Beta Carotene 3000 UNT / Biotin 0.045 MG / Calcium 162 MG / Cupric oxide 2 MG / Ferrous fumarate 10 MG / Folic Acid 0.4 MG / Magnesium Oxide 100 MG / Niacinamide 42.5 MG / Phosphorus 125 MG / ... Oral Tablet	Drug	12
Patients who use hospice services any time during the measurement period	Procedure	12
no	Observation	12
Once or twice	Observation	10
Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.03 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.03 MG/ML / Influenza B virus vaccine, B-Brisbane-60-2008-like virus 0.03 MG/ML / Influenza B virus vaccine, B-	Drug	10
Influenza Virus Vaccine, Inactivated A-California-07-2009 X-179A (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Victoria-210-2009 X-187 (H3N2) (A-Perth-16-2009) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008	Drug	10
Fortimel liq Liquid (Nutricia Ltd)	Device	9
Country of birth	Observation	9
Fortisip 200ml High energy food (Nutricia Ltd)	Device	9
Ensure Plus 200ml Liquid (Abbott Nutrition)	Device	9
Alanine / Arginine / Calcium Chloride / Dibasic potassium phosphate / Glucose / Glycine / Histidine / Isoleucine / Leucine / Lysine / Magnesium Chloride / Methionine / Phenylalanine / Proline / Serine / Sodium Acetate Trihydrate / Sodium Chloride / Threon	Drug	9
Fortisip High energy food (Nutricia Ltd)	Device	9
Ensure Liquid (Abbott Nutrition)	Device	9
Alanine 21.7 MG/ML / Arginine 14.7 MG/ML / Aspartate 4.34 MG/ML / Glutamate 7.49 MG/ML / Glycine 10.4 MG/ML / Histidine 8.94 MG/ML / Isoleucine 7.49 MG/ML / Leucine 10.4 MG/ML / Lysine 11.8 MG/ML / Methionine 7.49 MG/ML / Phenylalanine 10.4 MG/ML / Prolin	Drug	8
Programming device evaluation (in person) with iterative adjustment of the implantable device to test the function of the device and select optimal permanent programmed values with analysis, review and report by a physician or other qualified health care 	Procedure	8

@Chris_Knoll, wonderful, we are well aware of this; will get to devices someday. If you have a use case now - bring it in. Otherwise no worries.
btw, your query has false-positive results: concepts are cut to fit 255 symbols, so they look the same.
@Dymshyts https://github.com/OHDSI/Vocabulary-v5.0/wiki/Ongoing-tasks-list

1 Like

Challenge, accepted!

select concept_name, domain_id, count(*) dupes
FROM concept
where standard_concept = 'S' and invalid_reason is null and LEN(concept_name) < 100 and domain_id <> 'Device'
group by concept_name, domain_id
HAVING COUNT(*) > 1
ORDER BY  count(*) desc, domain_id

This still returns around 9,000 items, however when I looked for those records in our Observation, Procedure, Condition and Drug tables, none of those concepts were found on our very large large commercial claims database.

For those interested in how to check your own CDM:

with cteDupes as (
	select concept_id, concept_name, domain_id, count(*) dupes
	FROM concept
	where standard_concept = 'S' and invalid_reason is null and LEN(concept_name) < 100 and domain_id <> 'Device'
	group by concept_id, concept_name, domain_id
	HAVING COUNT(*) > 1
),
cteConceptCounts as (
	select drug_concept_id as concept_id, count(*) as record_count from drug_exposure group by drug_concept_id 
	UNION ALL
	select procedure_concept_id as concept_id, count(*) as record_count from procedure_occurrence group by procedure_concept_id 
	UNION ALL
	select condition_concept_id as concept_id, count(*) as record_count from condition_occurrence group by condition_concept_id 
	UNION ALL
	select observation_concept_id as concept_id, count(*) as record_count from observation group by observation_concept_id 
)
select d.concept_name, d.domain_id, d.dupes, c.record_count
FROM cteDupes d
JOIN cteConceptCounts c on d.concept_id = c.concept_id

This query will reveal ‘real world’ cases.

Hahaha :smile:
So, HCPCS codes are back from the dead: old deprecated codes have become standard but their end dates represent the time they were buried in the real life. Snomed: mainly the synonym issue (more precise information is stored in synonyms); I recall a discussion about it…
And then, of course, you can’t judge simply based on names: State, for example, can be a location or a question (to be used with a number of answers within a vocab).
I bet you’ll create a new query to prove your point (like excluding HCPCS and adding group by concept_class_id). Won’t stop you :slight_smile:
What I’m saying is that duplicates do exist, we’re fighting with them and even have QA ace up our sleeves.

Haha, @aostropolets, you are a good sport :smile:

I’m not trying to rub anyone’s noses in anything, just trying to help people find potential issues. You rightfully point out that my query is full of holes, and if there is a desire to try to identify these things heuristically, I’m happy to try to craft some SQL on it. But I get that you guys get that there are some cases and you just have to take them on as you find them.

Interesting thought about the concept class to distinguish the context of the terms, I’ll check that out for my own curiosity, I won’t do any more harassment on the thread :slight_smile:

I did find it interesting that there’s like 6 or so concepts named ‘no’ and I’m wondering if there’s a multitude of meanings for ‘no’ in the vocabulary.

1 Like

These ‘no’ are almost all from PPI survey vocabulary and are answers to the particular questions that are represented in their concept_codes. In this way it possible to distinguish between, for example, no, I’m not pregnant and no, I’m not smoking.

here’s another riddle: HCPCS has totally equal, both active concepts
https://hcpcs.codes/g-codes/G9977/
https://hcpcs.codes/g-codes/G9893/

what should we do:
keep both standard
or randomly map one to another?

They both should be non-standard, and mapped to concept_id=0. Because they are negative facts.

How do you do that in your mapping exercise for the Procedure hierarchy?

hm.
good point, @Christian_Reich,

still don’t have a solution: ICD9Proc and OPCS4 that’s released already don’t have a negative facts, as well as ICD10PCS we are working on.

So the idea is: to define all negative concepts and make them non-standard, right?

These ‘no’ are almost all from PPI survey vocabulary

I find these duplicate ‘No’ terms in PPI an example of ‘OHDSI informatics mistake’. We should do something about them. I posted the problem a while ago here https://github.com/OHDSI/Vocabulary-v5.0/issues/182

Friends:

If they are from a survey they should be taken out of the procedures and made Survey Domain. We haven’t done that at all, yet. Can you put on backlog and we discuss? Surveys will have negative answers.

t