I’ve noticed the concept relationship mappings for ‘OPCS4 - SNOMED’ have a lot of issues. The most common of which I’m seeing is that the OPCS4 concepts map to SNOMED concepts that have a much more vague meaning, causing a significant loss in detail. It’s a little odd because a good portion of the OPCS4 concepts have an equivalent SNOMED concept, often with the same concept name.
Some examples:
Format: (OPCS4 concept | SNOMED concept that its mapped to | SNOMED concept by joining on concept name)
I came across this after Usagi suggested a few terms get mapped to OPCS4 (correctly as they are standard concepts) but do we not need to leverage the mappings. Do other institutions use these relationships? Is this an issue folks plan to address? If the answer to the last two questions is no perhaps we should consider changing the status of the relationships to invalid?
Lastly, it seems odd to have concepts in different vocabularies with the same concept name, domain, and meaning and have them both be standard.
If that happens, I’d say it is wrong. The idea around the ‘standard’ concept is that there is the primary way to represent the clinical idea in data. It would be very strange to have 2 different concept IDs with the exact same name to both be standard. One of them should take the standard position, and all other concepts should map over to it.
Thanks for noticing. We already fixed it. Please check the Athena now. The release was done just after you posted.
There should be a joke about magical vocabulary team, who fixes the whole vocabulary less in 12 hours.
@aostropolets, can you please point out where the ongoing and planned vocabulary work is posted?
I’d suggest a smoke-test for this is to do a query looking for standard concepts that have the same name and same domain:
select concept_name, domain_id, count(*) dupes
FROM concept
where standard_concept = 'S' and invalid_reason is null
group by concept_name, domain_id
HAVING COUNT(*) > 1
If any records come back, then you’ve found multiple concepts with the same name in the same domain that are standard, and that should probably be investigated. I’ve run this on our latest download (which we labeled ‘Vocabulary_20180609’) and found 11727 occurrences. Here are the top 20:
concept_name domain_id dupes
Influenza Virus Vaccine, Inactivated A-California-07-2009 X-181 (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Victoria-210-2009 X-187 (H3N2) (A-Perth-16-2009) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008 Drug 19
Influenza Virus Vaccine, Inactivated A-Brisbane-59-2007, IVR-148 (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Uruguay-716-2007, NYMC X-175C (H3N2) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008 strain 0.03 Drug 17
Not treatment functions Provider Specialty 16
Nelaton catheter Device 16
Ensure Plus Liquid (Abbott Nutrition) Device 15
Ascorbic Acid 90 MG / Beta Carotene 3000 UNT / Biotin 0.045 MG / Calcium 162 MG / Cupric oxide 2 MG / Ferrous fumarate 10 MG / Folic Acid 0.4 MG / Magnesium Oxide 100 MG / Niacinamide 42.5 MG / Phosphorus 125 MG / ... Oral Tablet Drug 12
Patients who use hospice services any time during the measurement period Procedure 12
no Observation 12
Once or twice Observation 10
Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.03 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.03 MG/ML / Influenza B virus vaccine, B-Brisbane-60-2008-like virus 0.03 MG/ML / Influenza B virus vaccine, B- Drug 10
Influenza Virus Vaccine, Inactivated A-California-07-2009 X-179A (H1N1) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated A-Victoria-210-2009 X-187 (H3N2) (A-Perth-16-2009) strain 0.03 MG/ML / Influenza Virus Vaccine, Inactivated B-Brisbane-60-2008 Drug 10
Fortimel liq Liquid (Nutricia Ltd) Device 9
Country of birth Observation 9
Fortisip 200ml High energy food (Nutricia Ltd) Device 9
Ensure Plus 200ml Liquid (Abbott Nutrition) Device 9
Alanine / Arginine / Calcium Chloride / Dibasic potassium phosphate / Glucose / Glycine / Histidine / Isoleucine / Leucine / Lysine / Magnesium Chloride / Methionine / Phenylalanine / Proline / Serine / Sodium Acetate Trihydrate / Sodium Chloride / Threon Drug 9
Fortisip High energy food (Nutricia Ltd) Device 9
Ensure Liquid (Abbott Nutrition) Device 9
Alanine 21.7 MG/ML / Arginine 14.7 MG/ML / Aspartate 4.34 MG/ML / Glutamate 7.49 MG/ML / Glycine 10.4 MG/ML / Histidine 8.94 MG/ML / Isoleucine 7.49 MG/ML / Leucine 10.4 MG/ML / Lysine 11.8 MG/ML / Methionine 7.49 MG/ML / Phenylalanine 10.4 MG/ML / Prolin Drug 8
Programming device evaluation (in person) with iterative adjustment of the implantable device to test the function of the device and select optimal permanent programmed values with analysis, review and report by a physician or other qualified health care Procedure 8
select concept_name, domain_id, count(*) dupes
FROM concept
where standard_concept = 'S' and invalid_reason is null and LEN(concept_name) < 100 and domain_id <> 'Device'
group by concept_name, domain_id
HAVING COUNT(*) > 1
ORDER BY count(*) desc, domain_id
This still returns around 9,000 items, however when I looked for those records in our Observation, Procedure, Condition and Drug tables, none of those concepts were found on our very large large commercial claims database.
For those interested in how to check your own CDM:
with cteDupes as (
select concept_id, concept_name, domain_id, count(*) dupes
FROM concept
where standard_concept = 'S' and invalid_reason is null and LEN(concept_name) < 100 and domain_id <> 'Device'
group by concept_id, concept_name, domain_id
HAVING COUNT(*) > 1
),
cteConceptCounts as (
select drug_concept_id as concept_id, count(*) as record_count from drug_exposure group by drug_concept_id
UNION ALL
select procedure_concept_id as concept_id, count(*) as record_count from procedure_occurrence group by procedure_concept_id
UNION ALL
select condition_concept_id as concept_id, count(*) as record_count from condition_occurrence group by condition_concept_id
UNION ALL
select observation_concept_id as concept_id, count(*) as record_count from observation group by observation_concept_id
)
select d.concept_name, d.domain_id, d.dupes, c.record_count
FROM cteDupes d
JOIN cteConceptCounts c on d.concept_id = c.concept_id
Hahaha
So, HCPCS codes are back from the dead: old deprecated codes have become standard but their end dates represent the time they were buried in the real life. Snomed: mainly the synonym issue (more precise information is stored in synonyms); I recall a discussion about it…
And then, of course, you can’t judge simply based on names: State, for example, can be a location or a question (to be used with a number of answers within a vocab).
I bet you’ll create a new query to prove your point (like excluding HCPCS and adding group by concept_class_id). Won’t stop you
What I’m saying is that duplicates do exist, we’re fighting with them and even have QA ace up our sleeves.
I’m not trying to rub anyone’s noses in anything, just trying to help people find potential issues. You rightfully point out that my query is full of holes, and if there is a desire to try to identify these things heuristically, I’m happy to try to craft some SQL on it. But I get that you guys get that there are some cases and you just have to take them on as you find them.
Interesting thought about the concept class to distinguish the context of the terms, I’ll check that out for my own curiosity, I won’t do any more harassment on the thread
I did find it interesting that there’s like 6 or so concepts named ‘no’ and I’m wondering if there’s a multitude of meanings for ‘no’ in the vocabulary.
These ‘no’ are almost all from PPI survey vocabulary and are answers to the particular questions that are represented in their concept_codes. In this way it possible to distinguish between, for example, no, I’m not pregnant and no, I’m not smoking.
If they are from a survey they should be taken out of the procedures and made Survey Domain. We haven’t done that at all, yet. Can you put on backlog and we discuss? Surveys will have negative answers.