Missing concepts in the CONCEPT table (Athena Vocabulary)

Hello everyone! I’m transforming data from a database of inpatients at a high-complexity hospital in Chile, but I’m having trouble with the semantic mapping. Using USAGI and Athena, I mapped more than 2,000 concepts, making sure they were standard, valid, and in the Condition domain. I also loaded the updated vocabularies into the CDM. The problem is that many of the mapped concepts don’t appear among the loaded concepts (Athena vocabulary). Here is a list of 10 concepts that don’t appear (there are many more):

|concept_id|concept_description       |concept_code|
|----------|--------------------------|------------|
|22350     |Edema of larynx           |51599000    |
|23653     |Foreign body in esophagus |47609003    |
|31317     |Dysphagia                 |40739000    |
|75128     |Injury of chest wall      |65978000    |
|80182     |Dermatomyositis           |396230008   |
|133002    |Acute osteomyelitis       |409780002   |
|196455    |Hepatorenal syndrome      |51292008    |
|199860    |Hernia of abdominal cavity|52515009    |
|201340    |Gastritis                 |4556007     |
|313217    |Atrial fibrillation       |49436004    |

Has this happened to any of you? How did you solve it?

Hello @bea-estrada, welcome to the community.

These 10 concepts from your example all belong to the SNOMED vocabulary. Did you make sure to include SNOMED when downloading the vocabularies from Athena as CSV? If not, that could explain why they’re present online but not in the downloaded CSV files.
Another community member had a somewhat similar problem: Missing "maps to" values in ICD10 download
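
A quick way to tell a download problem from a load problem is to look for the mapped concept_ids in the downloaded file itself, before anything touches the database. Here is a minimal Python sketch. It assumes CONCEPT.csv is tab-delimited, has a header row with a concept_id column, and uses no field quoting; the helper name missing_concept_ids is made up for illustration.

```python
import csv

def missing_concept_ids(lines, mapped_ids):
    """Return the mapped concept_ids that never occur in the given CONCEPT.csv lines.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    wanted = {int(i) for i in mapped_ids}
    found = set()
    # Assumption: tab-delimited, header row present, quotes are ordinary characters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        cid = row.get("concept_id")
        if cid and cid.isdigit() and int(cid) in wanted:
            found.add(int(cid))
    return sorted(wanted - found)
```

Called with an open CONCEPT.csv handle (opened with newline="") and the concept_ids from the mapping, an empty result would suggest the file is complete and the rows were lost while loading.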

Hi @rookie_crewkie! Yes, I did; more than 1 million SNOMED concepts were loaded into my database.

Hello @bea-estrada,

Good, thanks for confirming that. It is strange though that I’m getting a different number of SNOMED concepts (27-FEB-25 vocabulary version): 1,089,088, which is 81,569 more than in your case.

Let’s try comparing the distributions:

SELECT standard_concept, count(1) FROM concept
WHERE vocabulary_id = 'SNOMED' GROUP BY 1 ORDER BY 1;
/*
|standard_concept|_col1  |
|----------------|-------|
|S               |346,648|
|                |742,440|
*/
SELECT domain_id, count(1) FROM concept
WHERE vocabulary_id = 'SNOMED' GROUP BY 1 ORDER BY 1;
/*
|domain_id          |_col1  |
|-------------------|-------|
|Condition          |163,376|
|Device             |218,097|
|Drug               |254,991|
|Gender             |10     |
|Geography          |685    |
|Language           |878    |
|Meas Value         |5,122  |
|Meas Value Operator|7      |
|Measurement        |40,148 |
|Metadata           |2,378  |
|Observation        |268,976|
|Procedure          |84,582 |
|Provider           |708    |
|Race               |466    |
|Relationship       |405    |
|Route              |216    |
|Spec Anatomic Site |41,129 |
|Specimen           |2,094  |
|Type Concept       |3,395  |
|Unit               |1,362  |
|Visit              |63     |
*/
-- Integer division groups concept_id values into buckets of 1 million
SELECT concept_id / 1000000 AS range_1M, count(1) FROM concept
WHERE vocabulary_id = 'SNOMED' GROUP BY 1 ORDER BY 1;
/*
|range_1M|_col1  |
|--------|-------|
|0       |41,156 |
|1       |9,685  |
|3       |148,542|
|4       |297,250|
|35      |24,956 |
|36      |36,107 |
|37      |95,355 |
|40      |85,706 |
|42      |11,693 |
|43      |2,797  |
|44      |28,667 |
|45      |60,038 |
|46      |247,136|
*/

If you spot any differences, it might make sense to reload the vocabularies from Athena.
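
The same tallies can also be computed straight from the downloaded CONCEPT.csv, which helps show whether rows went missing in the download or during the load. This is only a sketch, assuming a tab-delimited file with a header row, the standard vocabulary_id, domain_id and standard_concept columns, and no field quoting; snomed_counts is a hypothetical helper name.

```python
import csv
from collections import Counter

def snomed_counts(lines, column="domain_id"):
    """Count SNOMED rows per value of `column` in the given CONCEPT.csv lines."""
    # Assumption: tab-delimited, header row present, quotes are ordinary characters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if row.get("vocabulary_id") == "SNOMED":
            counts[row.get(column)] += 1
    return counts
```

Running it once with column="domain_id" and once with column="standard_concept" gives numbers that can be set side by side with the query results from the database.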

@rookie_crewkie I also had several problems uploading it to the CDM. Is there any possibility that you could share the download link for your vocabulary with me? I would really appreciate it.

@bea-estrada,

What kind of problems did you have? Some records might’ve been skipped due to CSV parsing errors (quoting/delimiter issues, etc.), although I don’t think that’s the case for the concepts in your example.

Sorry, but it doesn’t work like that: the links are per account and are short-lived anyway. It shouldn’t be too difficult though: create a new download in Athena, get the link to a zip with CSVs, and upload them to your database alongside the current ones, so you can compare them.
Athena requires an account to download the vocabularies, but it’s open for registration and free for everyone.

@rookie_crewkie, I downloaded all the concepts last week. When I tried to load the tab-delimited (\t) concept table, I got an error saying that the concept_name field exceeded 255 characters; in fact, I had to allow up to 1,000 characters for it to load without errors. Looking through the file, there also seem to be inconsistencies with the delimiters: some concepts, especially drug-related ones, were loaded without their fields being separated correctly. I will download the concepts again and verify that the same number of SNOMED concepts is loaded as in your case. That should work fine. Thank you very much for responding.
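
One plausible mechanism for an over-long concept_name and shifted columns is a loader that treats double quotes as field quoting, so a name starting with a quote swallows the following tabs and line breaks until the next quote. Whether that is what happened here is an assumption. The Python sketch below only illustrates the effect on two made-up rows, using the standard csv module rather than any database loader.

```python
import csv

# Two hypothetical tab-delimited rows: the first name begins with a double
# quote, and the next double quote only shows up in the following row.
SAMPLE = [
    'concept_id\tconcept_name\tdomain_id\n',
    '1\t"Brand A 10 mg tablet\tDrug\n',
    '2\tBrand B 5" strip\tDrug\n',
]

def parse(lines, quoting):
    """Parse tab-delimited lines and return the data rows (header dropped)."""
    reader = csv.reader(lines, delimiter="\t", quoting=quoting)
    rows = list(reader)
    return rows[1:]

# Double quotes interpreted as field quoting (the csv module's default mode).
quoted = parse(SAMPLE, csv.QUOTE_MINIMAL)
# Double quotes treated as ordinary characters.
literal = parse(SAMPLE, csv.QUOTE_NONE)

print("rows with quoting enabled: ", len(quoted))
print("rows with quoting disabled:", len(literal))
print("longest name with quoting enabled: ", max(len(r[1]) for r in quoted))
print("longest name with quoting disabled:", max(len(r[1]) for r in literal))
```

With quoting enabled, the two rows collapse into a single row whose name field contains a tab and a line break. With quoting disabled, both rows come through intact.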

@rookie_crewkie I finally solved it. Loading through pgAdmin didn’t load the data correctly, so I did it from the console instead, and now everything is fine! Thanks so much for your help.

Glad it helped. Good luck with your project!