
Inconsistent entries in concept_ancestor table

Dear CDM users,
A colleague and I are working on the same vocabulary query problem for the sake of validation; namely, finding all the descendant concepts of a list of master concept identifiers. For example, we want to list all the descendants of concept 757688 (the drug aripiprazole). Interestingly, we do not get the same results. After spending quite some time debugging my code, I identified something that looks like a bug (or is it a feature?). I was wondering if you guys could comment on it.

My colleague’s approach is to list all descendants recorded in the concept_ancestor table with the following SQL query: http://pastebin.com/bMKqbzD6. This query yields 25 concepts.
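The pastebin query isn't reproduced in this thread, so here is a minimal sketch of the same flat lookup, run through Python's built-in sqlite3 module against an in-memory concept_ancestor table. The concept IDs 100, 300 and 301 are hypothetical, and excluding the self-row is an assumption about what the original query does:

```python
import sqlite3

# Hypothetical concept_ancestor rows:
# (ancestor_concept_id, descendant_concept_id,
#  min_levels_of_separation, max_levels_of_separation)
ROWS = [
    (100, 100, 0, 0),
    (100, 300, 2, 2),
    (100, 301, 2, 2),
    (300, 300, 0, 0),
    (301, 301, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, "
    "descendant_concept_id INTEGER, min_levels_of_separation INTEGER, "
    "max_levels_of_separation INTEGER)"
)
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)", ROWS)


def all_descendants(conn, ancestor_id):
    """Every descendant recorded for ancestor_id at any level, minus the self-row."""
    cur = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id "
        "ORDER BY descendant_concept_id",
        (ancestor_id,),
    )
    return [row[0] for row in cur]


print(all_descendants(conn, 100))  # [300, 301]
```

Because the levels of separation are never consulted, this lookup does not care how many steps lie between the root and each descendant.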

My approach is to walk the hierarchy of concepts using a Python-based layer that generates the SQL queries. For any parent concept, I ask for the list of its direct descendants by limiting max_levels_of_separation to no more than 1, then repeat this step recursively for each descendant.
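A minimal sketch of that walk, again with sqlite3 and hypothetical rows, written iteratively with an explicit stack instead of recursion. The function names are illustrative only, not the actual layer's API:

```python
import sqlite3

# Hypothetical rows mirroring the situation in this thread: root 100 reaches
# 300 and 301 only through a removed intermediate, so both are recorded at
# level 2 and nothing below the root is recorded at level 1.
ROWS = [
    (100, 100, 0, 0),
    (100, 300, 2, 2),
    (100, 301, 2, 2),
    (300, 300, 0, 0),
    (301, 301, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, "
    "descendant_concept_id INTEGER, min_levels_of_separation INTEGER, "
    "max_levels_of_separation INTEGER)"
)
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)", ROWS)


def direct_children(conn, parent_id):
    """Descendants with max_levels_of_separation <= 1, excluding the self-row."""
    cur = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id "
        "AND max_levels_of_separation <= 1",
        (parent_id,),
    )
    return [row[0] for row in cur]


def walk(conn, root_id):
    """Collect descendants by repeatedly following level-1 rows."""
    found, stack = set(), [root_id]
    while stack:
        for child in direct_children(conn, stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


print(walk(conn, 100))  # []
```

On rows like these, the colleague's flat query would return 300 and 301, while the walk returns nothing because the first step never finds a level-1 row.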

For the same parent concept as above, I find exactly zero descendants. Looking at the details, it appears that no descendant concept has a min_levels_of_separation below 2; the whole table is here: http://pastebin.com/bYCk6umh
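A quick way to spot this for any concept is to group its descendants by min_levels_of_separation. A sketch with hypothetical rows (the IDs are made up):

```python
import sqlite3

# Hypothetical rows in which the closest descendants of 100 sit at level 2.
ROWS = [
    (100, 100, 0, 0),
    (100, 300, 2, 2),
    (100, 301, 2, 2),
    (100, 302, 3, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, "
    "descendant_concept_id INTEGER, min_levels_of_separation INTEGER, "
    "max_levels_of_separation INTEGER)"
)
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)", ROWS)


def level_histogram(conn, ancestor_id):
    """(min_levels_of_separation, descendant count) pairs, self-row excluded."""
    cur = conn.execute(
        "SELECT min_levels_of_separation, COUNT(*) FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id "
        "GROUP BY min_levels_of_separation "
        "ORDER BY min_levels_of_separation",
        (ancestor_id,),
    )
    return cur.fetchall()


print(level_histogram(conn, 100))  # [(2, 2), (3, 1)]
```

If the first bucket starts at 2, a walk that only follows level-1 rows stops at the root.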

My concern is that this implies a missing link between the root concept and all the descendants listed in concept_ancestor. Is there something I am not understanding? How can a concept have no direct descendants, only indirect (i.e., at least twice removed) ones?

Best,
Aurelien Mazurie

Aurelien:

Wow. Your in-depth checking is truly appreciated.

Here is the cause of your discrepancy: the ancestor constructor collects all node-edge-node combinations where the edge is a relationship with the defines_ancestry flag set. These are essentially ‘Is a’ relationships, plus relationships that lead from one vocabulary to another to create classes. It then assembles these pieces into the full tree, like a gigantic puzzle.

Now comes the reason for your problem: after that, all nodes whose standard_concept is not ‘S’ are removed. This means a non-standard concept can serve as an intermediate node in the tree, but not as a start or end node (ancestor_concept_id or descendant_concept_id). However, the steps through those intermediate nodes are still counted in min_levels_of_separation. We want to review how the steps are counted, but we haven’t had time yet.
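A toy re-implementation of the two steps described above (build the closure over ancestry edges, then drop rows whose endpoints are non-standard while keeping the step counts) reproduces the gap. This is a Python sketch, not a port of the PL/SQL package; the edges and standard flags are hypothetical, and it assumes the graph is acyclic:

```python
from functools import lru_cache

# Hypothetical ancestry edges (parent, child), standing in for relationships
# whose defines_ancestry flag is set.
EDGES = [(100, 200), (200, 300), (200, 301)]
# Hypothetical standard_concept values: 200 is non-standard.
STANDARD = {100: "S", 200: None, 300: "S", 301: "S"}

CHILDREN = {}
for parent, child in EDGES:
    CHILDREN.setdefault(parent, []).append(child)


@lru_cache(maxsize=None)
def levels_from(node):
    """Map every node reachable from `node` (itself included) to (min, max) steps."""
    result = {node: (0, 0)}
    for child in CHILDREN.get(node, ()):
        for desc, (lo, hi) in levels_from(child).items():
            if desc in result:
                old_lo, old_hi = result[desc]
                result[desc] = (min(old_lo, lo + 1), max(old_hi, hi + 1))
            else:
                result[desc] = (lo + 1, hi + 1)
    return result


def build_ancestor_rows():
    """Closure over all nodes, then keep only rows whose endpoints are standard."""
    rows = []
    for ancestor in STANDARD:
        for desc, (lo, hi) in levels_from(ancestor).items():
            # The step counts still include the dropped non-standard node 200.
            if STANDARD[ancestor] == "S" and STANDARD[desc] == "S":
                rows.append((ancestor, desc, lo, hi))
    return sorted(rows)


for row in build_ancestor_rows():
    print(row)
```

Concept 100 ends up with descendants 300 and 301 at level 2 and no row at level 1, the same pattern Aurelien found.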

The code is here: https://github.com/OHDSI/Vocabulary-v5.0/tree/master/Final_Assembly/pkg_concept_ancestor.pck.

Try your test without restricting each iterative step to one level of separation.
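If a step-by-step walk is still needed, one alternative (my own suggestion, not something from the OHDSI code) is to define immediate children from the closure itself: a descendant d of a parent counts as immediate when no other proper descendant of that parent is a proper ancestor of d. This ignores the level counts entirely. A sketch with hypothetical rows and illustrative function names:

```python
import sqlite3

# Hypothetical rows: 300 is reached from 100 through a removed non-standard
# node (level 2), and 301 sits one step below 300.
ROWS = [
    (100, 100, 0, 0),
    (100, 300, 2, 2),
    (100, 301, 3, 3),
    (300, 300, 0, 0),
    (300, 301, 1, 1),
    (301, 301, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, "
    "descendant_concept_id INTEGER, min_levels_of_separation INTEGER, "
    "max_levels_of_separation INTEGER)"
)
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)", ROWS)

# Keep d only if no intermediate "mid" concept sits between the parent and d.
IMMEDIATE_CHILDREN = """
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
WHERE ca.ancestor_concept_id = ?
  AND ca.descendant_concept_id <> ca.ancestor_concept_id
  AND NOT EXISTS (
      SELECT 1
      FROM concept_ancestor mid
      JOIN concept_ancestor below
        ON below.ancestor_concept_id = mid.descendant_concept_id
      WHERE mid.ancestor_concept_id = ca.ancestor_concept_id
        AND mid.descendant_concept_id <> mid.ancestor_concept_id
        AND below.descendant_concept_id = ca.descendant_concept_id
        AND below.descendant_concept_id <> below.ancestor_concept_id
  )
ORDER BY ca.descendant_concept_id
"""


def immediate_children(conn, parent_id):
    return [row[0] for row in conn.execute(IMMEDIATE_CHILDREN, (parent_id,))]


def walk(conn, root_id):
    """Follow immediate children from the root using an explicit stack."""
    found, stack = set(), [root_id]
    while stack:
        for child in immediate_children(conn, stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


print(immediate_children(conn, 100))  # [300]
print(walk(conn, 100))  # [300, 301]
```

For a plain list of descendants, reading the closure directly is simpler, since CONCEPT_ANCESTOR already stores every level.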

Let me know. I would be interested to see any examples where we made a mistake.

Christian.

I’d like to add to this thread because I think my question is related; however, I’m unsure whether @Christian_Reich’s response answers it.

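```sql
-- All RxNorm descendants of concept 1525215 (the ingredient pioglitazone)
SELECT DISTINCT c2.*
FROM CONCEPT c
	JOIN CONCEPT_ANCESTOR ca
		ON ca.ANCESTOR_CONCEPT_ID = c.CONCEPT_ID
	-- c2 is each descendant; a (1525215, 1525215, 0, 0) self-row would return c itself
	JOIN CONCEPT c2
		ON c2.CONCEPT_ID = ca.DESCENDANT_CONCEPT_ID
		AND c2.VOCABULARY_ID = 'RxNorm'
WHERE c.CONCEPT_ID = 1525215
ORDER BY CONCEPT_NAME
```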

This query returns different results with Vocabulary v5 than with Vocabulary v4. Namely, the original concept 1525215 (the ingredient pioglitazone) is not included in the results. I have always assumed that the concept used to query CONCEPT_ANCESTOR is also returned. In addition, there were 35 results with the old vocabulary versus 17 with v5.
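That assumption holds only if CONCEPT_ANCESTOR contains a (concept, concept, 0, 0) self-row for the starting concept, like the 38003614 rows later in this thread. A hedged sketch of a query that does not depend on the self-row, using a simplified version of the query above and hypothetical concepts 1 and 2 in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT,
                          vocabulary_id TEXT);
    CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,
                                   descendant_concept_id INTEGER,
                                   min_levels_of_separation INTEGER,
                                   max_levels_of_separation INTEGER);
    -- Hypothetical data: concept 1 has one RxNorm descendant (2),
    -- but its (1, 1, 0, 0) self-row is missing.
    INSERT INTO concept VALUES (1, 'Ingredient X', 'RxNorm'),
                               (2, 'Ingredient X 15 MG Oral Tablet', 'RxNorm');
    INSERT INTO concept_ancestor VALUES (1, 2, 1, 1);
    """
)

# Returns the starting concept only through its self-row.
DESCENDANTS_ONLY = """
SELECT DISTINCT c2.concept_id
FROM concept c
JOIN concept_ancestor ca ON ca.ancestor_concept_id = c.concept_id
JOIN concept c2 ON c2.concept_id = ca.descendant_concept_id
               AND c2.vocabulary_id = 'RxNorm'
WHERE c.concept_id = ?
ORDER BY c2.concept_id
"""

# Adds the starting concept explicitly; UNION drops the duplicate when the
# self-row does exist.
WITH_SELF = """
SELECT ca.descendant_concept_id
FROM concept_ancestor ca
JOIN concept c2 ON c2.concept_id = ca.descendant_concept_id
               AND c2.vocabulary_id = 'RxNorm'
WHERE ca.ancestor_concept_id = ?
UNION
SELECT concept_id FROM concept WHERE concept_id = ? AND vocabulary_id = 'RxNorm'
ORDER BY 1
"""


def ids(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]


print(ids(DESCENDANTS_ONLY, 1))  # [2]
print(ids(WITH_SELF, 1, 1))  # [1, 2]
```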

Saw the problem, will investigate. Thanks, Erica.

Hi Christian, FYI, I’m finding that some records are missing from concept_ancestor. For example:

```console
$ grep "^3800361[4-6]," CONCEPT.csv
38003614,European,Race,Race,Race,S,5.01.,19700101,20991231,
38003615,Middle Eastern or North African,Race,Race,Race,S,5.02.,19700101,20991231,
38003616,Arab,Race,Race,Race,S,5.03.,19700101,20991231,

$ grep ",3800361[4-6]," CONCEPT_RELATIONSHIP.csv | grep Sub
8527,38003614,Subsumes,19800101,20991231,
8527,38003616,Subsumes,19800101,20991231,
8527,38003615,Subsumes,19800101,20991231,

$ grep "3800361[4-6]" CONCEPT_ANCESTOR.csv | sort
38003614,38003614,0,0
38003615,38003615,0,0
8527,38003614,1,1
8527,38003615,1,1
```

Why is standard concept 38003616 missing from CONCEPT_ANCESTOR.csv?
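Two consistency checks would flag this case automatically: every standard concept should have a (concept, concept, 0, 0) self-row, as 38003614 and 38003615 do, and every ‘Subsumes’ link to a standard concept should appear as an ancestor row. The sketch below loads only the rows from the grep output above into an in-memory SQLite database. It assumes ‘Subsumes’ is one of the relationships flagged defines_ancestry, and it checks only the child side because the CONCEPT row for 8527 isn't shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, standard_concept TEXT);
    CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                       relationship_id TEXT);
    CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,
                                   descendant_concept_id INTEGER,
                                   min_levels_of_separation INTEGER,
                                   max_levels_of_separation INTEGER);
    -- Rows copied from the grep output (only the columns needed here).
    INSERT INTO concept VALUES (38003614, 'S'), (38003615, 'S'), (38003616, 'S');
    INSERT INTO concept_relationship VALUES
        (8527, 38003614, 'Subsumes'),
        (8527, 38003616, 'Subsumes'),
        (8527, 38003615, 'Subsumes');
    INSERT INTO concept_ancestor VALUES
        (38003614, 38003614, 0, 0),
        (38003615, 38003615, 0, 0),
        (8527, 38003614, 1, 1),
        (8527, 38003615, 1, 1);
    """
)

# Standard concepts with no self-row in concept_ancestor.
MISSING_SELF_ROWS = """
SELECT c.concept_id FROM concept c
WHERE c.standard_concept = 'S'
  AND NOT EXISTS (SELECT 1 FROM concept_ancestor ca
                  WHERE ca.ancestor_concept_id = c.concept_id
                    AND ca.descendant_concept_id = c.concept_id)
ORDER BY c.concept_id
"""

# 'Subsumes' links to a standard child with no matching ancestor row.
MISSING_SUBSUMES_ROWS = """
SELECT cr.concept_id_1, cr.concept_id_2 FROM concept_relationship cr
JOIN concept c ON c.concept_id = cr.concept_id_2 AND c.standard_concept = 'S'
WHERE cr.relationship_id = 'Subsumes'
  AND NOT EXISTS (SELECT 1 FROM concept_ancestor ca
                  WHERE ca.ancestor_concept_id = cr.concept_id_1
                    AND ca.descendant_concept_id = cr.concept_id_2)
ORDER BY cr.concept_id_1, cr.concept_id_2
"""

print(conn.execute(MISSING_SELF_ROWS).fetchall())  # [(38003616,)]
print(conn.execute(MISSING_SUBSUMES_ROWS).fetchall())  # [(8527, 38003616)]
```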

Cristyn:

No idea. Will debug. Thanks for the hint.

Fixed in the next release.
