Dear CDM users,
A colleague and I are working on the same vocabulary query problem for the sake of validation: finding all the descendant concepts of a list of master concept identifiers. For example, we want to list all the descendants of concept 757688 (the drug Aripiprazole). Interestingly, we do not get the same results. After spending quite some time debugging my code, I identified something that looks like a bug (or is it a feature?). I was wondering if you could comment on it.
My colleague’s approach is to list all descendants mentioned in the concept_ancestor table with the following SQL query: http://pastebin.com/bMKqbzD6. This query yields 25 concepts.
My approach is to walk the hierarchy of concepts using a Python-based layer which generates the SQL queries. For any parent concept, I ask for the list of its direct descendants by limiting max_levels_of_separation to no more than 1. Then I repeat this step recursively for each descendant.
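To make the comparison concrete, here is a minimal sketch of both approaches against a toy in-memory table. It is not my actual layer nor my colleague's query: the function names, the concept IDs (1, 2, 3, 10, 12, 13) and the rows are hypothetical, and I am assuming the usual concept_ancestor columns (ancestor_concept_id, descendant_concept_id, min_levels_of_separation, max_levels_of_separation), including a self-referencing row at level 0. The walk is written with an explicit stack, which is equivalent to the recursive version.

```python
import sqlite3


def direct_descendants(conn, concept_id):
    """Descendants at most one level away, excluding the concept itself."""
    rows = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id "
        "AND max_levels_of_separation <= 1",
        (concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


def walk_descendants(conn, root_id):
    """Walk the hierarchy one level at a time, starting from root_id."""
    seen = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        for child in direct_descendants(conn, current):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def all_listed_descendants(conn, root_id):
    """Every descendant listed for root_id, whatever the separation."""
    rows = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id",
        (root_id,),
    ).fetchall()
    return {row[0] for row in rows}


def make_toy_db():
    """Hypothetical data: concept 1 has a complete chain, concept 10 does not."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept_ancestor ("
        "ancestor_concept_id INTEGER, descendant_concept_id INTEGER, "
        "min_levels_of_separation INTEGER, max_levels_of_separation INTEGER)"
    )
    rows = [
        (1, 1, 0, 0),    # self-reference
        (1, 2, 1, 1),    # 1 -> 2 (direct)
        (2, 3, 1, 1),    # 2 -> 3 (direct)
        (1, 3, 2, 2),    # 1 -> 3 (via 2)
        (10, 10, 0, 0),  # self-reference
        (10, 12, 2, 2),  # no row links 10 to an intermediate concept
        (10, 13, 2, 3),
    ]
    conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = make_toy_db()
    for root in (1, 10):
        print(root, walk_descendants(conn, root), all_listed_descendants(conn, root))
```

With this toy data the two approaches agree for concept 1, but for concept 10 the walk stops immediately while the flat query still returns two concepts. That is the same pattern I see with 757688.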
For the same parent concept as above, I find exactly zero descendants. Looking at the details, it appears that no descendant concept has a min_levels_of_separation below 2; the whole table is here: http://pastebin.com/bYCk6umh
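The check I ran by hand can be sketched as a small helper that reports the smallest min_levels_of_separation among a concept's descendants. Again this is only an illustration under assumptions: the helper names and the toy rows are hypothetical, and the column names are the ones I assume for concept_ancestor.

```python
import sqlite3


def closest_descendant_level(conn, concept_id):
    """Smallest min_levels_of_separation among descendants, or None if none are listed."""
    row = conn.execute(
        "SELECT MIN(min_levels_of_separation) FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? "
        "AND descendant_concept_id <> ancestor_concept_id",
        (concept_id,),
    ).fetchone()
    return row[0]


def has_missing_direct_link(conn, concept_id):
    """True when descendants are listed but none of them is one level away."""
    level = closest_descendant_level(conn, concept_id)
    return level is not None and level > 1


def toy_connection():
    """Hypothetical rows: concept 1 has a direct child, concept 10 has none."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept_ancestor ("
        "ancestor_concept_id INTEGER, descendant_concept_id INTEGER, "
        "min_levels_of_separation INTEGER, max_levels_of_separation INTEGER)"
    )
    conn.executemany(
        "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
        [
            (1, 1, 0, 0),
            (1, 2, 1, 1),
            (1, 3, 2, 2),
            (10, 10, 0, 0),
            (10, 12, 2, 2),
            (10, 13, 2, 3),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = toy_connection()
    for concept_id in (1, 10, 99):
        print(concept_id, closest_descendant_level(conn, concept_id),
              has_missing_direct_link(conn, concept_id))
```

Running the same kind of check against the real table for 757688 is what tells me the closest listed descendant is two levels away.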
My concern is that this means there is a missing link between the root concept and all the descendants listed in concept_ancestor. Is there something I am not understanding? How could a concept have no direct descendants, but only indirect (i.e., at least twice removed) descendants?
Best,
Aurelien Mazurie