OHDSI Home | Forums | Wiki | Github

How to find the ancestor_concept_id of the condition_concept_id in the condition_occurrence table?

I wish to find the distribution of the disease in the condition_occurrence table, but it should be the distribution of the ancestor_concept_id according to clinical status. For clinical doctors, the ancestor_concept_ids such as bipolar disorder, major depressive disorder are the target disease they are interested in. But in the condition_occurrence table, the condition_concept_id contains the descendant_concept_id of the target disease, such as bipolar disorder, so I cannot directly count the number of bipolar disorder.
thus, if I want to count the number of bipolar disorder, how can I do that?
thank you very much


This is quite straight forward. All levels of disease can be rolled up to a higher level using concept_ancestor table which contains the disease hierarchy. For example, to get count for bipolar, you may want to following query:

select count(distinct person_id)
from condition_occurrence a join concept_ancestor b on a.condition_concept_id = b.descendant_concept_id
and ancestor_concept_id = 436665 - - Bipolar disorder

Please let me know if this answers your question.

thank you very much@QI_omop, I use this method to compute the number of bipolar disorder currently.
But this method means I firstly should have the ancestor_concept_id of bipolar disorder, then I can get the number of BP. I am wondering if it is possible for me to get the number of bipolar disorder without the ancesor_concept_id,while with the descendant_concept_id and concept_ancestor table, maybe including max or min levels_of_separation?

thank you very much!

@QI_omop hi,waiting for your suggestion, thank you very much

@pandamiao What is your end goal? Are you trying to characterize your condition_occurrence table by finding the distribution of the “main” conditions in the table (the “target diseases” as you call them)? Or are you looking for a specific set of conditions?


This is a problem we need to address. We need to have a way to characterize a patient population by reporting classes of diseases which are more general than “bipolar disease” and more granular than “Clinical finding”. This is a problem. It is further complicated by the fact that those summaries require statistical classification, i.e. each record is summarized only once. This requires a way to use the hierarchy with single parenthood only, which is not how SNOMED does it.

I’m planning a new repo. Stay tuned.

1 Like

Interesting stuff. Perhaps possible to leverage the ICD hierarchy in some way?

Precisely. We have one approach where we use the 10 ICD10 Chapters as categories, but the SNOMED hierarchy in combination with a precedence rule set to create them. Will put out soon.

We probably need a second one for a more granular list of maybe 50 disease groups.

Do you have a use case for that?

thank you @Christian_Reich, at this moment, I will finish the research according to the suggestion of QI.

Hi, thank you @katy-sadowski, I think I want to characterize my condition_occurrence table according to the ancestor concept id, just like the ingredient concept id of drug_exposure table.

@Christian_Reich Great; that’s exactly what I was picturing. I agree that a more-granular classification could be helpful for characterizing a population. I don’t have an active use case but have been pondering performing a characterization similar to what it sounds like @pandamiao is working on.

Do you think using the top-level 3-character ICD10 codes as the classes could work? By somehow reverse-mapping standard concepts back to those ICD categories? This assumes that it is possible to trace every standard condition concept to an ICD10 code (either directly or via the hierarchy), and that relationships between ICD10 codes are fully mapped out in the CDM, neither of which I’m sure about. Also, I think it’d still require hand-curated precedence rules, which might be infeasible due to the volume of codes.

Tricky problem :slight_smile:

We should try it out.

Almost. Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one). All their children would go into these bucket of concepts. Going back to ICD10 is a bad idea, since all mappings are unidirectional.

That’s pretty much the case. However, there are tons of SNOMEDs which have no equivalence in ICD10. ICD10 is smaller.

Precisely. And it would be arbitrary, because any one SNOMED legitimately could be a child of more than one ICD10-derived category. But we’d need to pick which category gets the higher precedence. Usually, this would be the less common one. We could try to derive that from the data.

I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.

Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one).

Aha, yes, this is much simpler as well. Good to know the reverse is inadvisable.

I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.

I’m in! I ran a couple of checks to get a sense of the scale. Among SNOMEDs that fit into an ICD10-derived category, 158/1431 map to multiple categories, which seems pretty reasonable to adjudicate by hand:

    count(distinct c.concept_id)
from omop.concept c
    left join omop.concept_relationship cr on cr.concept_id_1 = c.concept_id
	    and cr.relationship_id = 'Maps to'
    left join omop.concept c2 on c2.concept_id = cr.concept_id_2
where c.concept_class_id in ('3-char billing code', '3-char nonbill code')
    and c.domain_id = 'Condition'
    and c.vocabulary_id = 'ICD10CM'
    and c.invalid_reason is null
group by 1,2,3

However, the issue you mention about not all SNOMEDs mapping to ICD10 is more problematic (12150/156270 standard condition concepts don’t fit into one of the ICD10-derived categories; see query below). Making a manual mapping for this will be quite an undertaking…do you think there’s some way to leverage the SNOMED hierarchy to make this more manageable? One possible approach would be to do this mapping just for the codes that appear most frequently in one’s local condition_occurrence table - but of course that’s not ideal as it’s not generalizable. LMK your thoughts!


icd_categories as (
	select distinct 
	from omop.concept c1
		join omop.concept_relationship cr on cr.concept_id_1 = c1.concept_id
			and cr.relationship_id = 'Maps to'
		join omop.concept c2 on c2.concept_id = cr.concept_id_2
	where c1.concept_class_id in ('3-char billing code', '3-char nonbill code')
		and c1.vocabulary_id = 'ICD10CM'
        and c1.domain_id = 'Condition'
		and c1.invalid_reason is null

mapped as (
		distinct c.concept_id
	from omop.concept c
		join omop.concept_ancestor ca on ca.descendant_concept_id = c.concept_id
		join icd_categories on icd_categories.concept_id = ca.ancestor_concept_id

	mapped.concept_id is not null as maps_to_icd10_category,
	count(distinct c.concept_id)
from omop.concept c
	left join mapped on mapped.concept_id = c.concept_id
where c.domain_id = 'Condition'
	and c.standard_concept = 'S'
	and c.invalid_reason is null
group by 1

FYI - my counts may be a bit off as we’re a few vocab releases behind.

Wondering if a repo or best practices guide ever came out of this discussion? In particular, the recommended way to get all patients of a certain condition where that condition might be a very general one (e.g. cancer) or more specific (brain cancer) or even more specific (GBM stage 4). Thanks!