I wish to find the distribution of the disease in the condition_occurrence table, but it should be the distribution of the ancestor_concept_id according to clinical status. For clinical doctors, the ancestor_concept_ids such as bipolar disorder, major depressive disorder are the target disease they are interested in. But in the condition_occurrence table, the condition_concept_id contains the descendant_concept_id of the target disease, such as bipolar disorder, so I cannot directly count the number of bipolar disorder.
thus, if I want to count the number of bipolar disorder, how can I do that?
thank you very much
This is quite straight forward. All levels of disease can be rolled up to a higher level using concept_ancestor table which contains the disease hierarchy. For example, to get count for bipolar, you may want to following query:
select count(distinct person_id)
from condition_occurrence a join concept_ancestor b on a.condition_concept_id = b.descendant_concept_id
and ancestor_concept_id = 436665 - - Bipolar disorder
Please let me know if this answers your question.
thank you very much@QI_omop, I use this method to compute the number of bipolar disorder currently.
But this method means I firstly should have the ancestor_concept_id of bipolar disorder, then I can get the number of BP. I am wondering if it is possible for me to get the number of bipolar disorder without the ancesor_concept_id，while with the descendant_concept_id and concept_ancestor table, maybe including max or min levels_of_separation?
thank you very much!
@QI_omop hi，waiting for your suggestion， thank you very much
@pandamiao What is your end goal? Are you trying to characterize your
condition_occurrence table by finding the distribution of the “main” conditions in the table (the “target diseases” as you call them)? Or are you looking for a specific set of conditions?
This is a problem we need to address. We need to have a way to characterize a patient population by reporting classes of diseases which are more general than “bipolar disease” and more granular than “Clinical finding”. This is a problem. It is further complicated by the fact that those summaries require statistical classification, i.e. each record is summarized only once. This requires a way to use the hierarchy with single parenthood only, which is not how SNOMED does it.
I’m planning a new repo. Stay tuned.
Interesting stuff. Perhaps possible to leverage the ICD hierarchy in some way?
Precisely. We have one approach where we use the 10 ICD10 Chapters as categories, but the SNOMED hierarchy in combination with a precedence rule set to create them. Will put out soon.
We probably need a second one for a more granular list of maybe 50 disease groups.
Do you have a use case for that?
thank you @Christian_Reich, at this moment, I will finish the research according to the suggestion of QI.
Hi, thank you @katy-sadowski, I think I want to characterize my condition_occurrence table according to the ancestor concept id, just like the ingredient concept id of drug_exposure table.
@Christian_Reich Great; that’s exactly what I was picturing. I agree that a more-granular classification could be helpful for characterizing a population. I don’t have an active use case but have been pondering performing a characterization similar to what it sounds like @pandamiao is working on.
Do you think using the top-level 3-character ICD10 codes as the classes could work? By somehow reverse-mapping standard concepts back to those ICD categories? This assumes that it is possible to trace every standard condition concept to an ICD10 code (either directly or via the hierarchy), and that relationships between ICD10 codes are fully mapped out in the CDM, neither of which I’m sure about. Also, I think it’d still require hand-curated precedence rules, which might be infeasible due to the volume of codes.
We should try it out.
Almost. Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one). All their children would go into these bucket of concepts. Going back to ICD10 is a bad idea, since all mappings are unidirectional.
That’s pretty much the case. However, there are tons of SNOMEDs which have no equivalence in ICD10. ICD10 is smaller.
Precisely. And it would be arbitrary, because any one SNOMED legitimately could be a child of more than one ICD10-derived category. But we’d need to pick which category gets the higher precedence. Usually, this would be the less common one. We could try to derive that from the data.
I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.
Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one).
Aha, yes, this is much simpler as well. Good to know the reverse is inadvisable.
I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.
I’m in! I ran a couple of checks to get a sense of the scale. Among SNOMEDs that fit into an ICD10-derived category, 158/1431 map to multiple categories, which seems pretty reasonable to adjudicate by hand:
select c2.concept_id, c2.concept_name, c2.concept_code, count(distinct c.concept_id) from omop.concept c left join omop.concept_relationship cr on cr.concept_id_1 = c.concept_id and cr.relationship_id = 'Maps to' left join omop.concept c2 on c2.concept_id = cr.concept_id_2 where c.concept_class_id in ('3-char billing code', '3-char nonbill code') and c.domain_id = 'Condition' and c.vocabulary_id = 'ICD10CM' and c.invalid_reason is null group by 1,2,3
However, the issue you mention about not all SNOMEDs mapping to ICD10 is more problematic (12150/156270 standard condition concepts don’t fit into one of the ICD10-derived categories; see query below). Making a manual mapping for this will be quite an undertaking…do you think there’s some way to leverage the SNOMED hierarchy to make this more manageable? One possible approach would be to do this mapping just for the codes that appear most frequently in one’s local
condition_occurrence table - but of course that’s not ideal as it’s not generalizable. LMK your thoughts!
with icd_categories as ( select distinct c2.concept_id from omop.concept c1 join omop.concept_relationship cr on cr.concept_id_1 = c1.concept_id and cr.relationship_id = 'Maps to' join omop.concept c2 on c2.concept_id = cr.concept_id_2 where c1.concept_class_id in ('3-char billing code', '3-char nonbill code') and c1.vocabulary_id = 'ICD10CM' and c1.domain_id = 'Condition' and c1.invalid_reason is null ), mapped as ( select distinct c.concept_id from omop.concept c join omop.concept_ancestor ca on ca.descendant_concept_id = c.concept_id join icd_categories on icd_categories.concept_id = ca.ancestor_concept_id ) select mapped.concept_id is not null as maps_to_icd10_category, count(distinct c.concept_id) from omop.concept c left join mapped on mapped.concept_id = c.concept_id where c.domain_id = 'Condition' and c.standard_concept = 'S' and c.invalid_reason is null group by 1
FYI - my counts may be a bit off as we’re a few vocab releases behind.
Wondering if a repo or best practices guide ever came out of this discussion? In particular, the recommended way to get all patients of a certain condition where that condition might be a very general one (e.g. cancer) or more specific (brain cancer) or even more specific (GBM stage 4). Thanks!
Did a solution come out of this conversation?
We have a solution for the 20 ICD10-like top disease groups. This creates a statistical category (meaning no double membership, everything adds up to 100%) for each input concept_id:
with disease as ( -- define disease categories similar to ICD10 C select 1 as precedence, 'Blood disease' as disease_category, 440371 as snomed_rollup union select 1, 'Blood disease', 443723 union select 2, 'Injury and poisoning', 432795 union select 2, 'Injury and poisoning', 442562 union select 2, 'Injury and poisoning', 444363 union select 3, 'Congenital disease', 440508 union select 4, 'Pregnancy or childbirth disease', 435875 union select 4, 'Pregnancy or childbirth disease', 4088927 union select 4, 'Pregnancy or childbirth disease', 4154314 union select 4, 'Pregnancy or childbirth disease', 4136529 union select 5, 'Perinatal disease', 441406 union select 6, 'Infection', 432250 union select 7, 'Neoplasm', 4266186 union select 8, 'Endocrine or metabolic disease', 31821 union select 8, 'Endocrine or metabolic disease', 4090739 union select 8, 'Endocrine or metabolic disease', 436670 union select 9, 'Mental disease', 432586 union select 9, 'Mental disease', 4023059 union select 9, 'Mental disease', 4293175 union select 10, 'Nerve disease and pain', 376337 union select 10, 'Nerve disease and pain', 4011630 union select 11, 'Eye disease', 4038502 union select 12, 'ENT disease', 4042836 union select 13, 'Cardiovascular disease', 134057 union select 14, 'Respiratory disease', 320136 union select 14, 'Respiratory disease', 4115386 union select 15, 'Digestive disease', 4302537 union select 16, 'Skin disease', 4028387 union select 17, 'Soft tissue or bone disease', 4244662 union select 17, 'Soft tissue or bone disease', 433595 union select 17, 'Soft tissue or bone disease', 4344497 union select 17, 'Soft tissue or bone disease', 40482430 union select 17, 'Soft tissue or bone disease', 4027384 union select 18, 'Genitourinary disease', 4041285 union select 19, 'Iatrogenic condition', 4105886 union select 19, 'Iatrogenic condition', 4053838 union select 19, 'Iatrogenic condition', 444199 union select 20, 'Not categorized', 441840 ) select distinct -- get the disease category with the lowest (best fitting) precedence, or assign 'Other Condition' concept_id, concept_name, -- first_value(coalesce(disease_id, 0)) over (partition by concept_id order by precedence nulls last) as disease_id, first_value(coalesce(disease_category, 'Other Condition')) over (partition by concept_id order by precedence nulls last) as disease_category from concept left join ( -- find the approprate disease category, if possible select descendant_concept_id, snomed_rollup, disease_category, precedence from concept_ancestor join disease on ancestor_concept_id=snomed_rollup ) d on descendant_concept_id=concept_id where concept_id in (4001903, 22856, 40482052, 4275588) -- place here the concept_ids you want to roll up (have to be standard SNOMED) ;
We were also thinking of a more fine-grained categorizer, with approximately 100 categories. This has not been done yet.
The finer grained categories would be very useful Let me know when you get to it.
When using the above should we further aggregate just by the disease_name since there are multiple disease_id with identical disease_names? are were these intended to be separate categories? I’m guessing the former since it adds up to 20, 19 disease_names + other. But wanted to be sure I am using this correctly. Thanks
You are correct, that is confusing. I changed the script. The disease_name and disease_id sound like they are equivalent, but they are not. The former is the category (does not exist as a concept), and the latter is a rollup concept in SNOMED. Each category can have one or more of them, and the precedence may or may not be the same. Take the script as it stands now.
Yes, it’s only 19. No idea how that happened, I probably merged a category that was hard to tease apart. The “Other” - not sure there are any. Let me find out.
The 100 categories: Happy to create an equivalent one (and put both into a Github repo), but do you know of a good starting point?
Checked out the conditions that are falling through the cracks. Looks like a list that should: Complications (without knowing what), very generic conditions (disorder of body cavity), signs and symptoms, abnormal measurements. Added a few to the categories and added a “non-categorized” category in the updated the script above.