Roll-up of for Table 1 type characterization of conditions: 20 clinical categories, similar to ICD10 chapters

pandamiao · October 29, 2022, 7:18pm

dear,
I wish to find the distribution of the disease in the condition_occurrence table, but it should be the distribution of the ancestor_concept_id according to clinical status. For clinical doctors, the ancestor_concept_ids such as bipolar disorder, major depressive disorder are the target disease they are interested in. But in the condition_occurrence table, the condition_concept_id contains the descendant_concept_id of the target disease, such as bipolar disorder, so I cannot directly count the number of bipolar disorder.
thus, if I want to count the number of bipolar disorder, how can I do that?
thank you very much

QI_omop · June 8, 2020, 7:04am

@pandamiao

This is quite straight forward. All levels of disease can be rolled up to a higher level using concept_ancestor table which contains the disease hierarchy. For example, to get count for bipolar, you may want to following query:

select count(distinct person_id)
from condition_occurrence a join concept_ancestor b on a.condition_concept_id = b.descendant_concept_id
and ancestor_concept_id = 436665 - - Bipolar disorder

Please let me know if this answers your question.

pandamiao · June 8, 2020, 9:32am

thank you very much@QI_omop, I use this method to compute the number of bipolar disorder currently.
But this method means I firstly should have the ancestor_concept_id of bipolar disorder, then I can get the number of BP. I am wondering if it is possible for me to get the number of bipolar disorder without the ancesor_concept_id，while with the descendant_concept_id and concept_ancestor table, maybe including max or min levels_of_separation?

thank you very much!

pandamiao · June 10, 2020, 2:21am

@QI_omop hi，waiting for your suggestion， thank you very much

katy-sadowski · June 10, 2020, 1:13pm

@pandamiao What is your end goal? Are you trying to characterize your condition_occurrence table by finding the distribution of the “main” conditions in the table (the “target diseases” as you call them)? Or are you looking for a specific set of conditions?

Christian_Reich · June 10, 2020, 4:38pm

@katy-sadowski:

This is a problem we need to address. We need to have a way to characterize a patient population by reporting classes of diseases which are more general than “bipolar disease” and more granular than “Clinical finding”. This is a problem. It is further complicated by the fact that those summaries require statistical classification, i.e. each record is summarized only once. This requires a way to use the hierarchy with single parenthood only, which is not how SNOMED does it.

I’m planning a new repo. Stay tuned.

katy-sadowski · June 10, 2020, 6:31pm

Interesting stuff. Perhaps possible to leverage the ICD hierarchy in some way?

Christian_Reich · June 10, 2020, 8:14pm

Precisely. We have one approach where we use the 10 ICD10 Chapters as categories, but the SNOMED hierarchy in combination with a precedence rule set to create them. Will put out soon.

We probably need a second one for a more granular list of maybe 50 disease groups.

Do you have a use case for that?

pandamiao · June 12, 2020, 8:55am

thank you @Christian_Reich, at this moment, I will finish the research according to the suggestion of QI.

pandamiao · June 12, 2020, 8:58am

Hi, thank you @katy-sadowski, I think I want to characterize my condition_occurrence table according to the ancestor concept id, just like the ingredient concept id of drug_exposure table.

katy-sadowski · June 17, 2020, 1:58pm

@Christian_Reich Great; that’s exactly what I was picturing. I agree that a more-granular classification could be helpful for characterizing a population. I don’t have an active use case but have been pondering performing a characterization similar to what it sounds like @pandamiao is working on.

Do you think using the top-level 3-character ICD10 codes as the classes could work? By somehow reverse-mapping standard concepts back to those ICD categories? This assumes that it is possible to trace every standard condition concept to an ICD10 code (either directly or via the hierarchy), and that relationships between ICD10 codes are fully mapped out in the CDM, neither of which I’m sure about. Also, I think it’d still require hand-curated precedence rules, which might be infeasible due to the volume of codes.

Tricky problem

Christian_Reich · June 27, 2020, 11:17pm

We should try it out.

Almost. Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one). All their children would go into these bucket of concepts. Going back to ICD10 is a bad idea, since all mappings are unidirectional.

That’s pretty much the case. However, there are tons of SNOMEDs which have no equivalence in ICD10. ICD10 is smaller.

Precisely. And it would be arbitrary, because any one SNOMED legitimately could be a child of more than one ICD10-derived category. But we’d need to pick which category gets the higher precedence. Usually, this would be the less common one. We could try to derive that from the data.

I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.

katy-sadowski · June 30, 2020, 12:37am

Forward mapping to SNOMED. The way it would work is that we take a top-level ICD10, follow the mapping to the SNOMEDs (could be more than one).

Aha, yes, this is much simpler as well. Good to know the reverse is inadvisable.

I agree. Want to give it a shot together? The result would be a pretty useful automatic “Table 1” generation package.

I’m in! I ran a couple of checks to get a sense of the scale. Among SNOMEDs that fit into an ICD10-derived category, 158/1431 map to multiple categories, which seems pretty reasonable to adjudicate by hand:

select 
    c2.concept_id,
    c2.concept_name,
    c2.concept_code,
    count(distinct c.concept_id)
from omop.concept c
    left join omop.concept_relationship cr on cr.concept_id_1 = c.concept_id
	    and cr.relationship_id = 'Maps to'
    left join omop.concept c2 on c2.concept_id = cr.concept_id_2
where c.concept_class_id in ('3-char billing code', '3-char nonbill code')
    and c.domain_id = 'Condition'
    and c.vocabulary_id = 'ICD10CM'
    and c.invalid_reason is null
group by 1,2,3

However, the issue you mention about not all SNOMEDs mapping to ICD10 is more problematic (12150/156270 standard condition concepts don’t fit into one of the ICD10-derived categories; see query below). Making a manual mapping for this will be quite an undertaking…do you think there’s some way to leverage the SNOMED hierarchy to make this more manageable? One possible approach would be to do this mapping just for the codes that appear most frequently in one’s local condition_occurrence table - but of course that’s not ideal as it’s not generalizable. LMK your thoughts!

with 

icd_categories as (
	select distinct 
		c2.concept_id
	from omop.concept c1
		join omop.concept_relationship cr on cr.concept_id_1 = c1.concept_id
			and cr.relationship_id = 'Maps to'
		join omop.concept c2 on c2.concept_id = cr.concept_id_2
	where c1.concept_class_id in ('3-char billing code', '3-char nonbill code')
		and c1.vocabulary_id = 'ICD10CM'
        and c1.domain_id = 'Condition'
		and c1.invalid_reason is null
),

mapped as (
	select 
		distinct c.concept_id
	from omop.concept c
		join omop.concept_ancestor ca on ca.descendant_concept_id = c.concept_id
		join icd_categories on icd_categories.concept_id = ca.ancestor_concept_id
)

select
	mapped.concept_id is not null as maps_to_icd10_category,
	count(distinct c.concept_id)
from omop.concept c
	left join mapped on mapped.concept_id = c.concept_id
where c.domain_id = 'Condition'
	and c.standard_concept = 'S'
	and c.invalid_reason is null
group by 1

FYI - my counts may be a bit off as we’re a few vocab releases behind.

David_Craft · July 11, 2022, 5:31pm

Wondering if a repo or best practices guide ever came out of this discussion? In particular, the recommended way to get all patients of a certain condition where that condition might be a very general one (e.g. cancer) or more specific (brain cancer) or even more specific (GBM stage 4). Thanks!

namuwen · October 27, 2022, 12:36am

Did a solution come out of this conversation?

Christian_Reich · October 29, 2022, 7:07pm

We have a solution for the 20 ICD10-like top disease groups. This creates a statistical category (meaning no double membership, everything adds up to 100%) for each input concept_id:

 with disease as ( -- define disease categories similar to ICD10 C
  select 1 as precedence, 'Blood disease' as disease_category, 440371 as snomed_rollup union
  select 1, 'Blood disease', 443723 union
  select 2, 'Injury and poisoning', 432795 union
  select 2, 'Injury and poisoning', 442562 union
  select 2, 'Injury and poisoning', 444363 union
  select 3, 'Congenital disease', 440508 union
  select 4, 'Pregnancy or childbirth disease', 435875 union
  select 4, 'Pregnancy or childbirth disease', 4088927 union
  select 4, 'Pregnancy or childbirth disease', 4154314 union
  select 4, 'Pregnancy or childbirth disease', 4136529 union
  select 5, 'Perinatal disease', 441406 union
  select 6, 'Infection', 432250 union
  select 7, 'Neoplasm', 4266186 union
  select 8, 'Endocrine or metabolic disease', 31821 union
  select 8, 'Endocrine or metabolic disease', 4090739 union
  select 8, 'Endocrine or metabolic disease', 436670 union
  select 9, 'Mental disease', 432586 union
  select 9, 'Mental disease', 4023059 union
  select 9, 'Mental disease', 4293175 union
  select 10, 'Nerve disease and pain', 376337 union
  select 10, 'Nerve disease and pain', 4011630 union
  select 11, 'Eye disease', 4038502 union
  select 12, 'ENT disease', 4042836 union
  select 13, 'Cardiovascular disease', 134057 union
  select 14, 'Respiratory disease', 320136 union
  select 14, 'Respiratory disease', 4115386 union
  select 15, 'Digestive disease', 4302537 union
  select 16, 'Skin disease', 4028387 union
  select 17, 'Soft tissue or bone disease', 4244662 union
  select 17, 'Soft tissue or bone disease', 433595 union
  select 17, 'Soft tissue or bone disease', 4344497 union
  select 17, 'Soft tissue or bone disease', 40482430 union
  select 17, 'Soft tissue or bone disease', 4027384 union
  select 18, 'Genitourinary disease', 4041285 union
  select 19, 'Iatrogenic condition', 4105886 union
  select 19, 'Iatrogenic condition', 4053838 union
  select 19, 'Iatrogenic condition', 444199 union
  select 20, 'Not categorized', 441840
)
select distinct -- get the disease category with the lowest (best fitting) precedence, or assign 'Other Condition'
  concept_id, concept_name,
--   first_value(coalesce(disease_id, 0)) over (partition by concept_id order by precedence nulls last) as disease_id, 
  first_value(coalesce(disease_category, 'Other Condition')) over (partition by concept_id order by precedence nulls last) as disease_category
from concept
left join ( -- find the approprate disease category, if possible
  select descendant_concept_id, snomed_rollup, disease_category, precedence
  from concept_ancestor 
  join disease on ancestor_concept_id=snomed_rollup
) d on descendant_concept_id=concept_id
where concept_id in (4001903, 22856, 40482052, 4275588) -- place here the concept_ids you want to roll up (have to be standard SNOMED)
;

We were also thinking of a more fine-grained categorizer, with approximately 100 categories. This has not been done yet.

namuwen · October 28, 2022, 10:17pm

The finer grained categories would be very useful Let me know when you get to it.

namuwen · October 28, 2022, 11:08pm

When using the above should we further aggregate just by the disease_name since there are multiple disease_id with identical disease_names? are were these intended to be separate categories? I’m guessing the former since it adds up to 20, 19 disease_names + other. But wanted to be sure I am using this correctly. Thanks

Christian_Reich · October 29, 2022, 4:21pm

@namuwen:

You are correct, that is confusing. I changed the script. The disease_name and disease_id sound like they are equivalent, but they are not. The former is the category (does not exist as a concept), and the latter is a rollup concept in SNOMED. Each category can have one or more of them, and the precedence may or may not be the same. Take the script as it stands now.

Yes, it’s only 19. No idea how that happened, I probably merged a category that was hard to tease apart. The “Other” - not sure there are any. Let me find out.

The 100 categories: Happy to create an equivalent one (and put both into a Github repo), but do you know of a good starting point?

Christian_Reich · October 29, 2022, 7:09pm

Checked out the conditions that are falling through the cracks. Looks like a list that should: Complications (without knowing what), very generic conditions (disorder of body cavity), signs and symptoms, abnormal measurements. Added a few to the categories and added a “non-categorized” category in the updated the script above.