Roll-up for Table 1 type characterization of conditions: 20 clinical categories, similar to ICD10 chapters

Instead of taking standardized data and organizing it by a non-standard vocabulary hierarchy, could we identify what are the 20-100 clinical ideas that we’d want to have in a “Table 1 type characterization” and phenotype them properly? I thought @agolozar was trying to make progress on that, but don’t know the status of it.

This would definitely be a useful resource. Happy to help

Phenotypes? I am confused. Phenotypes usually are well-defined conditions that you use in lieu of Condition concepts in CONDITION_OCCURRENCE. You spend all that effort to overcome their shortcomings to have the best outcomes for your study.

Here, we are talking categories for co-morbidity reporting of cohorts. Yes, this script does a quick and dirty job, but probably good enough for the task. The list is based on slightly modified ICD10 Chapters.

If you want to do those as phenotype definitions - I don’t even know how you would do that any other way – using the SNOMED hierarchy to combine Condition concepts plus an order of precedence so that each Condition gets placed into one category only. There are tens of thousands of individual conditions. Do you want to run them all through the @Gowtham_Rao program? I am open to ideas.
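To make the "hierarchy plus order of precedence" idea concrete, here is a minimal sketch, not the actual script. The concept names, the ancestor sets (standing in for the SNOMED hierarchy), and the precedence order are all made up for illustration.

```python
# Hypothetical toy data: each concept maps to the set of its ancestors.
# In practice this would come from the vocabulary hierarchy; none of these
# names or relationships are taken from the real script.
ANCESTORS = {
    "viral pneumonia": {"pneumonia", "respiratory disease", "infection"},
    "heart failure": {"heart disease", "cardiovascular disease"},
    "ingrown nail": set(),
}

# Ordered (category, anchor concept) pairs. The first anchor found among a
# concept's ancestors wins, so every concept lands in exactly one category.
# This particular order is an assumption.
PRECEDENCE = [
    ("Infection", "infection"),
    ("Respiratory disease", "respiratory disease"),
    ("Cardiovascular disease", "cardiovascular disease"),
]


def assign_category(concept):
    """Return the single category for a concept, using the precedence order."""
    lineage = ANCESTORS.get(concept, set()) | {concept}
    for category, anchor in PRECEDENCE:
        if anchor in lineage:
            return category
    return "Not categorized"


for name in ANCESTORS:
    print(name, "->", assign_category(name))
```

Viral pneumonia sits under both an infection and a respiratory ancestor here, and the precedence list is what decides that it gets counted once, under Infection.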

Happy Saturday night @Christian_Reich

This is some work done there

We are looking for volunteers to take this thru our peer review process

Exactly. These are individual conditions, or small combinations of a few of them. That is probably useful for some questions. But again, there are thousands of such conditions. You will need a ton of volunteers to cover the space.

The categories I created are meant for co-morbidity characterization across all of medicine. The list of 20 is very broad. Hence the idea to go down a level and create a similar mechanism with some more granularity.

It’s a complex problem @Christian_Reich

My concern is the proposal to use it in table 1. I think we need to use cohort definitions after understanding the measurement error. But this is very difficult and labor intensive, as you said …but if the list is finite we can do it.

Your process sounds simple and is implementable… it’s probably good enough to get a rough estimate of characteristics. It has value.

But we have already defined the finite list that we think should go into table 1. We have already phenotyped most of them. It only needs to go thru the peer review process

So this cohort based solution is within reach and can be completed this year …so why not focus our collective energy on that idea?

@Christian_Reich , my proposal is simple. Can we just enumerate a list of the 20 - 100 clinical ideas that we want to have in a Table 1 to characterize ‘all of medicine’ as you describe it? If we had that list of target ideas, then we could work in parallel to consider whether there is an ontology-based solution that can provide a sufficient approach, and we can also just directly phenotype them, as in the other effort that @Gowtham_Rao provided that link to.

For what it’s worth, I think there’d be A LOT of value in simply enumerating phenotypes that we want. Here, we’re largely talking about common comorbidities, and that’s certainly a good list. Separately, we need to enumerate the list of outcomes that we want to be able to do safety surveillance and comparative effectiveness, including what @hripcsa was proposing with howoften.org. We also would like to have a list of indications that we want to march through, as we’ve done in LEGEND for hypertension and T2DM. Separate from that, we need to enumerate the set of covariates/features that we’d want to consider for patient-level predictive modeling, as @jennareps has asked for in the past. Now, these lists will likely have overlaps, but they’ll also likely have items that are more relevant for one use case than another. But if we could define some universe (even if just an initial starting point), then we’d at least have a target to work toward.

As a starting point, I’ll just remind folks about the clinical ideas that we currently include in ‘Table 1’ from the standard output of CohortMethod. @schuemie did a great job of implementing a solution that could have taken any list of concepts (+descendants), but if you gotta beef with the list, that’s 100% on me, because I was the one that picked them, based on reviewing Table 1s from a collection of published papers and trying to make an intersection list, and then traversing SNOMED (for conditions) and ATC (for drugs) to determine if I could find a concept that was a ‘good enough’ approximation of the clinical idea of interest. We definitely created this list as a starting point strawman and didn’t intend it to become a de facto standard, but we haven’t seen anyone suggest other ideas. And yet, when we go through these clinical concepts, we already know for many of them that using concept+descendants is quite problematic (from Phenotype Phebruary, you can see the issues with ADHD, diabetes, depression), so the list could be improved by replacing the concept-based approach with a proper phenotype. But we can’t phenotype what we don’t define, so maybe getting the community’s input on the list of target ideas would move this conversation forward.

  • Demographics:
    – Age group (5-year buckets)
    – Sex
    – (probably should also include race, ethnicity, and index year)
  • Medical history (conditions):
    – Acute respiratory disease
    – Attention deficit hyperactivity disorder
    – Chronic liver disease
    – Chronic obstructive lung disease
    – Crohn’s disease
    – Dementia
    – Depressive disorder
    – Diabetes mellitus
    – Gastroesophageal reflux disease
    – HIV infection
    – Hyperlipidemia
    – Hypertensive disorder
    – Lesion of liver
    – Obesity
    – Osteoarthritis
    – Pneumonia
    – Psoriasis
    – Renal impairment
    – Rheumatoid arthritis
    – Schizophrenia
    – Ulcerative colitis
    – Urinary tract infectious disorder
    – Viral hepatitis C
    – Visual system disorder
  • Medical history (cardiovascular disease):
    – Atrial fibrillation
    – Cerebrovascular disease
    – Coronary arteriosclerosis
    – Heart disease
    – Heart failure
    – Ischemic heart disease
    – Peripheral vascular disease
    – Pulmonary embolism
    – Venous thrombosis
  • Medical history (neoplasms):
    – Hematologic neoplasm
    – Malignant lymphoma
    – Malignant neoplasm of abdomen
    – Malignant neoplastic disease
    – Malignant tumor of breast
    – Malignant tumor of colon
    – Malignant tumor of lung
    – Malignant tumor of urinary bladder
    – Primary malignant neoplasm of prostate
  • Medication use
    – Antibacterials for systemic use
    – Antidepressants
    – Antiepileptics
    – Antiinflammatory and antirheumatic products
    – Antineoplastic agents
    – Antipsoriatics
    – Antithrombotic agents
    – Beta blocking agents
    – Calcium channel blockers
    – Diuretics
    – Drugs for acid related disorders
    – Drugs for obstructive airway diseases
    – Drugs used in diabetes
    – Immunosuppressants
    – Lipid modifying agents
    – Opioids
    – Psycholeptics
    – Psychostimulants, agents used for ADHD and nootropics

Happy to help. What do I have to do?


Looks like I need to back out a bit here from a debate I didn’t intend to start. Or better yet, jump into that one as well. There is nothing better than a two-front war. :)

Debate #1: Characterize the conditions of a population. That’s what @pandamiao started this debate with. Skip this if you are interested in the Table 1 discussion.

This solves the following problem: you have a bunch of patients and you want to summarize what diseases they have. Since there are many of them, you need categories. The above script puts each Condition concept into one of 20 clinically meaningful statistical categories. The categories are:

  • Blood disease
  • Injury and poisoning
  • Congenital disease
  • Pregnancy or childbirth disease
  • Perinatal disease
  • Infection
  • Neoplasm
  • Endocrine or metabolic disease
  • Mental disease
  • Nerve disease and pain
  • Eye disease
  • ENT disease
  • Cardiovascular disease
  • Respiratory disease
  • Digestive disease
  • Skin disease
  • Soft tissue or bone disease
  • Genitourinary disease
  • Iatrogenic condition
  • Not categorized

These categories are very similar to, but not exactly the same as, the 22 ICD-10 Chapters. The differences are threefold:

If you want to use the script, join the concept_id of the main select to the condition_concept_id of your CONDITION_OCCURRENCE table and count up the occurrences of each category_name. Because every Condition concept lands in exactly one category, the counts total the number of rows in the table, so the percentages add up to 100.
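The join and count described above could look roughly like this. It is a sketch under assumptions: the script's output is pretended to live in a table called condition_category (a made-up name), the concept IDs and rows are invented, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# In-memory stand-ins: a tiny CONDITION_OCCURRENCE table and a hypothetical
# condition_category table holding the script's (concept_id, category_name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
CREATE TABLE condition_category (concept_id INTEGER PRIMARY KEY, category_name TEXT);
INSERT INTO condition_occurrence VALUES (1, 101), (1, 102), (2, 101), (3, 103);
INSERT INTO condition_category VALUES
  (101, 'Cardiovascular disease'),
  (102, 'Infection'),
  (103, 'Cardiovascular disease');
""")

# Count condition records per category and express each as a share of all rows.
rows = conn.execute("""
SELECT cc.category_name,
       COUNT(*) AS n,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM condition_occurrence) AS pct
FROM condition_occurrence co
JOIN condition_category cc ON cc.concept_id = co.condition_concept_id
GROUP BY cc.category_name
ORDER BY n DESC
""").fetchall()

for category_name, n, pct in rows:
    print(f"{category_name}: {n} rows ({pct:.1f}%)")
```

The percentages only reach 100 if every condition_concept_id finds a match. With an inner join like this one, unmatched rows would silently drop out, which is why a catch-all such as "Not categorized" matters.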

The next step is to build a similar script, but with about 100 finer grained categories. Otherwise same idea.

Thanks @Christian_Reich , this is really helpful. If one is looking to group conditions into very high-level categories, then the 20 you list here make sense and I agree with you that an ontology-based approach is the preferred approach, because these aren’t really distinct, well-defined, clinical ideas, but rather broad buckets to classify conditions (and there’s nothing inherently ‘right’ or ‘wrong’ about the classification, it is what it is). I don’t think that we could or should try to phenotype broad-based categories like this.

Now, when we start going the next level down, from 20 → 100, I think it’ll be interesting to figure out if the ontology is sufficient or if that’s when we cross into phenotyping range. But, we need to look at what those 100 ideas are to make that determination.

Thank you, @Christian_Reich and @Patrick_Ryan for describing your perspectives. I was also confused when the idea of phenotyping was brought up when I think what @Christian_Reich was trying to get to was a category of disease system and not a set of clinical ideas that would populate a Table 1 report.

But, we can probably merge these ideas: We like to group conditions into categories because a higher-level description of a population is easier to grasp when you talk in terms of the different categories of disease in it rather than the low-level clinical ideas. But, we also like to accurately identify the people in the population who have the disease in question, so we need phenotypes. So, if we can define phenotypes for as many clinical ideas as possible, and then define a categorization scheme to group these clinical ideas into higher-level categories, then a Table 1 can be at the category level, identifying the population that has any of the clinical ideas (phenotypes) in the given category.
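A rough sketch of that merged idea, assuming each phenotype has already been resolved to a set of person IDs. The phenotype names, their category assignments, and the person IDs are all invented for illustration.

```python
# Hypothetical inputs: who meets each phenotype, and which category each
# phenotype is filed under. Nothing here comes from a real cohort definition.
PHENOTYPE_MEMBERS = {
    "heart failure": {1, 2},
    "atrial fibrillation": {2, 3},
    "pneumonia": {4},
}
PHENOTYPE_CATEGORY = {
    "heart failure": "Cardiovascular disease",
    "atrial fibrillation": "Cardiovascular disease",
    "pneumonia": "Respiratory disease",
}


def category_prevalence(members, category_of, population_size):
    """Share of the population with ANY phenotype in each category."""
    people_by_category = {}
    for phenotype, persons in members.items():
        # Union the person sets so someone with two phenotypes in the same
        # category is counted once.
        people_by_category.setdefault(category_of[phenotype], set()).update(persons)
    return {
        category: len(persons) / population_size
        for category, persons in people_by_category.items()
    }


table1 = category_prevalence(PHENOTYPE_MEMBERS, PHENOTYPE_CATEGORY, population_size=5)
for category, share in table1.items():
    print(f"{category}: {share:.0%}")
```

Unlike the record-level roll-up, these are person-level shares, so they need not add up to 100: one person can appear in several categories, and some in none.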

Debate #2: Build Table 1.

Patrick questioned whether this categorization script can be used to build a Table 1, or whether you should rather use phenotypes for that. Even though that was not the initial purpose of the script, it actually is a good question.

Let’s continue this debate in the following new Forum post, otherwise we will get totally confused.

Agree. Let me come up with some choices. We could cut up the 20, each into 5ish, or we go data driven and balance them a little better so each category is of similar size, or we use ICD10 or another vocab as an example.

That’s probably what we will end up with. Some generic categories, and some detailed phenotypes.
