While exploring the use of LLMs to help create and curate concept sets, we stumbled on an interesting fundamental question: what logic should we (or AI) apply when deciding which concepts belong to a concept set?
Specifically, we have two alternatives:
1. Only include concepts that semantically fall within the main concept set idea. For example, for the concept set on “Nausea” we would include “Postoperative nausea” or “Exacerbation of nausea”. Often these are subtypes of the concept set idea.
2. Also include concepts that practically imply the concept set idea. For example, for “Nausea” we would also include “Vomiting”, as in nearly all cases, vomiting implies nausea.
Option 2 does not necessarily mean lower specificity, as we require that (almost) everyone who has the concept also has the concept set idea (we currently instruct the LLM to use a threshold: at least 95% of people with the concept must have the idea). But it does have the potential to raise sensitivity. Option 2 would make the most sense if the goal is to maximize the phenotype's operating characteristics, but it does feel odd to have concepts in the set that are not subtypes of the main idea.
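To make the trade-off concrete, here is a rough Python sketch of the 95% rule and its effect on sensitivity and specificity. Everything in it is an assumption for illustration: the function names are hypothetical, and the population, the "true nausea" group, and the people carrying each concept are invented toy numbers, not real data.

```python
# Toy illustration only: all person IDs and counts below are made up.

def implied_fraction(concept_people, truth):
    """Fraction of people with the concept who also have the concept set idea."""
    return len(concept_people & truth) / len(concept_people)

def operating_characteristics(flagged, truth, population):
    """Sensitivity, specificity, and PPV of a set of flagged people."""
    tp = len(flagged & truth)
    fp = len(flagged - truth)
    fn = len(truth - flagged)
    tn = len(population - flagged - truth)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

THRESHOLD = 0.95  # the threshold we currently give the LLM

population = set(range(1000))   # 1000 hypothetical people
truth = set(range(200))         # 200 of them truly have nausea
subtype_people = set(range(100))  # found via subtype concepts (option 1)
# 100 people with a "Vomiting" code: 96 truly nauseous, 4 not.
vomiting_people = set(range(80, 176)) | set(range(200, 204))

option1 = subtype_people
option2 = set(subtype_people)
if implied_fraction(vomiting_people, truth) >= THRESHOLD:
    option2 |= vomiting_people  # the concept passes the rule, so include it

result1 = operating_characteristics(option1, truth, population)
result2 = operating_characteristics(option2, truth, population)
print("option 1:", result1)
print("option 2:", result2)
```

With these invented numbers, sensitivity rises from 0.5 to 0.88 while specificity only drops from 1.0 to 0.995. Real data may of course look very different. Note that the 95% rule is really a bound on each added concept's positive predictive value, so its effect on specificity depends on how common the added concept is.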
Below are some more examples. What do you think should be our policy for making concept sets?
Primary concept | Possible included concept | Rationale |
---|---|---|
Bleeding | Open fracture of ulna | Bleeding is a direct and expected consequence of an open fracture of the ulna due to the disruption of blood vessels and exposure of the bone. It is nearly guaranteed to occur in such cases. |
Bronchitis | Haemophilus influenzae pneumonia | Haemophilus influenzae pneumonia logically implies bronchitis because the infection directly causes inflammation in the bronchial tubes, which is the defining feature of bronchitis. Therefore, nearly all patients with Haemophilus influenzae pneumonia would also meet the criteria for bronchitis. |
Diarrhea | Gastroenteritis | Gastroenteritis is a condition that almost universally includes diarrhea as a symptom. In medical practice, the presence of gastroenteritis strongly implies the presence of diarrhea, as it is one of the defining features of the condition. Therefore, it is logical to conclude that 95% or more of patients with gastroenteritis have diarrhea. |
Fatigue | Generalized myasthenia | Fatigue is a nearly universal symptom of Generalized myasthenia due to the disease’s underlying mechanism of impaired neuromuscular transmission. It is logical to conclude that more than 95% of patients with Generalized myasthenia experience fatigue. |
Vertigo | Active Ménière’s disease | Vertigo is a nearly universal symptom of Active Ménière’s disease, as it is one of the defining features of the condition. The presence of vertigo is logically implied by the diagnosis of Active Ménière’s disease. |