Phenotype Phebruary Day 4 - Multiple Myeloma

Friends:

I have been watching this with curiosity, mostly from a distance (shame on me). This is not to diminish the value of a systematic effort driven by enormous willpower; it is hard and arduous work. But I have an issue. Here it is:

Why do we need phenotypes to identify patients with a condition? Drugs are not a problem, we don’t have to second-guess them, and neither are procedures, visits, measurements or devices. Any of these facts are either captured or not, and if they are, we are pretty sure we got them right. Why do we need to do all these gymnastics to chase conditions so badly?

@agolozar alluded to it. There is a conspiracy against us. Diagnoses are:

  • Captured at a rate anywhere between 0 and 100%. An itch on the head is an example of the former; anything with a severe impact on life, like a myocardial infarction, is typical of the latter. The problem is that this effect is not only intrinsic to the type of diagnosis, it also varies hugely between data sources.
  • Changing and sometimes never final. A myeloma patient will have a diagnosis of syncope in the emergency call report, anemia in the ambulance, monoclonal gammopathy in the ER and multiple myeloma in the hematology/oncology department. It’s still one and the same disease.
  • Hard to make, meaning even the physician does not know. Myeloma can mimic other diseases and linger around for a while until the telltale signs become apparent. Some diagnoses are never made at all, a problem typical of rare diseases.
  • Imprecise and jargony. For example, “multiple myeloma” could mean the acute onset or the long-term disease. “Allergy” can mean the actual allergic response or the sensitivity to some allergen.

So, what do we do? We create these complicated heuristics using a bunch of tricks:

  • Repetition of diagnoses over time,
  • Addition of circumstantial evidence (e.g., a bone marrow aspiration procedure before the diagnosis),
  • Elimination of alternatives and implausibilities as per @aostropolets,
  • Re-diagnosis from symptoms, lab tests, pathology or imaging results,
  • Combination of these different methods using Boolean logic (see the sketch below).
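
To make the point concrete, here is a minimal Python sketch of what such a combination often boils down to. This is not OHDSI tooling or the OMOP CDM; all field names and thresholds are invented for illustration:

```python
# Minimal, hypothetical sketch of combining the tricks above with Boolean logic.
# All fields and thresholds are made up; they stand in for patient-level facts
# already extracted from a source.
from dataclasses import dataclass

@dataclass
class PatientFacts:
    myeloma_dx_count: int                    # repetition of diagnoses
    days_between_first_and_last_dx: int
    bone_marrow_aspiration_before_dx: bool   # circumstantial evidence
    mgus_only: bool                          # plausible alternative to eliminate
    abnormal_serum_protein: bool             # re-diagnosis from lab results

def meets_myeloma_phenotype(p: PatientFacts) -> bool:
    repeated_dx = p.myeloma_dx_count >= 2 and p.days_between_first_and_last_dx >= 30
    corroborated = p.bone_marrow_aspiration_before_dx or p.abnormal_serum_protein
    no_alternative = not p.mgus_only
    # The Boolean combination is where the opacity creeps in: which problem does
    # each clause address, and what side effects does it bring along?
    return repeated_dx and corroborated and no_alternative
```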

Nothing wrong with that, except that their purpose is not transparent and each of them can have side effects, which makes them very hard to debug. We, and of course everybody else, create these “best practice” definitions, but without asserting which problem with the diagnoses we think we are addressing with which mechanism. In addition, the definitions become heavily interdependent with the data source they are applied to, and in those cases where we admit that relationship we use imprecise categories (“this definition works in claims data”). Finally, we have the use case problem, where definitions seemingly differ depending on whether they are employed in exposure or outcome cohorts.

And on top of all that, we mix the heuristics with the true inclusion criteria belonging to the study design (“age>=18”).

This is a black box situation we ought to avoid at OHDSI. Not sure what that would look like, but the ideas that come to mind are:

  • The definition of the use case
  • A heuristics synopsis explaining the problems and solutions applied
  • An annotation of each criterion explaining its purpose (a sketch of what that could look like follows this list)
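
For the annotation idea, even a simple structured note per criterion might do. A sketch of what that could look like; the keys and wording are my invention, not an existing OHDSI or ATLAS format:

```python
# Hypothetical shape of an annotated criterion; keys and content are invented.
annotated_criterion = {
    "criterion": ">= 2 multiple myeloma diagnosis records, >= 30 days apart",
    "problem_addressed": "low and variable capture of single diagnosis codes",
    "mechanism": "repetition of diagnoses over time",
    "known_side_effects": "shifts the index date; misses patients with short follow-up",
    "data_source_dependence": "sensitive to visit frequency in the source",
}
```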

We could also start using @aostropolets’ concept prevalences to make criteria dependent on the capture rate in a specific database compared to the overall availability. But that would be a bigger undertaking.
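
A rough sketch of what that could mean, assuming we could look up a concept's prevalence in the database at hand and compare it to an expected overall figure (both numbers below are invented):

```python
# Illustrative only: adapt the strictness of a criterion to how well the
# concept is captured in this particular database.
def capture_ratio(prevalence_in_db: float, expected_prevalence: float) -> float:
    return prevalence_in_db / expected_prevalence

# Well captured: a single diagnosis may suffice. Poorly captured: demand
# repetition or corroborating evidence instead.
required_dx_count = 1 if capture_ratio(0.0009, 0.0010) >= 0.8 else 2
```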

Thoughts?
