What is a phenotype in the context of observational research?

But the books are algorithms with only a slight extension: book = phenotype algorithm

In our glossary, we distinguished between a phenotype and a phenotype algorithm (and a cohort). The “books” in the library are phenotype algorithms.

Phenotype is already an overloaded word with a specific meaning in genetics. I understand the desire to try and name something, but I don’t know that it needs a formal name. If it does need a name, there are better choices because “phenotype” is confusing in the context of acute events, costs, death, utilization, or medications.

In actual use, people will shorten “phenotype algorithm” to “phenotype”. For example, I believe the library will be called a “phenotype library” and important algorithms will be called “gold standard phenotypes”. I would suggest just using the word “algorithm”. Then one can talk about inclusion algorithms, diabetes algorithms, algorithm libraries, gold standard algorithms, etc.

I am focused on this issue because the algorithm is the building block of observational research using healthcare databases, and semantics are critical for communication. But unless someone has a specific question, I think I have spent too much time making my opinion known, and I won’t belabor the point anymore.

Algorithm is too broad a term, tho. You have algorithms for sorting lists, algorithms for calculating tax, algorithms for finding people in your cohort. You’re not actually conveying any information about the thing you identify as an ‘algorithm’ except to say that it takes some sort of inputs and yields a result.

I’ve been reading up on how phenotypes are described, and I found the same definition as you did:

the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment.

So, while I agree with you that the term ‘phenotype’ is rooted in genetics and in the biological characteristics observed as a result of genetic and environmental influences, if we go just a bit higher level, a phenotype is about the characteristics of the individual. That is at the core of what we mean when we talk about identifying cohorts: the people in the cohort fit a specific phenotype.

To me, algorithms are closer to implementation than conceptual. I could imagine multiple algorithms that would try to find people that fit a given phenotype. You could have the phenotype of ‘people who are diabetic’ but multiple algorithms to execute the selection.
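To make the distinction concrete, here is a minimal sketch of one phenotype ('people who are diabetic') with two different algorithms selecting for it. Everything in it is an assumption for illustration: the `Patient` class, the plain-text labels standing in for coded concepts, and both selection rules are hypothetical and are not taken from any OHDSI tool or published definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Patient:
    """Hypothetical, heavily simplified patient record."""
    person_id: int
    diagnoses: set[str] = field(default_factory=set)
    drugs: set[str] = field(default_factory=set)


# Algorithm 1: require a recorded diagnosis.
def has_diabetes_diagnosis(p: Patient) -> bool:
    return "diabetes" in p.diagnoses


# Algorithm 2: accept a recorded diagnosis OR insulin exposure.
def has_diagnosis_or_insulin(p: Patient) -> bool:
    return "diabetes" in p.diagnoses or "insulin" in p.drugs


def build_cohort(patients: list[Patient],
                 algorithm: Callable[[Patient], bool]) -> list[int]:
    """Return the ids of the patients that the given algorithm selects."""
    return [p.person_id for p in patients if algorithm(p)]


patients = [
    Patient(1, {"diabetes"}, {"metformin"}),
    Patient(2, set(), {"insulin"}),
    Patient(3, {"hypertension"}, set()),
]

# Same phenotype, two algorithms, two different cohorts.
strict = build_cohort(patients, has_diabetes_diagnosis)
broad = build_cohort(patients, has_diagnosis_or_insulin)
print(strict, broad)
```

The phenotype stays fixed while the algorithm changes, and the resulting cohort changes with it: patient 2 is only picked up by the broader rule.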

Could you not imagine a phenotype of people undergoing gastric bypass surgery? That’s a very acute event, but you could certainly think of a subset of a population (the cohort) who underwent the surgery. The surgery becomes part of the individual’s characteristics and therefore could be part of a phenotype.

I understand the source of confusion. I’ve seen the same confusion with the term ‘cohort’, where people hear ‘cohort’ and automatically assume a particular study design and not the broader idea of ‘a subset of a population that exhibits a specific set of characteristics (or lack of a characteristic) for a period of time’. However, the reason I’m in favor of the term ‘phenotype’ is that phenotype does relate to the observable characteristics of an individual, so when someone says ‘phenotype’ I am thinking in terms of ‘people with specific characteristics’.

Hello Everyone,

I was reading this post to understand what phenotypes are and how they are useful.

I was trying to understand the T2DM phenotype algorithm and have a few clinical questions related to it, which are listed in this post.

So I felt that linking my question here would help me get a better idea from you all.

thanks

To really understand phenotypes, it helps to have a basic understanding of medical diagnosis and treatment; of health systems and how data is recorded; and of data quality. Several of your questions come down to a medication having multiple uses in treatment, or a lab having multiple uses in diagnosis. For instance, metformin is used for diabetes, polycystic ovarian syndrome, and prediabetes. Others come down to data completeness: the diagnosis of diabetes may be missing from the EHR while the other data still indicate diabetes. PheKB used this kind of pragmatic approach to maximize recall and precision given both of these issues.
TL;DR: medicine is complex, the data is dirty, and phenotype algorithms have to take these issues into account.
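As a rough illustration of those two issues, the sketch below treats metformin alone as ambiguous evidence and lets medication plus lab evidence stand in for a missing diagnosis. This is not the PheKB algorithm: the `Record` class, the text labels, the rule itself, and the HbA1c cutoff of 6.5% are all assumptions chosen only to show the shape of the logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative cutoff in percent; an assumption, not a validated rule.
HBA1C_THRESHOLD = 6.5


@dataclass
class Record:
    """Hypothetical, heavily simplified EHR summary for one person."""
    person_id: int
    diagnoses: set[str] = field(default_factory=set)
    drugs: set[str] = field(default_factory=set)
    max_hba1c: float | None = None  # None means no lab result on file


def likely_t2dm(r: Record) -> bool:
    # A recorded diagnosis is accepted directly.
    if "type 2 diabetes" in r.diagnoses:
        return True
    # Data completeness: the diagnosis may be missing, so look at other data.
    abnormal_lab = r.max_hba1c is not None and r.max_hba1c >= HBA1C_THRESHOLD
    # Multiple uses: metformin alone is not enough (PCOS, prediabetes),
    # so it only counts when an abnormal lab supports it.
    on_metformin = "metformin" in r.drugs
    return on_metformin and abnormal_lab


records = [
    Record(1, {"type 2 diabetes"}, {"metformin"}, 7.8),
    Record(2, {"polycystic ovarian syndrome"}, {"metformin"}),
    Record(3, set(), {"metformin"}, 7.2),  # diagnosis missing from the record
    Record(4, {"hypertension"}, set(), 5.4),
]

cases = [r.person_id for r in records if likely_t2dm(r)]
print(cases)
```

Here record 2 is excluded because metformin has another plausible indication, while record 3 is included despite having no diagnosis on file. A real algorithm would need far more care than this, for example around timing, repeated measurements, and type 1 diabetes.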