
What is a phenotype in the context of observational research?


Since I was the one to open the debate about these definitions, let me propose a synthesis of all that was said to make @apotvien’s life easier. I think we have a good grasp of the elements we want to define, but we still have nomenclature problems with the term Cohort:

  1. Phenotype: A pattern of characteristics in health data (criteria) in a set of people for a duration of time. These observables can be conditions, procedures, drug exposures, devices, observations, visits, cost information, etc.

I think that “pattern” is better than “set”, because it indicates a relationship between the observables or criteria (insulin-dependent diabetic: Patient with the Condition diabetes mellitus and being treated with a drug containing insulin).

  2. Phenotype Algorithm = Cohort Definition: A coded set of instructions for approximating a phenotype in a given dataset, which may or may not have complete and accurate evidence about each of the observables and their pattern. Each phenotype can have one or more phenotype algorithms (e.g. T2DM broad, T2DM narrow). The instructions could be heuristic (rule-based) or probabilistic. Heuristic algorithms consist of rules applied to concept sets. Probabilistic phenotypes are implemented using a probabilistic model.

This is similar to @apotvien’s definition, except there is no more desire involved (desires could be a good thing, but not in the context of these abstract definitions), and that the algorithm doesn’t define members, but rules. And that heuristic rules are also computable, so I took that out. And that the model is probabilistic, not predictive.

Now we need to name the actual instantiated set of members identified through execution of the algorithm. We can (i) make Cohort a synonym for Phenotype and call this set a Cohort Instance, or (ii) call this set a Cohort. The former means Cohort is the ideal desired pattern of things (insulin-dependent diabetics); the latter means Cohort denotes an actual set of people and the timelines a certain algorithm or definition has calculated in a database (cohort 123 in database XYZ).

(i) I actually like the idea to use the terms interchangeably. Reason is the avoidance of confusion. Folks who have a hard time calling a drug or device exposure a phenotype can call that a cohort and be happy. Folks who have a hard time calling an outcome a cohort, which is a lot of our traditional epidemiologist friends, can call that a phenotype and also be happy. If we want to be really nice we might even include Rothman’s Population as well. I don’t have a strong feeling about that.

(ii) This is how we have mostly used the word Cohort, ATLAS calls it that way (even though the nomenclature in the ATLAS UI badly needs overhauling), and @apotvien et al. proposed it.

Anyway. Whatever we decide:

Cohort/Phenotype Instance (i) or Cohort (ii): An instantiation or execution of the instructions of a Phenotype Algorithm/Cohort Definition against a dataset, resulting in a set of patients and their timelines.
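To make the three layers concrete, here is a minimal sketch using the insulin-dependent diabetic example from above. It is not how ATLAS or any OHDSI tool implements cohorts. The `Event` record, the concept IDs, and the entry and exit rules are all assumptions invented for illustration. The phenotype is the idea in the docstring, the function is the rule-based phenotype algorithm (cohort definition), and the dictionary it returns is the cohort (instance): a set of people and their timelines.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    person_id: int
    concept_id: int
    event_date: date


# Hypothetical concept sets; the concept IDs are invented for this sketch.
DIABETES_CONCEPTS = frozenset({1001, 1002})
INSULIN_CONCEPTS = frozenset({2001})


def insulin_dependent_diabetic(events):
    """Phenotype algorithm: a diabetes condition AND an insulin exposure.

    Returns the cohort instance as {person_id: (cohort_start, cohort_end)}.
    """
    first_dx, first_rx, last_seen = {}, {}, {}
    for e in events:
        pid = e.person_id
        last_seen[pid] = max(last_seen.get(pid, e.event_date), e.event_date)
        if e.concept_id in DIABETES_CONCEPTS:
            first_dx[pid] = min(first_dx.get(pid, e.event_date), e.event_date)
        if e.concept_id in INSULIN_CONCEPTS:
            first_rx[pid] = min(first_rx.get(pid, e.event_date), e.event_date)

    cohort = {}
    for pid in first_dx.keys() & first_rx.keys():
        # Assumed entry rule: the day both parts of the pattern are first present.
        cohort_start = max(first_dx[pid], first_rx[pid])
        # Assumed exit rule: the person's last recorded event.
        cohort[pid] = (cohort_start, last_seen[pid])
    return cohort


dataset = [
    Event(1, 1001, date(2019, 3, 1)),   # diabetes diagnosis
    Event(1, 2001, date(2019, 5, 1)),   # insulin exposure
    Event(1, 9999, date(2020, 1, 15)),  # unrelated later event
    Event(2, 1002, date(2018, 7, 1)),   # diabetes diagnosis, never on insulin
]

cohort_instance = insulin_dependent_diabetic(dataset)
```

Running the same function against a different dataset would give a different cohort instance, which is why the definition and the instance need separate names.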

I agree with @Patrick_Ryan that Concept is not a term we want here. Concepts are semantic entities representing medical events or facts, and they are needed for those algorithms.

Now, we still have the precious metals. @apotvien has a Gold Standard Phenotype as “one that is designed, evaluated, and documented with best practices.” What is the “one” thing here? What does it apply to: A Phenotype, as @apotvien has it? Can’t be, because that is an intended ideal we need to approximate, which means, all of them are Gold. A Phenotype Algorithm? Can’t be, because the evaluation and documentation depend on an instantiation. A Cohort (Instance)? That would be the right thing, except it makes it totally not transferable, and therefore practically useless.

Also, we want Gold. Do we also want to take on Silver? Something that is not fully validated against some truth (the “chart”), but only probabilistically? Bronze - something we pull out of a sleeve after chewing the pencil and scratching our foreheads for a while (which is what 99.9% of all published phenotypes are today)?

Please help.


Thank you. All attempts to make my life easier are welcomed. :wink:

By “Gold” here, I mean phenotype algorithm, because that’s what every entry in the library will be.

Let me try to frame this with a cooking analogy.

Suppose we wish to make chicken noodle soup. When we imagine what that looks like, we’re all thinking about roughly the same thing. However, when it comes down to the details about how to make a chicken noodle soup, there are a plethora of recipes out there. Even if two people follow an identical recipe, they may get different results. Before moving on, the notion of chicken noodle soup is the phenotype, a particular chicken noodle soup recipe is a phenotype algorithm, and an actual pot of hot chicken noodle soup sitting on the kitchen table (mmm…) is the cohort (an instance of an applied recipe).

Now, there are two ways such an applied recipe can fail: 1) The recipe itself is inherently bad; maybe it leaves out the noodles and calls for the chicken to remain raw, and 2) The recipe was not followed by the cook; it calls for ingredients not in the cook’s pantry so the cook left those things out, and it turned out poorly.

Turning back to our library, I think this highlights the importance that the validation relies on both the authors and the validators alike. The author needs to be given the opportunity to lay out all of the pieces (ingredients) required to successfully implement their proposed phenotype algorithm. Likewise, a validator is obligated to report metrics only if they followed the author’s stated instructions and intended use.

If that contract is met, we should be able to automatically discern high quality phenotypes over time as they are validated, much like seeing a recipe with multiple 5-star reviews. The notion of “Gold Standard” refers to the idea that the phenotype algorithm went through an agreed upon process to be admitted into the library, but it doesn’t pass judgements about the performance characteristics. The notion of what’s acceptable will vary from case to case and person to person – It’s subjective, just like who we believe has the very best chicken noodle soup recipe. :slight_smile:

This has been a rich and fascinating discussion. I recognize that this will be archived here, but am wondering if it could be synthesized and summarized, perhaps in the form of an article. While OHDSI will develop its own definition, the thinking can inform other groups who are undergoing similar processes, and can also serve as instructional material, helping to educate all of us on the many aspects of phenotypes (or whatever term is used).


Hi all,

I wanted to share a nice paper on the topic to add to the discussion - https://rethinkingclinicaltrials.org/resources/ehr-phenotyping/. Aligns with your discussion. Not suggesting it over any of your current definitions where different - I’m a new fly on the wall - just sharing.

Ray (Epidemiologist/Informaticist at CDC)


I think it’s important to note that:
EHR ‘Phenotypes’ are at best an approximation of the ‘True Phenotype of Individual’
‘Phenotyping algorithms’ are methods that allow for the approximation of an EHR phenotype

Therefore, in terms of quality/accuracy:
‘Phenotype as defined by an EHR Phenotyping Algorithm’ < ‘EHR Phenotype’ < ‘True Phenotype’

The reason I make this distinction is because most EHR phenotyping algorithms have some level of accuracy - say 95%. This accuracy is typically determined by comparing the phenotypes generated from the algorithm with data also in the EHR (be it notes, structured data, etc.). Very rarely do we recruit patients and then determine their ‘true phenotype’ and compare against the ‘EHR phenotype’ and then the ‘phenotype from the EHR phenotyping algorithm’. This is an important point because often insurance status affects whether certain tests are performed (i.e., the EHR data that informs our algorithms) which thereby affect the EHR phenotype generated. If a patient has never been tested for a disease they will likely not have the EHR data for that disease - although in truth they may have the disease.
I discuss some of these issues in a 2013 JAMIA paper:

At best all we can hope for is high quality ‘EHR Phenotype’ information - we cannot capture ‘True Phenotypes’. This addresses some of people’s concerns over the ‘gold standard’ terminology. I would say that accurate ‘EHR phenotype’ information is a gold standard while the ‘true phenotype’ is a platinum standard - amazing if you can get it, but very hard to acquire and also very rare.
Therefore, algorithms that approximate EHR phenotypes are ‘silver standards’, which is consistent with how the term ‘silver standard’ is used in the field as well.

To make my definitions a little more organized:
Platinum Standard: The True Phenotype of the person
Gold Standard: True EHR Phenotype (this should be non-institution dependent - therefore it should not be based on how your specific institution has coded diabetes, but that diabetes was coded in the EHR)
Silver Standard: Phenotype inferred from Phenotyping Algorithm Applied to EHR data (could vary by institution). The gold standard should be based on definitions that increase the accuracy of the silver standards across institutions

Some examples:
1.) Homeless person with diabetes, goes to hospital because injured in an accident. No one tests for diabetes - no EHR data on diabetes.
Platinum standard: diabetes (impossible to capture by algorithms b/c data on diabetes does not exist in EHR)
Gold standard: no diabetes
Silver standard: no diabetes

2.) Person with diabetes, goes to hospital and is coded with one diabetes code, they then lose insurance and stop being treated for diabetes at that institution
Platinum standard: diabetes (could be possible to capture if you lower the threshold to include 1 diabetes code presence in EHR, but this will also increase the false positive rate)
Gold standard: no diabetes (will only be listed as diabetes if the definition is expanded to include patients with 1 diabetes code)
Silver standard: diabetes (if you define at your particular institution to include all patients with any diabetes code) - this could have high accuracy at your institution, but unlikely to generalize across institutions. The generalizable phenotype definitions should be considered ‘gold’
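The two examples can be restated as a small sketch. It assumes the phenotyping algorithm is a plain threshold on the number of diabetes codes, which is only one of many possible rules. The function names, the patient records, and the stricter threshold of 2 codes are made up for illustration. Here `true_diabetes` plays the role of the platinum standard, which in practice is rarely available.

```python
def algorithm_label(diabetes_code_count, min_codes):
    """Label a patient as diabetic if the EHR holds at least min_codes diabetes codes."""
    return diabetes_code_count >= min_codes


# Hypothetical patients mirroring the two examples above.
patients = {
    "example_1": {"true_diabetes": True, "codes": 0},  # never tested, no EHR data
    "example_2": {"true_diabetes": True, "codes": 1},  # one code, then lost to follow-up
}


def sensitivity(patients, min_codes):
    """Share of truly diabetic patients that the algorithm labels as diabetic."""
    truly_positive = [p for p in patients.values() if p["true_diabetes"]]
    found = [p for p in truly_positive if algorithm_label(p["codes"], min_codes)]
    return len(found) / len(truly_positive)


strict = sensitivity(patients, min_codes=2)   # misses both patients
lenient = sensitivity(patients, min_codes=1)  # finds example 2 only
```

No threshold recovers example 1, because the data simply does not exist in the EHR. Lowering the threshold recovers example 2, at the cost of more false positives among patients who are not in this toy dataset.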

We had an internal meeting where we discussed this with our broader department. We landed on almost exactly what @Christian_Reich wrote above, so I’ll just merge the two.

A phenotype, as it pertains to observational research, is a pattern of observable characteristics in health data for a set of people for a duration of time. These characteristics can include conditions, procedures, drug exposures, devices, observations, visits, cost information, etc.

Phenotype Algorithm = Cohort Definition
A phenotype algorithm is a coded set of instructions with the desired intent of identifying members of a phenotype in health data. Each phenotype could have one or more phenotype algorithms (e.g. T2DM broad, T2DM narrow). The instructions could be heuristic (rule-based) or probabilistic. A heuristic-based phenotype algorithm consists of rules and one or more concept sets. A probabilistic phenotype algorithm is implemented using a probabilistic model.

Cohort or Phenotype Instance
A cohort instance or phenotype instance is a set of patients for a duration of time that results from the execution of phenotype algorithm instructions against health data.

Again I welcome people to challenge these and provide feedback.

Many thanks to @Frank for help with wordsmithing. :pencil2:


@apotvien I think we should at least capture our definitions. Maybe on the WIKI for the Phenotype Library. :wink:


Just trying to understand how to use this proposed language.

Let’s say I want to conduct a study of cardiovascular outcomes in new statin patients. I want to use first statin use to define the index date, I want to include hypertension as an inclusion criterion, I want to use congestive heart failure as an exclusion criterion, and I want a number of baseline exposures for my multivariate model (age, sex, hyperlipidemia diagnosis, family history of cardiovascular disease, LDL cholesterol, and number of hospitalizations in the last year). My outcome is death, myocardial infarction or stroke. (Not a complete study of course.)

In my world, each of those pieces (study variables) is operationalized with an algorithm. The cohort is the group of people who meet the index, inclusion, and exclusion criteria. The follow up time is time until end of observation, death, myocardial infarction, or stroke. I might have different follow up times for the cohort if I want to look at overall mortality only (i.e., ignoring the cardiovascular event outcomes).

Is the phenotype the intersection of all of the inclusion and exclusion criteria? Or are each of those criteria independent phenotypes? Or is this the cohort?

Same question with baseline variables and outcomes – are these separate phenotypes? Or is the phenotype the sum of all of the pieces of my study (and most people would have slightly different phenotypes)?


I’ll give it a shot to translate those pieces into OHDSI:

T (target cohort): new statin patients.
O (Outcome cohort): 1 or more cohorts identified by the phenotype algorithm that represents the cardiovascular outcome of interest.

include hypertension = At least 1 occurrence of diagnosis of {hypertension concept set} (note: you didn’t include any time window information here. Recent diagnosis, any prior diagnosis? within 1 year?)
exclude congestive heart failure: Exactly 0 occurrences of diagnosis of {CHF concept set} (note: no time window specified, when should these CHF events appear that would exclude? 10 years ago means they are excluded? only within a year?)
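The effect of the missing time window on the two criteria above can be shown with a rough sketch. This is not ATLAS output. The function names and the `lookback_days` parameter are assumptions for illustration, and the dates stand in for diagnoses already matched to the hypertension and CHF concept sets.

```python
from datetime import date, timedelta


def count_occurrences(event_dates, index_date, lookback_days=None):
    """Count events on or before the index date, optionally within a lookback window."""
    n = 0
    for d in event_dates:
        if d > index_date:
            continue  # events after the index date never count
        if lookback_days is not None and d < index_date - timedelta(days=lookback_days):
            continue  # event is older than the lookback window
        n += 1
    return n


def qualifies(index_date, hypertension_dates, chf_dates, lookback_days=None):
    """At least 1 hypertension diagnosis and exactly 0 CHF diagnoses in the window."""
    has_htn = count_occurrences(hypertension_dates, index_date, lookback_days) >= 1
    no_chf = count_occurrences(chf_dates, index_date, lookback_days) == 0
    return has_htn and no_chf


index = date(2020, 1, 1)          # first statin exposure
htn = [date(2019, 6, 1)]          # hypertension diagnosed 7 months before index
chf = [date(2009, 1, 1)]          # CHF diagnosed more than 10 years before index

any_prior = qualifies(index, htn, chf)                         # CHF ever excludes
within_year = qualifies(index, htn, chf, lookback_days=365)    # old CHF is ignored
```

The same person is excluded under "any prior diagnosis" and included under "within 1 year", so the time window has to be part of the cohort definition.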

I’m not sure I follow the terminology of ‘exposures’ here, but it sounds like you are describing ‘features’ of the population. If you are saying that you require a certain age, sex, family history, etc for the person to qualify for the cohort, then it’s inclusion criteria. If it is a ‘feature’ of the population after you’ve identified your new statin users, then it’s not part of the cohort definition. It’s a covariate that you can include in your model (which are extracted via Feature Extraction or some other mechanism).

Those are three different cohorts: one Outcome (O) cohort per outcome of interest, each identified by its own phenotype algorithm.

Cohort definitions are not concerned with the followup time that is specific to your study, or which outcomes of interest may end this followup time. When you define your new statin user cohort, the question is: how long are they present in the cohort? Are they new users just for that day? Do you let them qualify as a ‘new user’ for a fixed duration after their initial exposure? Do you consider the first stretch of continuous exposure to the statin the period of time that they are new? These are all decisions that go into your cohort definition to define when the person enters the cohort and when the person should leave. The simplest definition would say the person enters the cohort at the first statin exposure, and they are considered ‘new users’ for all time after the first exposure.

On the other hand, when you are defining your study, you establish what you want to use as your follow up time (or what we sometimes call Time At Risk). This time window can be based off of the person’s cohort_start, cohort_end, or something in-between. Depends on what you want to do. Did you want to study the risk of an outcome during the 6 months after a patient ends their exposure to statin? Start with a cohort where the start-end represents exposure, and then your follow-up/time-at-risk is from the cohort end date to 183 days after cohort end date.
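As a small sketch of that last case, the time-at-risk window can be derived from the cohort end date with plain date arithmetic. The function names are made up for illustration and are not part of any OHDSI package. Whether the boundary days are inclusive is a study design decision, and the sketch simply assumes they are.

```python
from datetime import date, timedelta


def time_at_risk_after_exposure(cohort_end, days=183):
    """Return the (start, end) of a window from cohort end to `days` days after it."""
    return cohort_end, cohort_end + timedelta(days=days)


def outcome_in_window(outcome_date, window):
    """True if the outcome falls inside the window (both ends inclusive, by assumption)."""
    start, end = window
    return start <= outcome_date <= end


# Hypothetical person whose statin exposure (and cohort membership) ends on this date.
cohort_end = date(2020, 1, 1)
window = time_at_risk_after_exposure(cohort_end)

mi_during_follow_up = outcome_in_window(date(2020, 3, 15), window)
mi_after_follow_up = outcome_in_window(date(2020, 12, 1), window)
```

The cohort definition only supplies `cohort_end`. The 183 days and the choice of anchor belong to the study, so the same cohort can be reused with a different time-at-risk.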

I don’t think each of the elements of the phenotype are themselves a phenotype. A concept set is not a phenotype, and a single criterion may not be enough either (eg: cohort of men? What’s the start/end of that statement?)

The phenotype is an intersection of inclusion and exclusion rules, but more than that, it is also the specification of how long the people should be considered part of the phenotype. This is not follow-up. Follow-up is study-specific: you decide it when designing a study (are we looking at short term risks or long term risks). You could consider the cohort_start to cohort_end as the time at risk period, but that’s just another decision you make when designing your study.

I think you could use a phenotype to identify a baseline variable (ie: they were in a phenotype within 30d of baseline/index). But not all baseline variables would be a phenotype. The number of Inpatient visits in the past 6 months is a baseline variable. It is not a phenotype.


The problem with Christian’s definition of phenotype as “A pattern of characteristics in health data (criteria)” is that fundamentally you have inclusion and exclusion criteria. The algorithm includes or excludes from the phenotype. I think the issue others are having on this thread is that we all fundamentally know in our gut that inclusion and exclusion criteria do not make sense in the context of a phenotype in the sense of what is ‘diabetes’. It’s possible that you will fail the inclusion criteria and yet still have the phenotype. If by ‘phenotype’ you mean a set of exclusion or inclusion criteria then that would be a cohort but not a phenotype. The words are not really interchangeable.
The definition of cohort (from wikipedia) is:
“In statistics, marketing and demography, a cohort is a group of subjects who share a defining characteristic.”

Therefore ‘death’ is a characteristic - a group of people who died are a cohort. Death is not really a phenotype in the truest sense of the word. Death within a certain timeframe is an important outcome or characteristic of interest. An outcome is often the result of a Phenotype + Exposure, but can also be the result of the disease or Phenotype itself. It’s important not to confuse these terms.
It might simplify things if there was an outcome, disease and an exposure library where all of the cohorts were defined. It will be easier for people to understand what everything is. Otherwise, if you call everything a ‘phenotype’ it will be really confusing.

I think there is just an ambiguity on the term phenotype and we can decide how to deal with it. Richesson, who is linked to above, highlights the ambiguity in first defining “phenotype” and then defining “EHR-based phenotype definitions, or simply phenotypes.” That JAMA paper is also a good example. There is no concept (no chicken soup) underlying a dimension reduction on observable variables to classify a disease into 4 subtypes with slightly different outcomes. There are an infinite number of ways to divide a disease such that patients have slightly different outcomes within groups. There may be a hope that there is an underlying biological truth to it, but not clear. It doesn’t seem that much better than an arbitrary classification based on EHR data.

For those classification phenotypes (where the word “phenotype” was approved by JAMA editors) failure of the inclusion criteria does in fact exclude you from the phenotype because there is no underlying truth to the classification to compare to. Either you fit the criteria or not.

I am good with the @ericaVoss definition but I do agree with others that while phenotypes change over time, there is no need to define a phenotype as having an implicit duration (“for a duration of time”). Instead I think it is important to emphasize that the characteristics are measured in time. And you have to be explicit whether you want simultaneous characteristics or not. That is, we want to push people to account for time in their queries, but not assign a specific duration of time that a phenotype has to last.


Ultimately I think it depends on how clear you want the word ‘phenotype’ to be within the OHDSI framework. Obviously it’s up to you guys and you can call exposures, outcomes and diseases all phenotypes if you like - in a certain sense they each would fall within your definition of phenotype. Because the characteristics of a person that decides to take a particular medication could be a ‘phenotype’ - regardless of disease or outcome. However, if everything is a ‘phenotype’ it muddies the waters and makes it less clear what the definition means. I think it would make more sense to keep labels such as exposures, outcome, and disease - it would be a lot more clear to people what is being talked about. If everything is called a phenotype it will be more confusing - again up to you.
Some phenotypes have implicit durations (e.g., pregnancy) and others are permanent (e.g., death) - but you could leave it up to the cohort builders to represent and model their exposures, outcomes and diseases appropriately - this would increase flexibility


I agree with @Mary_Regina_Boland. We have a lot of good words for everything already. Cohort, index date, inclusion, exclusion, baseline exposure/variable, outcome.

I think part of the struggle is what word to use to describe the implementation. When I say “diabetes is an inclusion criterion” there isn’t ambiguity at a conceptual level about what I am doing in my protocol. But the details of my implementation of “diabetes” are not clear. The implementation can be called an algorithm, definition, criterion, phenotype, computable phenotype, etc. I don’t love “phenotype” or “computable phenotype” to describe the implementation, but that is just my opinion.

I think using “phenotype” to describe a study cohort (the result of all inclusion and exclusion criteria) is even more confusing. Especially since we already have the word “cohort”.

My issue seems to be that the OHDSI term “cohort” is used slightly differently in the context of a protocol than I use it as part of my research. So, I will not belabor the point except to say that it might help to clarify this for others. (Thanks to @Chris_Knoll for going into so much detail on the implementation of a study using the OHDSI framework – that helped me understand that I was essentially using a different language.)


When you put books in a library do you categorize them for leisure reading versus assigned school reading? Is a library not a place where the book exists and you, as a consumer, add the meaning of how the book is used?

A book is a book is a book. No?

We’re assembling a collection of books. Lest we forget books are not capable of reading themselves. Whether I read Homer’s Odyssey for summer fun or because I was mandated to do it for a course is, by all accounts, creating a level of intricacy that isn’t a library’s role.

That isn’t quite the analogy being made. Imagine we have a very good word like “book”. Then somebody else advocates that we use the word “file” instead. Because, after all, all books nowadays are created, stored, and accessed on computers as some kind of file. Hence, “file” is a better word to use.

In my opinion, book = algorithm, and phenotype = file. Phenotype isn’t “wrong”. But it has a very specific meaning in the context of our field of health and medicine – the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment. Why use that word?

But the books are algorithms with only a slight extension: book = phenotype algorithm

In our glossary, we distinguished between a phenotype and a phenotype algorithm (and a cohort). The “books” in the library are phenotype algorithms.

Phenotype is already an overloaded word with a specific meaning in genetics. I understand the desire to try and name something, but I don’t know that it needs a formal name. If it does need a name, there are better choices because “phenotype” is confusing in the context of acute events, costs, death, utilization, or medications.

In actual use, people will shorten “phenotype algorithm” to “phenotype”. For example, I believe the library will be called a “phenotype library” and important algorithms will be called “gold standard phenotypes”. I would suggest just using the word “algorithm”. Then one can talk about inclusion algorithms, diabetes algorithms, algorithm libraries, gold standard algorithms, etc.

I am focused on this issue because the algorithm is the building block of observational research using healthcare databases, and semantics are critical for communication. But unless someone has a specific question, I think I have spent too much time making my opinion known, and I won’t belabor the point anymore.

Algorithm is too broad a term, tho. You have algorithms for sorting lists, algorithms for calculating tax, algorithms for finding people in your cohort. You’re not actually conveying any information about the thing you identify as an ‘algorithm’ except to say that it takes some sort of inputs and yields a result.

I’ve been reading on how Phenotypes are described, and I found the same definition as you did:

the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment.

So, while I agree with you that the term ‘phenotype’ is rooted in genetics and what biologic characteristics are observed as a result of the genetics + environment influences, if we just go a bit higher level, phenotype involves the characteristics of the individual. That is at the core of when we talk about identifying cohorts: the people in the cohort fit a specific phenotype.

To me, algorithms are closer to implementation than conceptual. I could imagine multiple algorithms that would try to find people that fit a given phenotype. You could have the phenotype of ‘people who are diabetic’ but multiple algorithms to execute the selection.

Could you not imagine a phenotype of people undergoing gastric bypass surgery? That’s an acute event, but you could certainly think of a subset of a population (the cohort) who underwent the surgery? That’s a very acute event, but that surgery becomes part of the individual characteristic and therefore could be part of a phenotype.

I understand the source of confusion. I’ve seen the same confusion with the term ‘cohort’, where people hear ‘cohort’ and automatically assume a particular study design and not a broader idea of ‘a subset of a population that exhibits a specific set of characteristics (or lack of a characteristic) for a period of time’. However, the reason why I’m in favor of the term ‘phenotype’ is because phenotype does relate to the observable characteristics of an individual, so when someone says ‘phenotype’ I am thinking in terms of ‘people with specific characteristics’.


Hello Everyone,

I was reading this post to understand what phenotypes are and how they are useful.

I was trying to understand the T2DM Phenotype algorithm and have a few clinical questions related to that, which are listed in this post.

So I felt that linking my question here would help me get a better idea from you all.


In order to really understand phenotypes, it helps to have a basic understanding of medical diagnosis and treatment; of health systems and how data is recorded; and of data quality. For several of your questions, there are problems of multiple uses of a medication for treatment or a lab for diagnosis. For instance, metformin is a medication used for diabetes and polycystic ovarian syndrome and prediabetes. For others, there is a problem of data completeness. The diagnosis of diabetes may be missing from the EHR but the other data may still indicate diabetes. PheKB used this kind of pragmatic approach to maximize recall and precision given both these issues.
TL;DR: medicine is complex, the data is dirty, and phenotype algorithms have to take these issues into account.