What is a phenotype in the context of observational research?

jweave17 · May 8, 2019, 4:58pm

All:

@apotvien and the Gold Standard Phenotype Library Workgroup have made terrific progress creating the structure for a phenotype library, with functionality for viewing, managing, assessing, and comparing phenotypes, complete with version management/maintenance and even a method for visualizing phenotype networks. There’s some detail on these accomplishments and the path taken to get there on this thread. Further, the phenotype library was intentionally designed to be agnostic to the substantive content of a phenotype; the extensive functionality of the library does not depend on the content of a book (i.e. a phenotype). As you might expect, this raised the question: what exactly is a phenotype in the context of observational research? It quickly became obvious that there isn’t a common definition, so we thought it best to simply ask the community.

So, if I may, I’ll add to @schuemie’s outline and propose what I think comprises a phenotype to hopefully start a discussion:

A phenotype is the observable expression of a genotype given the environment. A phenotype definition as it pertains to research using observational data is the set of instructions that best predicts which patients in a database are members of a cohort defined by a condition; in short, a disease classifier. The set of instructions for patient identification can be 1) at least one concept set expression and querying rules (heuristic) or 2) a predictive model (probabilistic). These instructions are data source agnostic. However, the quality or performance of a phenotype definition is data source dependent. A phenotype definition itself cannot be characterized or evaluated for performance. After a phenotype definition has been applied to a data source and individuals are identified as members of the resulting condition cohort, the phenotype can be evaluated (specificity, sensitivity, PPV, etc.) in the context of the data source, where that context includes demographics of the database population, observation time, feature availability, etc. In short, a complete phenotype entry in a phenotype library would include the phenotype definition (heuristic or probabilistic rules; database independent) and phenotype evaluation (characterization, evaluation; database dependent). Lastly, observational databases provide a subset (usually a small subset) of the full set of an individual’s observable characteristics, so phenotype performance results should be expected to reflect this. Phenotype information could be captured in 5 sections, which I’ll call 1) definitions, 2) characterizations, 3) evaluations, 4) metadata, and 5) dissemination. The use of plural is intentional in that a phenotype for a condition can include multiple definitions.

I expect there are problems with my proposal, but the purpose is mainly to stimulate discussion and move towards alignment on a common definition. So, what do you think comprises a phenotype?

mattspotnitz · May 8, 2019, 5:18pm

Hi James,

I would like to extend the biological analogy a little further. I believe that concept sets are the DNA, cohort builders are the gene expression/translation proteins, and the cohort population is the organism.
Concept sets consist of the codes that are relevant for analysis. The process of including or excluding concept sets is analogous to the way genes are activated or inactivated in a biological system. The product is expressed as a population.

Thanks,
Matt

jswerdel · May 9, 2019, 1:06pm

With that analogy, then, the population translated from the concept set(s) through an algorithm (the “translation proteins”) is the phenotype, the physical manifestation. That seems fair, as that population will be the thing evaluated for sensitivity, PPV, etc. Now we need a way to structure the characteristics of the population by, say, time (“2010-2014”), demographic (“females”) bounding, etc., for searching purposes. I was wondering what set of characteristics fully describes a phenotype (population). Thoughts?

George_Argyriou · May 9, 2019, 1:51pm

Do you mean characterization or is this something different (e.g. metadata as you mention later)?

schuemie · May 9, 2019, 7:14pm

I’m not sure bringing genes into the discussion helps, and wish we didn’t use the term ‘phenotype’ at all.

What I think is important is that there’s a disease state ‘out there’ in reality, say ‘patient X has diabetes from April 1st onwards’, and then there’s some reflection of that in our data (e.g. a diagnosis code for diabetes, prescriptions of metformin), and we can use that information in a cohort definition to try and approximate that true disease state for our analyses.

But there is uncertainty about the actual disease state (when does diabetes begin?), and the definition of this state may vary (is it a certain blood glucose level? If so, which level? The inability to control blood glucose level sufficiently?).

Our cohort definition may capture that disease state imperfectly as mentioned. However, I think you can’t properly specify the performance of the cohort definition without being clear on the definition of the disease state. Is this currently being captured?

ericaVoss · May 9, 2019, 7:32pm

I’m with @schuemie, I’m not too keen on the genotype portion of the definition - but understand where @jweave17 is coming from because we are using the word phenotype. However I think there is already precedent of using the word phenotype to identify “electronic algorithms to identify characteristics of patients within health data” (PheKB).

Also, I want to throw into this mix the idea that we might use these definitions to also define cohorts of patients that are exposed to drugs. Sometimes a definition for exposure is not as simple as looking for the ingredient. For some exposures the dose form is important (making the algorithm slightly more complex), and other exposures require the existence of diseases, other exposures, and/or procedures prior to the exposure of interest.

People in the device space might additionally argue that we can’t limit these definitions to diseases and exposures.

Andrew · May 9, 2019, 8:38pm

Regarding

I agree that the term phenotype can confuse rather than clarify how we think and communicate about cohort definitions. Unfortunately there is too much functional similarity between this work and the strategies for linking biologic causes to their clinical expression as captured in health care data. It’s essentially the same practice being conducted for slightly different purposes. On the bright side, I think it’s only a matter of time before parts of those strategies like the ontologies (e.g. the Human Phenotype Ontology) and methods (e.g. semantic similarity analyses) come to OMOP and OHDSI and help us build better cohort definitions. @Juan_Banda’s great work is bringing us closer by the day. And once that day arrives, there will be no hope (and little reason) for using separate terms.

Regarding

I agree that’s needed and very important. I think there are at least four approaches available or currently being developed in OHDSI that can let us assess performance relative to a clear definition at some points in time for many, though not all, conditions.

1) Clear definitions of disease can be derived from sources in EHRs that are considered definitive, such as pathology reports. At least in some cases these could be used as gold standards to assess the performance of definitions restricted to Dx, Px, & Rx codes, etc. when all are available on the same patients. Analogously, some lab values or imaging results are considered definitive for some conditions.
2) When tumor registry data are available on patients and fused with patients’ EHR data in a single OMOP instance, the tumor registry provides a clear definition of the disease and can function as a gold standard to assess performance of definitions that only use non-tumor registry sources.
3) Bringing clinical trial data into OMOP and fusing it with data on those same patients from an EHR/claims/registry data source will allow this same strategy for whichever diseases are definitively ascertained in the trial.
4) Capturing clinical experts’ judgments about case ascertainment using Trey Schneider’s awesome Annotation tool can support the same gold standard performance assessment strategy.

In all 4 cases, I think we should develop standards for recording which definitions are considered definitive and why in the metadata schema and ontologies being developed in the Metadata WG.

These gold standards won’t always answer when a disease state begins. But they will allow performance to be assessed relative to a definitive assessment at a given point in time. I think that’s good enough. Doing much better than that might require near omniscience.

Though only some sites can do this work, comparison to a gold standard is of foundational importance in assessing performance. So, I think developing and exploiting these strategies should be a high priority. As we get better at assessing how performance in one data source is likely to extrapolate to another, sites that can’t do gold standard validation of cohort definitions in their own data will increasingly benefit from knowledge of performance at sites that can.

Mark_Danese · May 9, 2019, 10:32pm

I wasn’t going to jump into this thread to make the same comment, but since @schuemie made it, I want to fully agree with him that the use of the word “phenotype” or “computable phenotype” is really a poor choice of terminology. In our organization, we use “algorithm” because we are using logical statements (code sets, temporal conditions, provenance, etc.) to identify people with a specific characteristic. No one else has to use the word algorithm, but I would encourage others to use something other than phenotype.

Jill_Hardin · May 10, 2019, 1:04pm

These are all great points, but what we as a group were hoping to gain consensus around is how we would capture the details of the algorithms that define the phenotypes (I agree with @Mark_Danese and @schuemie that phenotype is an ambiguous term). To that point, here is a proposed structure we discussed on the Phenotype WG call and internally at Janssen to capture the details of the algorithms.

Definition
• ID
• Name
• Description
• ATLAS Cohort (inclusive of code lists and logic)
• Phenotype Definition Visualization

Metadata
• Owner
• Version
• Priority (Yes/No)
• Therapeutic Area
• Type (Indication, Outcome, Ingredient, Treatment, Generic)
• References
• CrossRef Phenotype / Provenance
• Subgroup (pediatric, adult, senior, N/A)
• Incident / Prevalent

Evaluation
• PheValuator Evaluation (or other to provide sensitivity, specificity, PPV)
• Stability over time
• DB Tested On / DB Recommended

Characterization
• Standard Features Generated

If anyone can think of a use case where this structure would not work please provide it here.

Thank you in advance for the input!

jswerdel · May 13, 2019, 7:24pm

Perhaps it would make sense to have 2 parts to the phenotype library:

gold-standard concept sets
gold-standard algorithms

The 2 parts would both be under a phenotype/health condition. For example, with ischemic stroke as the phenotype, there might be 2 concept sets, 1 for broad and 1 for narrow definitions, and maybe 2 algorithms: sensitive (say, 1 occurrence of a concept, possibly used for an outcome) and specific (2 outpatient occurrences within 2 days of each other or 1 occurrence from an inpatient setting, possibly used for an indication).
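To make the sensitive/specific distinction concrete, here is a minimal Python sketch of the two example algorithms sharing one concept set. The record layout, the function names, and the concept IDs are all hypothetical placeholders (they are not real OMOP concepts or an ATLAS export), and the sketch assumes that two outpatient records on the same day count as two occurrences.

```python
from datetime import date

# Hypothetical record layout: (person_id, concept_id, visit_type, event_date).
# Placeholder concept IDs for illustration only, not real OMOP concepts.
STROKE_CONCEPTS = {1001, 1002}


def sensitive_algorithm(records, concept_set):
    """Qualify anyone with at least one occurrence of a concept in the set."""
    return {pid for pid, concept, _visit, _day in records if concept in concept_set}


def specific_algorithm(records, concept_set, max_gap_days=2):
    """Qualify on one inpatient occurrence, or on two outpatient
    occurrences at most max_gap_days apart."""
    qualified = set()
    outpatient_dates = {}
    for pid, concept, visit, day in records:
        if concept not in concept_set:
            continue
        if visit == "inpatient":
            qualified.add(pid)
        elif visit == "outpatient":
            outpatient_dates.setdefault(pid, []).append(day)
    for pid, days in outpatient_dates.items():
        days.sort()
        # After sorting, the closest pair of dates is always adjacent.
        if any((b - a).days <= max_gap_days for a, b in zip(days, days[1:])):
            qualified.add(pid)
    return qualified


records = [
    (1, 1001, "outpatient", date(2019, 1, 1)),   # single outpatient code
    (2, 1001, "outpatient", date(2019, 1, 1)),   # two outpatient codes,
    (2, 1002, "outpatient", date(2019, 1, 3)),   # two days apart
    (3, 1002, "inpatient", date(2019, 2, 1)),    # one inpatient code
    (4, 9999, "inpatient", date(2019, 2, 1)),    # concept outside the set
]

print(sensitive_algorithm(records, STROKE_CONCEPTS))
print(specific_algorithm(records, STROKE_CONCEPTS))
```

Because the concept set is passed in as an argument, a broad or narrow set could be swapped into either algorithm, which is the reuse the 2-part structure is meant to allow.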

With this 2-part method, someone who needs to create a specialty algorithm could search for and find the correct concept set to use for their application. The same would apply in the case of needing a specialty concept set that could be plugged into a vetted algorithm.

Christian_Reich · May 14, 2019, 3:54pm

@jswerdel:

How do you define gold? When does it become silver? Can we even have those?

apotvien · May 14, 2019, 6:23pm

@Christian_Reich, we’ve been interpreting “gold standard” to mean “gold standard processes”. That is, a phenotype is gold standard if it was designed, documented, and validated with the best practices. This is inextricably linked to what’s being discussed here; how can a phenotype be documented properly without a consensus on what precisely constitutes a phenotype?

The point is that we certainly don’t wish “gold standard” to imply the presence of “phenotype police” and impose arbitrary cutoffs like, “It’s only gold standard if it demonstrates a sensitivity of at least 90%”. Generally, the utility and validity of a phenotype will vary from case to case, so it’s up to the end user to decide which metrics are acceptable. The role of the gold standard phenotype library is to maintain its contents (phenotypes and validation sets) in a way that is complete and systematic.

Christian_Reich · May 15, 2019, 5:10am

Makes sense. That is certainly a good way to define it, because it is independent of any subjective judgment of its actual quality. But that has a number of consequences:

Gold is inextricably linked to the dataset the standard got validated in. That is not transferable. Which in turn means, there cannot be a “Gold Standard algorithm”, or worse, a “Gold Standard concept set”. There can only be a “Gold Standard XYZ phenotype in 123 database”.
You will need to make this definition explicitly and abundantly clear. Because by “Gold” people inevitably understand “good” and “well vetted”. But a Gold Phenotype might be absolutely horrible.

Do you also have definitions for “Silver” and “Bronze”?

SCYou · May 15, 2019, 6:49am

We wanted ‘gold phenotype’ so badly, and I know we’re trying to grab the rainbow.
The goal of the Gold Standard Phenotype Library Working Group is to establish the right path toward gold phenotypes. The name of the GitHub repo is ‘PhenotypeLibrary’, without ‘gold’ (though it has a ‘gold standard’ folder in it).

Besides, @Christian_Reich, you know that none of the ‘gold standards’ in medicine is perfect. For example, liver biopsy is a ‘gold standard’ for diagnosing fatty liver. But it can only assess a tiny sample of the entire liver, which can result in a wrong diagnosis.
The RCT is usually a ‘gold standard’ for establishing medical evidence, but we all know that it has flaws.

Again, you made a point. And this is exactly what the Gold Standard Phenotype Library Working Group is trying to do.

apotvien · May 20, 2019, 2:07pm

Thanks to @jweave17 for starting out this discussion and to everyone else who has weighed in so far! From my perspective, what constitutes a phenotype definition is the most important outstanding question for the development of the Gold Standard Phenotype Library. The contents of every entry will depend on the template we set going forward. The sooner we agree on this, the sooner we can transition from synthetic data used for development purposes to real-world entries that can be shared with everyone.

I’d like to propose that we continue this discussion in earnest during our working group meeting, which will be tomorrow, 5/21 from 10-11am ET. Please find the meeting link below:

https://gatech.webex.com/gatech/j.php?MTID=mdd4af3e9b84212fc7df3eb0150703df5

While the working group meetings have always remained open to everyone throughout, I’d especially like to extend tomorrow’s invitation to anyone who wants to contribute to this rather pivotal moment in the library’s formation.

ericaVoss · May 21, 2019, 1:53am

I tried to summarize/leverage/steal what I learned from this post. My comments are purposely biased, so I look forward to discussion/disagreement on it.

What is a Phenotype?

Phenotype, as it pertains to observational research, is an agreed-upon set of coded instructions that is the best approximation of finding members of a cohort within health data to use in analysis. The phenotype may define a study population (could correspond with an indication or target indication), a target intervention (could correspond with exposure cohorts consisting of the target medication, device, surgery, therapy, healthcare intervention, etc.), comparators (could be comparator intervention/exposure cohorts, such as patients taking other medications within the class for a particular indication or set of indications, etc.), or outcomes (could be safety/adverse events of interest, effectiveness, drug utilization measures [e.g., persistence, switching, adherence, etc.], or other health outcomes of interest [e.g., ingrown toenails, elective lobotomies]). A phenotype’s purpose is to identify members in health data who have a characteristic of interest. The set of instructions could be rule-based (heuristic) or computable (probabilistic). These instructions are data source agnostic. However, the quality or performance of a phenotype definition is data source dependent. Phenotypes must be evaluated in terms of a data source.

Concept Set vs Phenotype vs Cohort

Concept Set = a list of codes to define characteristic of interest
Phenotype = Concept Set(s) + Algorithm
Cohort = Phenotype instantiated in a database

Gold Standard Phenotype Library

Mission Statement: To enable members of the OHDSI community to find, evaluate, and utilize community-validated cohort definitions for research and other activities.
“Gold Standard” phenotype or “Gold Standard Processes” to find a phenotype are defined as a phenotype that has been:
– Designed with best practices
– Evaluated with best practices
– Documented with best practices
Functionality for viewing, managing, assessing, and comparing phenotypes, complete with version management/maintenance and even a method for visualizing phenotype networks

Gold Standard Phenotype Library Book should include:

Definition
– ID
– Name
– Concise Description (which could be directly used in a paper)
– ATLAS Cohort (inclusive of code lists and logic)
– Rationale of the Definition
– Phenotype Definition Visualization
Metadata
– Owner
– Version
– Priority (Yes/No)
– Therapeutic Area
– Type (Indication, Outcome, Ingredient, Treatment, Generic)
– References
– CrossRef Phenotype / Provenance
– Subgroup (pediatric, adult, senior, N/A)
– Incident / Prevalent
– References to studies that have used the phenotype
Evaluation
– Operating Characteristics: PheValuator Evaluation (or other to provide sensitivity, specificity, PPV)
– Stability over time
– DB Tested On / DB Recommended
Characterization
– Standard Features Generated
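
One way to check whether this book structure holds up is to write it down as a record type and try to fill it in. The following is only a sketch in Python: the class and field names are hypothetical, chosen to mirror the outline above, and do not reflect an actual schema used by the library. It keeps the database-independent parts (definition, metadata) on the entry itself and stores evaluations per database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evaluation:
    """Database-dependent results for one application of the definition."""
    database: str
    method: str  # e.g. "PheValuator" or "chart adjudication"
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None
    ppv: Optional[float] = None


@dataclass
class PhenotypeEntry:
    """Hypothetical 'book' holding the sections listed in the outline."""
    # Definition (database independent)
    id: int
    name: str
    description: str
    rationale: str
    # Metadata
    owner: str
    version: str
    phenotype_type: str  # Indication, Outcome, Ingredient, Treatment, Generic
    references: List[str] = field(default_factory=list)
    # Evaluation (database dependent, so one record per database)
    evaluations: List[Evaluation] = field(default_factory=list)

    def databases_tested(self) -> List[str]:
        """Distinct databases with at least one recorded evaluation."""
        return sorted({e.database for e in self.evaluations})


entry = PhenotypeEntry(
    id=1,
    name="Example phenotype",
    description="Placeholder description for illustration.",
    rationale="Placeholder rationale.",
    owner="example_owner",
    version="0.1",
    phenotype_type="Outcome",
)
entry.evaluations.append(Evaluation(database="db_b", method="PheValuator"))
entry.evaluations.append(Evaluation(database="db_a", method="chart adjudication"))
print(entry.databases_tested())
```

Making every evaluation metric optional reflects the idea that an entry can be shared before all of its evaluation work is complete.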

I also acknowledge that the use of the word “phenotype” seems to be a hot topic. But let’s not lose sight of the forest for the trees here. The true goal here is to have a way to create, manage, share, and evaluate these definitions. That is what helps the science, not what we call it – we can find the appropriate term along the way.

Patrick_Ryan · May 21, 2019, 2:47am

Team:

This is a great thread, very happy to see this discussion taking place.

I’ll add my 2 cents for posterity’s sake:

How do I define ‘phenotype’? In the ‘Phenotype’/‘Cohort definition’ tutorial that I’ve offered a few times recently, I’ve used the description by @hripcsa and Dave Albers in their 2017 JAMIA paper “High-fidelity phenotyping: richness and freedom from bias”: “A phenotype is a specification of an observable, potentially changing state of an organism, as distinguished from the genotype, which is derived from an organism’s genetic makeup. The term phenotype can be applied to patient characteristics inferred from electronic health record (EHR) data. Researchers have been carrying out EHR phenotyping since the beginning of informatics, from both structured data and narrative data. The goal is to draw conclusions about a target concept based on raw EHR data, claims data, or other clinically relevant data. Phenotype algorithms – ie, algorithms that identify or characterize phenotypes – may be generated by domain experts and knowledge engineers, including recent research in knowledge engineering or through diverse forms of machine learning…to generate novel representations of the data.”

I like this introduction for a few reasons: 1) it makes it clear that we are talking about something that’s observable in our observational data, 2) it includes the notion of time in the phenotype specification (since a state of a person can change), 3) it draws a distinction between the phenotype as the desired intent vs. the phenotype algorithm, which is the implementation of the desired intent.

In our tutorials, after I introduce the idea of ‘phenotype’ and ‘phenotype algorithms’, I introduce a new term, ‘cohort’, and here, we have a very explicit definition:

cohort = a set of persons who satisfy one or more inclusion criteria for a duration of time.

From there on in the tutorial, I try to be very precise in using this term, and reinforce this definition. We highlight how to create ‘cohort definitions’ as specifications for the criteria that persons must satisfy over time, we introduce how to design ‘cohort definitions’ using the OHDSI tool ATLAS, and we demonstrate how ‘cohort definitions’ can be executed against OMOP CDM-compliant databases to identify the records which can populate the CDM’s COHORT table (which is defined by PERSON_ID, COHORT_DEFINITION_ID, COHORT_START_DATE, and COHORT_END_DATE). We highlight the consequences of subscribing to this definition of ‘cohort’: 1) one person may belong to multiple cohorts, 2) one person may belong to the same cohort at multiple different time periods, 3) one person may not belong to the same cohort multiple times during the same period of time, 4) one cohort may have zero or more members, 5) a codeset is NOT a cohort, because logic for how to use the codeset in inclusion criteria is required. And most importantly, we demonstrate how adoption of this definition of ‘cohort’ can enable the successful design and implementation of standardized analytics which rely on ‘cohorts’ as foundational inputs to support clinical characterization, population-level effect estimation, and patient-level prediction. It’s important to note that, under this definition of ‘cohort’, a cohort can represent a disease phenotype (e.g. persons developing Type 2 diabetes), a drug exposure phenotype (e.g. persons initiating metformin for their Type 2 diabetes), a measurement phenotype (e.g. persons with hemoglobin A1c > 6.5%), or more generally, any combination of any criteria across any of the data domains that are observable (which basically means any tables in the OMOP CDM).
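
As an illustration of consequences 2) and 3), here is a minimal Python sketch of cohort rows using the four fields named above, plus a helper that merges overlapping periods so that a person never appears twice in the same cohort at the same time. The `CohortRow` type and the `collapse_overlaps` function are hypothetical names for this sketch; it is not how ATLAS or any OHDSI package implements this step.

```python
from datetime import date
from typing import List, NamedTuple


class CohortRow(NamedTuple):
    # Field names follow the four columns listed in the post.
    person_id: int
    cohort_definition_id: int
    cohort_start_date: date
    cohort_end_date: date


def collapse_overlaps(rows: List[CohortRow]) -> List[CohortRow]:
    """Merge overlapping periods for the same person and cohort.

    Non-overlapping periods are kept as separate rows, so one person can
    still belong to the same cohort at several different times.
    """
    ordered = sorted(
        rows,
        key=lambda r: (r.cohort_definition_id, r.person_id, r.cohort_start_date),
    )
    merged: List[CohortRow] = []
    for row in ordered:
        last = merged[-1] if merged else None
        same_key = (
            last is not None
            and last.person_id == row.person_id
            and last.cohort_definition_id == row.cohort_definition_id
        )
        if same_key and row.cohort_start_date <= last.cohort_end_date:
            # Overlap: extend the previous period instead of adding a row.
            new_end = max(last.cohort_end_date, row.cohort_end_date)
            merged[-1] = last._replace(cohort_end_date=new_end)
        else:
            merged.append(row)
    return merged


rows = [
    CohortRow(1, 1, date(2019, 1, 1), date(2019, 1, 31)),
    CohortRow(1, 1, date(2019, 1, 15), date(2019, 2, 15)),  # overlaps the first
    CohortRow(1, 1, date(2019, 6, 1), date(2019, 6, 30)),   # separate period
    CohortRow(1, 2, date(2019, 1, 10), date(2019, 1, 20)),  # different cohort
]
for row in collapse_overlaps(rows):
    print(row)
```

The sketch treats periods as overlapping only when one starts on or before the other ends; whether adjacent periods should also be merged is a design choice left open here.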

Now, in theory, if all data about a person was completely and accurately captured and ‘observable’, then it should be possible to take a ‘phenotype’ (the specification of the observable, changing state of the organism) and apply a ‘phenotype algorithm’ to the data to determine the spans of time for which that person satisfied the inclusion criteria to belong to the phenotype cohort. That is, with perfect data, there would be no difference between the desired intent and the materialization of the desired intent.

In practice, because data are imperfect, the ‘phenotype algorithm’ represents a proxy (whether it be a rule-based heuristic or probabilistic model) that attempts to represent the ‘phenotype’ given the available data. The phenotype cohort- the persons that satisfy the ‘phenotype algorithm’ criteria for durations of time- is an instantiation of that proxy. The differences between the true phenotype (which people actually belong to the observable health state of interest?) and the phenotype cohort (which people were identified as satisfying a set of criteria for a duration of time?) represent measurement error. There are multiple dimensions of error: 1) a person who truly belonged in the phenotype was not identified by the phenotype algorithm (false negative), 2) a person who did not belong to the phenotype was incorrectly identified by the phenotype algorithm (false positive), 3) the time at which a person entered a cohort may be misclassified (i.e. the cohort start date may not reflect the person’s true moment of entering the health state), and 4) the time at which a person exited a cohort may be misclassified (i.e. the cohort end date may not reflect the person’s true moment at which they no longer satisfied the criteria to belong to that health state).
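
The four dimensions of error can be sketched in a few lines of Python, under the strong and usually unrealistic assumption that the true health-state period is known for every person. The function name and the one-period-per-person simplification are assumptions made for this example only.

```python
from datetime import date


def compare_to_truth(truth, cohort):
    """Compare a phenotype cohort against known true health-state periods.

    Both arguments map person_id -> (start_date, end_date), with at most
    one period per person (a simplification for this sketch).
    """
    # 1) true members the algorithm missed
    false_negatives = sorted(set(truth) - set(cohort))
    # 2) cohort members who never truly had the health state
    false_positives = sorted(set(cohort) - set(truth))
    # 3) and 4) date misclassification among correctly identified persons,
    # in days; positive means the cohort date is later than the true date
    date_errors = {}
    for pid in sorted(set(truth) & set(cohort)):
        start_error = (cohort[pid][0] - truth[pid][0]).days
        end_error = (cohort[pid][1] - truth[pid][1]).days
        date_errors[pid] = (start_error, end_error)
    return false_negatives, false_positives, date_errors


truth = {
    1: (date(2019, 4, 1), date(2019, 12, 31)),
    2: (date(2019, 5, 1), date(2019, 12, 31)),
}
cohort = {
    1: (date(2019, 4, 15), date(2019, 12, 31)),  # found, but entry is late
    3: (date(2019, 6, 1), date(2019, 12, 31)),   # not a true member
}
print(compare_to_truth(truth, cohort))
```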

For all of us in observational research, we need to accept that measurement error is a fact of life. All retrospective analyses of existing observational data must deal with how measurement error may influence the evidence being generated from the data. In a clinical characterization of disease natural history, measurement error may mean that prevalence or incidence of a condition is under- or over-reported. Measurement error in the cohort start date can mean misrepresentation of the time-to-event relationship between an exposure and outcome. In population-level effect estimation, misclassification in the target or comparator cohorts, or in the outcome of interest can bias our relative risks, and measurement error in baseline covariates can result in inadequate adjustment inducing confounding that can further bias the relative risk estimates. In patient-level prediction, measurement error in either the target or outcome cohorts can challenge the generalizability of the model from the proxies it was trained on to the ‘stated intent’ of the phenotypes of interest. If we had a proper understanding of the measurement error, we could incorporate it into our analyses to generate more reliable evidence that accounts for this added layer of uncertainty.

So, with all that said, I generally support the outline that @schuemie started, which others have added onto, and I like @apotvien’s framing that he has introduced in the Phenotype workgroup, so I’ll restate using the language framed above.

For each ‘phenotype’, we need to have a clear description of the stated intent (what is the observable, changing state of the organism that we are trying to represent?). I believe the stated intent likely has multiple components: 1) a clinical definition - not just a label, like a disease name (ex: ‘Type 2 diabetes’), but a complete specification of what the entity means, how it manifests and would become observable (ex. is T2DM identified because a clinician believes a person has Type 2 diabetes and records a diagnosis code in their EHR, or is it only confirmed on basis of HbA1c>6.5%?), 2) logical description - how will the clinical definition be applied to observational data? described in some human-readable form, with text and/or graphical depiction of logic, 3) intended use - how will the phenotype be applied to generate evidence? is it intended to represent an exposure of interest or an outcome or some baseline characteristic, to serve as input into a characterization, estimation or prediction study? is the phenotype intended to be applied in one specific context/database, or desired to be re-usable and transportable across a data network?

Each ‘phenotype’ could have one or more ‘phenotype algorithms’, which can be expressed as computer-executable code that, when implemented against a CDM-compliant database, instantiates a cohort representing the set of persons satisfying inclusion criteria for a duration of time. Ideally, the computer-executable code will be consistent with the human-readable logical description above.

For each observational database that a ‘phenotype algorithm’ is applied to, there is an opportunity to characterize the resulting cohort, and evaluate the performance of the ‘phenotype algorithm’ in representing the ‘phenotype’.

For cohort characterization, it would seem desirable to summarize the incidence and prevalence of the cohort within the database population, as well as detail baseline characteristics of the cohort to get some descriptive sense of the patient composition. It’d be nice to have a simple standardized analytic that would produce a ‘phenotype characterization’ that could produce the shareable set of aggregate summary statistics (no patient-level data) that could be uploaded into the phenotype library, under the phenotype entry and designated by the source data that it was contributed from.

For evaluation, it seems our real objective is to quantify the extent of measurement error. If we are talking about the misclassification of ‘false positives’ and ‘false negatives’, then the evaluation metrics that make most sense to summarize error are those directly computable from a confusion matrix: at a minimum, I would hope that we would aspire to estimate 1) sensitivity, 2) specificity, and 3) positive predictive value. Here, the trick is there is no one consensus ‘right’ way to estimate measurement error. Several approaches exist, including in this thread discussions of ‘chart adjudication’ and PheValuator. So, for our phenotype library, when a phenotype algorithm is applied to a particular data source, then we’d like to capture not just the estimates of measurement error (as represented by the operating characteristics sensitivity/specificity/positive predictive value) but also a description of the method used to estimate the measurement error. For chart adjudication, we’d like to know what sample of charts were adjudicated, which charts and how many, who were the adjudicators, what information was used for adjudication, etc. For PheValuator, we’d like to know what inputs were used in the process, including specification of the noisy positive and noisy negative labels used to train the probabilistic gold standard. The other form of measurement error is misclassification of the timing of cohort entry/exit, and similarly there, since there is no one agreed practice for evaluating this error, any estimate should be accompanied with a description of how the estimation was made.
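
For reference, the three operating characteristics follow directly from the four cells of a confusion matrix. The Python sketch below uses a hypothetical function name and made-up counts; how the counts themselves are obtained (chart adjudication, PheValuator, or something else) is exactly the part that needs to be documented alongside the numbers.

```python
def operating_characteristics(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV from confusion-matrix counts.

    A metric is returned as None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases that were identified
        "specificity": ratio(tn, tn + fp),  # non-cases correctly left out
        "ppv": ratio(tp, tp + fp),          # identified persons who are true cases
    }


# Made-up counts for illustration only.
metrics = operating_characteristics(tp=80, fp=20, fn=20, tn=880)
print(metrics)
```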

To echo the prior sentiments from @apotvien and @SCYou, our OHDSI community ambition in building an open phenotype library should be to establish and execute best practices for the design, evaluation, and dissemination of phenotypes. It is not realistic to expect that we will develop phenotype algorithms that are perfect in light of the ambiguity of medicine and the incompleteness and inaccuracies in healthcare data. But it does seem very reasonable to expect that we can apply consistent and transparent processes for phenotyping. Rather than waiting for a ‘best practice’ to be finalized and agreed to by everyone, I think we should move forward with a ‘better practice’ and see how far it takes us, recognizing that we’ll have to make some adjustments along the way. To use the analogy of a brick-and-mortar library, we are trying to construct a building that will hold books, while at the same time, trying to define what a book actually is (without having any books currently available to use as a reference). It’s hard to build the bookshelves if you don’t know how tall or wide or heavy a book can be, and it’s even harder to create the Dewey decimal system card catalog without having a collection in place to organize. Our aspirational phenotype entry in our to-be-built library, one which has a complete clinical and logical description, a computable implementation that has been tested across the OHDSI network, with full characterization and comprehensive evaluation across a collection of databases, does not exist, not for one single phenotype.
I think we should start drafting some books: sharing ‘phenotype’ descriptions and cohort definitions and whatever aspects of characterization and evaluation have been completed, with the explicit intention that by sharing what’s been done that we can rapidly iterate to improve these phenotypes and raise our collective confidence in their use to generate reliable evidence, but also so that we can start to build an open collection of whatever we’ve all previously developed as individuals into one shared community resource that we can all benefit from moving forward.

Christian_Reich · May 21, 2019, 7:12am

Friends:

@Patrick_Ryan provided a nice summary of what we can agree on. Here is a list of things we need to nail:

Do we use the word “phenotype” or not? If it is about @hripcsa’s “state of an organism” it’s fine. But a treatment “phenotype”? That’s not a state. It’s an intervention.
Are cohort definition and phenotype algorithm the same thing? Both of them expect a bunch of observables (criteria), and both have the notion of timing, or are “potentially changing” according to @hripcsa.
What is a Gold phenotype? One that’s fully characterized, or one that is the best we can have? What even is a good phenotype algorithm?
How do we distinguish phenotype algorithms that are perfect but in real life useless, because we rarely have the data, from those that are pragmatic? How do we characterize the feasibility of phenotype algorithms? Is that part of the definition, too? It has huge practical implications.

Judging from the above, we have the following nomenclature:

Phenotype = some abstract state,
Phenotype algorithm or cohort definition = actual combination of observables to detect the phenotype,
Cohort = actual patients for whom the phenotype definition holds for a period of time.

Is that correct?

Patrick_Ryan · May 21, 2019, 11:44am

My direct responses:

My vote: We use the word ‘phenotype’, because it is already an established term commonly used in the field of informatics to represent the exact task we are trying to achieve. And a ‘phenotype’ can be a disease or an intervention, anything that is an observable, changing state of a person. The spans of time that a person is exposed to the intervention, for example when a person is persistently taking a drug, qualify as a phenotype. And just as with any phenotype, an intervention phenotype should have a human-readable description, computer-executable code to implement the logic, and characterization and evaluation summary statistics from any databases that have applied the cohort definition.

Yes, I think within the OHDSI community we can use the terms ‘cohort definition’ and ‘phenotype algorithm’ interchangeably, so long as any phenotype algorithm that someone creates satisfies our community definition of cohort, meaning that it generates a set of persons who satisfy one or more inclusion criteria for a duration of time.

I don’t think it is necessary or even useful to designate a phenotype as ‘gold’. We are trying to establish a best practice process for designing, characterizing, and evaluating phenotypes. That best practice process doesn’t necessarily mean that a particular phenotype algorithm generated will be provable to have perfect operating characteristics. I would consider a phenotype that is fully described, well characterized across one or more databases, and whose measurement error is evaluated to be ‘better practice’, even if the measurement error is shown to be large (e.g. the sensitivity, positive predictive value, or specificity is low). But, conversely, I think we want to distinguish between our intent to develop an open community phenotype library vs. using ATLAS to create cohort definitions. An ATLAS cohort definition by itself doesn’t have a clinical or logical description, doesn’t have a characterization, and doesn’t have an evaluation, so it is missing many of the elements we consider part of the ‘best practice’ phenotyping process. I expect we are aspiring to do more than just share cohort definitions, because the meta-data around the cohort definition is necessary to provide the context for the phenotype’s appropriate use when generating real-world evidence.
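To make the measurement-error terms concrete, here is a minimal sketch of how sensitivity, specificity, and positive predictive value could be computed once a cohort has been compared against some reference standard in one database. The class and function names and the example counts are hypothetical, made up purely for illustration; they are not results from any OHDSI database or part of any OHDSI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a generated cohort to a reference standard."""
    tp: int  # in cohort, truly has the phenotype
    fp: int  # in cohort, does not have the phenotype
    fn: int  # not in cohort, truly has the phenotype
    tn: int  # not in cohort, does not have the phenotype


def operating_characteristics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and PPV for one database."""
    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
    }


# Hypothetical counts: a definition that misses many true cases.
counts = ConfusionCounts(tp=80, fp=20, fn=120, tn=9780)
metrics = operating_characteristics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the definition would have a low sensitivity but a reasonable PPV, which is the kind of known, documented measurement error that would still fit the ‘better practice’ idea above.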

When collaborators across the OHDSI network implement a phenotype and share back the characterization summary statistics and provide whatever evaluation that they conduct within their own institution, then I think the phenotype library will give us a tremendously valuable resource to determine which phenotype algorithms are feasible and pragmatically useful. But this will require active participation across our community to openly share what’s been learned, so that everyone can benefit from past experiences.

Yes, 1) Phenotype, 2) Phenotype algorithm = cohort definition, 3) Cohort are three different ideas defined here, and those are the terms I will try to use to delineate them.
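One way to picture the three terms is as three separate data types. The sketch below is only an illustration under simplifying assumptions: the class names, the dictionary layout of a person record, and the toy `on_drug_a` rule are all hypothetical, and none of this reflects how ATLAS or the OMOP CDM actually represents cohorts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Phenotype:
    """1) The abstract state: a human-readable description only."""
    name: str
    description: str


@dataclass(frozen=True)
class CohortEntry:
    """One person satisfying the criteria for a span of time."""
    person_id: int
    start_date: date
    end_date: date


@dataclass(frozen=True)
class CohortDefinition:
    """2) Phenotype algorithm = cohort definition: executable logic."""
    phenotype: Phenotype
    # Hypothetical rule: returns (start, end) if the person qualifies.
    rule: Callable[[dict], Optional[Tuple[date, date]]]

    def generate(self, persons: Iterable[dict]) -> List[CohortEntry]:
        """3) Cohort: the actual persons and time spans in one data source."""
        cohort = []
        for person in persons:
            span = self.rule(person)
            if span is not None:
                cohort.append(CohortEntry(person["person_id"], span[0], span[1]))
        return cohort


def on_drug_a(person: dict) -> Optional[Tuple[date, date]]:
    """Toy intervention rule: first recorded era of a made-up 'drug_a'."""
    for drug, start, end in person.get("drug_eras", []):
        if drug == "drug_a":
            return (start, end)
    return None


# Made-up person records standing in for a data source.
persons = [
    {"person_id": 1, "drug_eras": [("drug_a", date(2019, 1, 1), date(2019, 3, 31))]},
    {"person_id": 2, "drug_eras": [("drug_b", date(2019, 2, 1), date(2019, 2, 28))]},
]
definition = CohortDefinition(
    Phenotype("Drug A exposure", "Time a person is persistently taking drug A"),
    on_drug_a,
)
cohort = definition.generate(persons)
print(cohort)
```

The same `CohortDefinition` applied to a different list of persons would yield a different cohort, which is why the definition can be shared independently of any one database while the cohort cannot.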

apotvien · May 21, 2019, 1:55pm

Thank you @EricaVoss and @Patrick_Ryan for these beautifully detailed descriptions. There’s a wealth of information here. A while ago, the WG established a Submission Template intended to capture the elements that authors and validators would need to supply in order to make an entry to the library. I’ve made a start on merging the thoughts from that document with the additional details from your posts, in tabular form:

I’ll be starting the WG meeting soon. I hope the group can help to refine this more during the meeting to get closer to a final draft for an author and validator submission.