
Brainstorming on how to build the OHDSI Phenotype Library as a community resource

We in OHDSI have always had a dream - to build a community resource of cohort definitions that are collaboratively developed and evaluated, and made available for everyone to reuse. This dream is core to the OHDSI mission:

Our Mission

To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.

We tried several times to build an OHDSI Phenotype library, but have failed. Here is one more attempt - and hopefully a successful one.


  1. Create a managed community resource that everyone in the OHDSI Community has access to, is familiar with, and can use.
  2. The resource allows for collaborative creation of cohort definitions. e.g. if @Ajit_Londhe has a cohort definition and would like to contribute, he should be able to.
  3. The resource allows for searching and reviewing the cohort definition logic i.e. you must be able to easily find and look at what the cohort definition is.
  4. The resource allows us to learn about the cohort definition by executing it on a network of data sources and returning a pre-computed, deidentified, population-level result set in an interactive manner.
  5. The resource allows for interpreting these outputs and for anyone to provide a scientific review of their interpretation i.e. if @Patrick_Ryan does a long rant about the cohort definition - that is awesome. But if @Christian_Reich disagrees - that is also known. In fact, the resource should allow for scientific discourse at the cohort definition level along with a peer review voting process! Cohort definitions that are voted up get “superior” labels - as they are considered vetted by the community.
  6. The resource is also part of the OHDSI technology stack (e.g. HADES) and allows, for example, @schuemie to import the cohort definitions when creating a new OHDSI study package.

The OHDSI Phenotype Development and Evaluation workgroup is tasked with building the OHDSI Phenotype Library - but before describing the solution - are there any other requirements?


Some, no doubt debatable, possibilities:

Opinionated. Rather than containing every suggested definition for a given idea, the phenotype library could present a single preferred definition. Many alternative definitions could be considered before choosing the “preferred” definition that would be presented in the library, but in the end there would be some process by which a single definition is chosen to represent the current OHDSI standard definition for a network study. Of course, over time the preferred definitions could change (ideally with changes recorded with version control).

Composable units. For example, rather than providing a phenotype for COVID-19, it could be preferable to have distinct phenotypes for diagnosed with COVID-19, PCR test positive for SARS-CoV-2, and so on. Researchers can then combine definitions from the phenotype library depending on their question at hand (while making sure to run cohort diagnostics on their combined cohort definition!). Not only helping to provide flexibility for researchers, this could also make it easier to choose a preferred definition.

Specified use case. In general, I would think that cohort definitions would vary depending on their intended use. For example, an appropriate definition of something like deep vein thrombosis (DVT) might differ depending on whether DVT is a study outcome or if it is being identified to describe the profile of a patient at cohort entry (e.g. for the latter the inclusion of “history of…” codes might be appropriate unlike for the former). Perhaps separating out cohorts for their different purposes in a study (e.g. 1) History of, 2) Exposure of interest, 3) Outcome of interest) would be helpful. I know this probably sounds like I’m contradicting my first point, but I would still argue that within each of these categories there could be a preferred definition.

Structured documentation. Ideally there would be a description of how a cohort definition came to be. If possible, providing this in a structured format would likely be helpful. This could include how a candidate codelist was generated, why certain codes were excluded, and so on.


Agree - but I was thinking we could use the ‘tags’ system that was recently introduced to Atlas as the source of the opinion. Authorized contributors may make contributions and they will be accepted, but only opinionated cohort definitions get the right tag, i.e. only tagged cohort definitions are the accepted cohort definitions.

Makes sense - but it also ties to the use case.

The use case can also be considered the ‘clinical description’.

We should start with the clinical description - like we saw in Phenotype Phebruary.

Thanks @Gowtham_Rao for rekindling this conversation, and to @edburn for the valuable feedback. I would hope we can connect this thread with all of the other prior discussions that have taken place in our community for the past attempts, if for no other reason than to avoid rehashing old discussions. Here’s one of the prior threads as a reference that should be reviewed to orient the discussion.

I think this 2021 GigaScience paper, “Desiderata for the development of next-generation electronic health record phenotype libraries”, offers useful food for thought, and I would encourage others to give it a solid read. Perhaps a workgroup journal club that digs into the specific assertions would be helpful. I know some of the authors are in our community - I can only find @lrasmussen to tag - but I hope they could chime in and contribute to that discussion.

I’ll also note that, within the design of Capr, @lee_evans and @mdlavallee92 were very strategic in trying to separate out the components that make up a cohort definition (e.g. conceptsets and inclusion criteria) in addition to allowing for their aggregation. So, a legitimate question to tackle (at some point) is whether our phenotype library is really a library of cohort definitions (with supporting metadata) or a phenotype resource library containing various resource types that can be configured and built into cohort definitions (inclusive of cohort definitions themselves).

There are the technical implementation bits (conceptsets, cohort definitions), which I feel we have a good handle on within our community. There are the evaluation bits (such as, but not limited to, CohortDiagnostics and PheValuator), which are quite promising but still maturing in form and function. And then there are the metadata bits, and that’s where I struggle the most. It isn’t clear to me what is required, what is a desired nice-to-have, and what is distracting noise.

My current opinion: I think having a clinical description that states the intent of the phenotype target is very important to have and to connect with all elements used to implement that target. Otherwise, in the absence of a description, it seems impossible to precisely evaluate the implementation, because you don’t know for sure what you were trying to accomplish. All too often, we assume we know the intent based only on a phenotype label (e.g. ‘Type 2 diabetes mellitus’ or ‘Delirium’ or ‘Hemorrhagic events’) only to realize that there is more nuance to it (e.g. Type 2: does that exclude T1DM and gestational diabetes, and what does it mean to be ‘incident’? Delirium: does that include general confusional state, and is it restricted to only events in hospital? Hemorrhagic events: is it bleeding-related hospitalizations, bleeding in hospitals, bleeding requiring health service utilization, or all bleeds all the time?)

If a definition has been evaluated, it would be useful to capture those insights that users learned from their evaluation. I don’t think simply providing results from CohortDiagnostics is sufficient, at least not at this current state where we don’t have objective criteria that are applied to the results to determine fitness-for-use. We have tried some ‘evaluation templates’, but I know not everyone is enamored with filling them in, so I don’t know exactly what the right solution is there.

@edburn raised the possibility of trying to understand the journey through phenotype development and documentation of choices along the way. I was discussing a similar topic with @jennareps as it relates to our analysis development and evaluation process: for reproducibility purposes, I think we need only have the final process fully specified and executable end-to-end (so I don’t need to know that I iterated a few times and stumbled before getting to the starting line, I only need to know the path from start to finish). But for purposes of transparency of process and education and awareness, seeing the full meandering journey through all versions could be informative…I just don’t know how practically realistic it is to expect others to provide this information.

As we go down this path again, I think it’s important to clarify the intent and target audience of the phenotype library: Personally, I’d say let’s not focus on how to put information in, but rather on how information within can be used by others to enable reliable evidence generation. If I could log into ATLAS or start up an R session with the HADES packages loaded, and know that I could confidently re-use a resource of community-evaluated cohorts as my building block inputs to my analyses (instead of having to start each study with de novo phenotype development), I’d be a much happier researcher.



Good idea - maybe we can take this on as part of the OHDSI Phenotype Development and Evaluation workgroup (@AzzaShoaibi )

Atlas is optimized for creating, storing and retrieving conceptsets and cohort definitions - OHDSI has invested a lot in familiarizing people with Atlas, and the success thus far with atlas-phenotype.ohdsi.org has made me confident that we should preserve this. Capr is awesome and adds a new function - storing reusable cohort definition components.

I would passionately argue that cohort definitions without an accompanying clinical description should not be evaluated. This is because the target of the cohort definition should be the clinical description AND the evaluation is whether the cohort definition yields persons who meet the clinical description during the period of time they are in the cohort! So @edburn, I think the clinical description is the anchor of the opinion, the specific use case, and the structured documentation.

Break the definition down into elements; the execution results (how it performed on “database X”) should show the impact of each element on cohort size. (If not at that level, the network results are less useful.)
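To make the idea concrete, here is a minimal sketch of the kind of per-rule attrition report that would show each element’s impact on cohort size. Everything here is illustrative - the rule names, patient fields, and counts are invented for the example, not taken from Atlas or CohortDiagnostics:

```python
# Hypothetical per-rule attrition for a cohort definition.
# Each rule is a (name, predicate) pair applied in sequence;
# the report records how many persons remain after each rule.

patients = [
    {"id": 1, "age": 45, "dx_count": 2, "inpatient": True},
    {"id": 2, "age": 16, "dx_count": 1, "inpatient": False},
    {"id": 3, "age": 60, "dx_count": 3, "inpatient": False},
    {"id": 4, "age": 30, "dx_count": 0, "inpatient": True},
]

rules = [
    ("age >= 18",            lambda p: p["age"] >= 18),
    (">= 2 diagnosis codes", lambda p: p["dx_count"] >= 2),
]

def attrition(patients, rules):
    remaining = list(patients)
    report = [("initial events", len(remaining))]
    for name, keep in rules:
        # Apply each cohort element in turn and record the drop.
        remaining = [p for p in remaining if keep(p)]
        report.append((name, len(remaining)))
    return report

for name, n in attrition(patients, rules):
    print(f"{name}: {n}")
```

Publishing this kind of table per database alongside the network results would let readers see which element does the filtering work on each source.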

Academic credit (they discussed this in N3C) - if I spend a long time helping the Phenotype Library, others will use it but it will be difficult for me to be a co-author on their papers (a larger academic problem which we can’t solve, but one that certainly complicates the library). Motivation to share matters.

Looks like I added some contribution to the debate in the wrong place, triggered by the multiple myeloma discussion there. Then I realized my mistake. @Gowtham_Rao even predicted I would have something to complain about. Since I have already been found guilty, let me commit my crime here (below).

In lieu of repeating everything I said there, this is just a summary and contribution to the above debate:

Conditions are elusive things, because:

  • They are captured at a rate anywhere between 0 and 100%
  • They are changing and sometimes never final,
  • They are hard for the physician to make,
  • They are often imprecise.

We have a bunch of tricks for that (repetition of diagnoses over time, addition of circumstantial evidence, elimination of alternatives and implausibilities, re-diagnosis), which we combine using Boolean and temporal logic. And we mix them with the true inclusion criteria belonging to a study design (“age>=18”).

The intention is noble, but it is still an ugly, black-box situation not worthy of OHDSI. We should at least get clear about what we are doing and why in the documentation.

An example of “elimination of alternatives” because we don’t trust the simple diagnosis code.

A definition of the use case, meaning, any acute situation resulting from the disease.

Example of circumstantial evidence to infer some level of severity.

Also circumstantial evidence, except it is not even clear for what purpose. It’s clearly not the diagnosis itself, or the severity.

We could learn the effectiveness of our tricks, but only if their purpose is defined.

I like that idea. We could even use full phenotypes as part of other phenotypes and nest them.


Hi Christian @Christian_Reich, thank you for that insightful post. I think many of the issues you raise are derived from challenges that are upstream to phenotyping, such as information loss between structured and unstructured data and variability in content and design of databases within OHDSI.

In addition to the issues you have raised, another one is that there are important clinical variables that do not map well to structured data. For example, the dimensions of a tumor, number of nodes or metastatic sites, molecular subtype, oncotype, or disease stage are not in structured data. Furthermore, there are important data that are not well captured even in unstructured formats. For example, ventricular arrhythmias can be underdiagnosed, especially when they occur when the patient is not present in a hospital or other healthcare facility. The OHDSI community has researchers who have been pursuing solutions to these limitations, both individually and in collective efforts such as the NLP working group.

The information loss issues are augmented by the fact that there can be wide variability in the structure and content of OHDSI databases, as well as practice patterns at each site. Consequently, the inconsistencies in phenotype metrics across databases, which are measured by cohort diagnostics, can be independent of the phenotypes themselves. Some well-designed EHR phenotypes have suboptimal performance on claims databases and vice versa because of differences in structure and content of the databases. Therefore, we should be cautious about labeling cohort diagnostics as a “validation” tool. Instead, we should use it for its intended purpose as a diagnostic tool to help refine cohort definitions. Concurrently with researching methods to reduce information loss and improve the rigor and reliability of phenotyping, we should consider the limitations of our tools and data when designing studies.

I agree that we should document metadata on phenotypes. We (myself, Chunhua @chunhua, Cong @Cong_Liu and Karthik @cukarthik) recently started this effort of metadata on phenotypes within eMERGE. It would be good to have OHDSI collaborators. Please let me know if you’d like to collaborate.

Hi all,
great to see that the structure of the phenotype library is being actively discussed!
I agree with the comment from @Vojtech_Huser

The credit, or lack of credit, for the large amount of work behind developing and evaluating a phenotype, as we have seen in every one of the Phenotype Phebruary posts, is a barrier to collaboration in this community effort.
A couple of ideas to address this:

  1. Target a publication for selected phenotypes, describing the alternatives that were considered, the rationale for selecting the proposed algorithm, and its validation. I consider validation key to publishing the results, and I know that validation itself is a topic of discussion (I’ll add some thoughts in a separate post). Publication of the individual phenotypes would be additional to the proposals made by @david_vizcaya in our last Phenotype Development and Evaluation workgroup meeting to publish the lessons learned from the Phenotype Phebruary initiative and the great idea from @hripcsa and @Patrick_Ryan to publish the phenotypes in the Book of OHDSI.

  2. Develop R and JSON packages for each phenotype that has been validated with good results. This would allow us to give credit to the team who developed the phenotype, and to make sure the community uses the most up-to-date version as the definition/algorithm gets updated. Here we would need input from people in the community experienced in creating R packages (@Gowtham_Rao, @schuemie, @lee_evans, @mdlavallee92, etc.) on whether this is feasible and scalable given the maintenance requirements.

It would be great to hear what others think about this and other ways to make sure we give the appropriate credit to the teams developing phenotypes.

One way to do that would be to have a convention that all OHDSI research deposits the cohort definitions of a study into the phenotype library by default, and cites them that way in publications (peer reviewed or not). If we all do this, we will create currency around those phenotypes. Otherwise it is just an altruistic effort.

Agree. We have the infrastructure to do that.

I think the difficulty here is the potential scope of the initial “product” being described, which presents rather a massive task to create and, as importantly, maintain. Perhaps an alternative way to look at this is to think about what is the minimum viable product for the OHDSI phenotype library?

For me, this could be starting with the “library” role, rather than getting involved in the “editorial” or “author” roles. This might then look like:

  • A collection of all the cohort definitions used in previous studies that have used the OMOP CDM.
  • With no claim that these definitions are perfect or even up to date, but that this is a central repository of definitions that have been used before.
  • Simply a place where the cohort definitions from LEGEND, studyathons, and so on, are brought together into a single place and appropriately indexed
  • This is a relatively simple improvement on the situation we have now where cohort definitions are in various github repos under various organisations.
  • Requiring publication as a barrier to entry would mean that any definition used would have an associated citation (= academic credit) and would have gone through peer review (although we can certainly debate whether this means the definitions are any better for that)

Once something like this is in place, you could then tackle the thornier downstream roles, like how to appropriately define phenotypes (i.e. “author”) and how to assess their quality (i.e. “editorial”).


We do? Fully “on-ramped”? One-click from Atlas (“Upload this cohort to the phenotype library”)? Integration with CohortDiagnostics?

Well - yes - if we use a loose definition

  1. Store definitions collaboratively - atlas-phenotype.ohdsi.org

  2. Role-based access control - managed by OHDSI. A form system to request access

  3. Catalog and retrieve definitions - using tags - atlas-phenotype.ohdsi.org

    - @edburn this may be the librarian/editor role, with tags like ‘approved’, ‘peer reviewed’, ‘used in study’, etc.

  4. Make a study package for the OHDSI Phenotype Library that has all the cohorts - it’s at the OHDSI GitHub and is version controlled. It may be used by other HADES packages AND by OHDSI studies.

  5. Cohort Diagnostics output will be at data.ohdsi.org

  6. The tool is being enhanced to allow people with access to atlas-phenotype.ohdsi.org to add comments/annotations in a collaborative fashion (about 3 months out).

  7. The forums - as demonstrated during Phenotype Phebruary - may be a collaborative platform that is retrievable by Google search


Based on the feedback thus far, the design I just specified should be sufficient.

Really glad to see this effort coming back. I would just add to this:

It’s never a “failure”, we’ve just learned along the way :slight_smile: I do agree with @Patrick_Ryan’s early comment about trying to avoid rehashing old discussions, but would love to know if we have an idea what didn’t work in the past, just to learn from those experiences.

Not sure if/how I could contribute to this renewed effort, but would love to see it come to fruition. Am very supportive of the work @mattspotnitz noted within eMERGE and how that could maybe help here too.


This is my current thinking - try to keep it simple, familiar and usable.

I like this workflow @Gowtham_Rao , but I would like it more if the numbering system that is used to organize information in that GitHub repo was more communally documented and understood. The nesting of the look-up here is a very OHDSI secret handshake kind of thing.

I’d encourage we consider being a little more redundant with documenting our organization schema. It would go a long way in increasing the utility of what’s hidden in the repo.

Actually - I was thinking of making this very simple. The organization will all happen in Atlas at atlas-phenotype.ohdsi.org. People sign up via a form set up by @lee_evans and get access to either read or create cohort definitions based on their “trust” levels. Everyone gets to review them, and we democratically accept/vote up/down cohort definitions using a process (similar to the CDM workgroup, for example), and then the upvoted cohorts get an Atlas tag - “Accepted” etc.

A snapshot gets pulled and put into GitHub - it will be versioned and become the release. That will be HADES compatible and referenceable in any OHDSI study.

So we can do all this in full transparency - using existing tools that everyone is familiar with.
