
Brainstorming on how to build the OHDSI Phenotype Library as a community resource

Hi all,
great to see that the structure of the phenotype library is being actively discussed!
I agree with the comment from @Vojtech_Huser

The credit (or lack of credit) for the large amount of work behind developing and evaluating a phenotype, as we have seen in every one of the Phenotype Phebruary posts, is a barrier to collaboration in this community effort.
A couple of ideas to address this:

  1. Target a publication for selected phenotypes, describing the alternatives that were considered, the rationale for selecting the proposed algorithm, and its validation. I consider validation key to publishing the results, and I know that validation itself is a topic of discussion (I’ll add some thoughts in a separate post). Publication of the individual phenotypes would be in addition to the proposals made by @david_vizcaya in our last Phenotype Development and Evaluation workgroup meeting to publish the lessons learned from the Phenotype Phebruary initiative, and the great idea from @hripcsa and @Patrick_Ryan to publish the phenotypes in the Book of OHDSI.

  2. Develop R and JSON packages for each phenotype that has been validated with good results. This would give credit to the team who developed the phenotype, and ensure the community uses the most up-to-date version as the definition/algorithm is updated. Here we would need input from people in the community experienced in creating R packages (@Gowtham_Rao, @schuemie, @lee_evans, @mdlavallee92, etc.) on whether this is feasible and scalable given the maintenance requirements.

It would be great to hear what others think about this and other ways to make sure we give the appropriate credit to the teams developing phenotypes.

One way to do that would be to have a convention that all OHDSI research deposits the cohort definitions of a study into the phenotype library by default, and cites them that way in publications (peer reviewed or not). If we all do this, we will create the currency around those phenotypes. Otherwise it is just an altruistic effort.

Agree. We have the infrastructure to do that.

I think the difficulty here is the potential scope of the initial “product” being described, which presents rather a massive task to create and, as importantly, maintain. Perhaps an alternative way to look at this is to think about what is the minimum viable product for the OHDSI phenotype library?

For me, this could be starting with the “library” role, rather than getting involved in the “editorial” or “author” roles. This might then look like:

  • A collection of all the cohort definitions used in previous studies that have used the OMOP CDM.
  • With no claim that these definitions are perfect or even up to date, but that this is a central repository of definitions that have been used before.
  • Simply a place where the cohort definitions from LEGEND, studyathons, and so on, are brought together into a single place and appropriately indexed
  • This is a relatively simple improvement on the situation we have now where cohort definitions are in various github repos under various organisations.
  • Requiring publication as a barrier to entry would mean that any definition used would have an associated citation (= academic credit) and have gone through peer review (although we can certainly debate whether that makes the definitions any better).

Once something like this is in place, you could then tackle the thornier downstream roles, like how to appropriately define phenotypes (i.e. “author”) and how to assess their quality (i.e. “editorial”).


We do? Fully “on-ramped”? One-click from Atlas (“Upload this cohort to the phenotype library”)? Integration with CohortDiagnostics?

Well, yes, if we use a loose definition:

  1. Store definitions collaboratively, at atlas-phenotype.ohdsi.org

  2. Role-based access control, managed by OHDSI, with a form system to request access

  3. Catalog and retrieve definitions using tags, at atlas-phenotype.ohdsi.org

    - @edburn this may be the librarian/editor role, with tags like ‘approved’, ‘peer reviewed’, ‘used in study’, etc.

  4. Make a study package for the OHDSI Phenotype Library that has all the cohorts; it lives in the OHDSI GitHub organization, is version controlled, and may be used by other HADES packages AND by OHDSI studies.

  5. Cohort Diagnostics output will be at data.ohdsi.org

  6. The tool is being enhanced to allow people with access to atlas-phenotype.ohdsi.org to add comments/annotations in a collaborative fashion (about 3 months out).

  7. The forums, as demonstrated during Phenotype Phebruary, may serve as a collaborative platform that is retrievable by Google search.


Based on the feedback thus far, the design I just specified should be sufficient.

Really glad to see this effort coming back. I would just add to this:

It’s never a “failure”, we’ve just learned along the way :slight_smile: I do agree with @Patrick_Ryan’s early comment about trying to avoid rehashing old discussions, but would love to know if we have an idea what didn’t work in the past, just to learn from those experiences.

Not sure if/how I could contribute to this renewed effort, but would love to see it come to fruition. Am very supportive of the work @mattspotnitz noted within eMERGE and how that could maybe help here too.


This is my current thinking - try to keep it simple, familiar and usable.

I like this workflow @Gowtham_Rao , but I would like it more if the numbering system that is used to organize information in that GitHub repo was more communally documented and understood. The nesting of the look-up here is a very OHDSI secret handshake kind of thing.

I’d encourage we consider being a little more redundant with documenting our organization schema. It would go a long way in increasing the utility of what’s hidden in the repo.

Actually, I was thinking of making this very simple. The organization will all happen in Atlas at atlas-phenotype.ohdsi.org. People sign up via a form set up by @lee_evans and get access to read/create cohort definitions based on their “trust” levels. Everyone gets to review them, and we democratically accept and vote cohort definitions up or down using a process (similar to the CDM workgroup, for example); the up-voted cohorts then get an Atlas tag, e.g. “Accepted”.

A snapshot gets pulled and put into GitHub; it will be versioned and become the release. That release will be HADES-compatible and referenceable in any OHDSI study.

So we can do all this in full transparency, using existing tools that everyone is familiar with.


Works. :slight_smile: I am just very tired of my study-specific secret armbands. Very tedious to keep track of all our secret numbering systems. This would be a massive improvement.

No secret numbers: all cohorts have the number that atlas-phenotype.ohdsi.org gives them. That cohort id will be immutable, like a standard concept id in OMOP. So forever and ever, until the end of OHDSI (which will never come), cohort id 14 will always be the same.

Forever and ever, cohort id #14 will be Myalgia, and this link will always give the same cohort definition.



I love the idea of an immutable and globally unique cohort identifier. What if we used a hash of the cohort JSON, instead of or in addition to the ID in atlas-phenotype.ohdsi.org? That way, if two hash IDs match, you know for sure that the cohort definitions match.

Also, I really like @Christian_Reich’s comments about conditions. My two cents is that there is no way around considering the data-generating processes in rule-based phenotype algorithms. For example, logic to exclude rule-out diagnoses does not make sense for registry data, but does make sense for claims.

Thanks @Adam_Black

Sure. Do you have thoughts on a standard OHDSI way to do this?

  • Maybe a standard hash algorithm that is used in all our software?
  • An agreement on whether we hash the cohort JSON or the cohort SQL (probably SQL, as we may have cohort definitions that are written in custom SQL not conformant to OHDSI Circe).
  • An R function implementation of this, maybe in some HADES package like CohortGenerator?
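To make the idea concrete, here is a minimal sketch of hashing a cohort definition's JSON. It is illustrative only: it is written in Python rather than R (a HADES implementation would presumably be an R function, e.g. in CohortGenerator), the field names in the example JSON are hypothetical, and it assumes SHA-256 as the agreed-upon standard algorithm. The key point is canonicalization: parsing and re-serializing with sorted keys and fixed separators so that two definitions differing only in formatting or key order produce the same hash.

```python
import hashlib
import json

def cohort_definition_hash(cohort_json: str) -> str:
    """Compute a stable SHA-256 hex digest of a cohort definition.

    The JSON is parsed and re-serialized with sorted keys and no
    extra whitespace, so whitespace and key-order differences do
    not change the resulting identifier.
    """
    canonical = json.dumps(
        json.loads(cohort_json),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two formattings of the same (hypothetical) definition hash identically.
a = '{"ConceptSets": [], "Title": "Myalgia"}'
b = '{"Title":"Myalgia","ConceptSets":[]}'
assert cohort_definition_hash(a) == cohort_definition_hash(b)
```

Note that hashing the generated SQL instead would sidestep Circe-nonconformant definitions, but would then tie the identifier to the SQL-generation version, so canonicalization rules would need to be agreed on either way.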

The usable cohort definitions are maintained in ATLAS, as shown here.


I think in the short term, simply using the ID in the phenotype Atlas makes sense and is already implemented. In the longer term, maybe we have URIs for concepts, concept sets, and cohort definitions.

@callahantiff - it seems like a cohort definition library fits nicely into the idea of the Semantic Web.

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

Do you think ideas from the Semantic Web community (e.g., how we identify things, URIs) could be applied here?


Hey @Adam_Black!

Very interesting observation and a very good use of the Semantic Web, maybe even more specifically of the FAIR (findable, accessible, interoperable, and reusable) data principles. I think I might need to learn more about what exactly the goals are. Are you trying to set up a framework to make things more easily accessible and findable, or are you strictly interested in a way to assign permanent resolvable URIs?

It’s also entirely possible that I am missing the larger goal altogether, so please feel free to let me know :smiley:

Based on the discussions in the community -

  1. ATLAS will be the official source of cohort definitions for the OHDSI Phenotype library (ready and available to community)
  2. The OHDSI Phenotype Development and Evaluation workgroup will be responsible for adding/editing/removing cohort definitions in the Atlas instance (ready and available to community)
  3. A Cohort Diagnostics study package will be maintained here https://github.com/OHDSI/PhenotypeLibrary/ which will be version controlled and referenceable. (ready and available to community)
  4. A shiny application with results will be posted on https://data.ohdsi.org/PhenotypeLibrary/ (to do)

Please see the post here: OHDSI Phenotype Library announcements