
Requirements Development for the OHDSI Gold Standard Phenotype Library

(Luke Rasmussen) #61

@apotvien - I know OHDSI has a lot going on, and fortunately @psbrandt alerted me to this very interesting workgroup. I need to do some reading of the materials/discussions so far, but wanted to ask if you had discussed collaborating/aligning with the Phenotype KnowledgeBase (https://phekb.org/) that was developed as part of eMERGE? It would be great to see if there is an opportunity to at least share experiences across these projects. Thanks!

(Aaron Potvien) #62

@lrasmussen, thank you for your post. We would absolutely be interested in sharing experiences and learnings with the PheKB community. The substantial amount of development that PheKB has done is definitely relevant here. We would be very interested in hearing about the governance processes PheKB uses for phenotype design and evaluation. An information exchange at a future meeting would be great!

(Aaron Potvien) #63

Hi All,

I’m looking forward to our meeting tomorrow morning (10am EST). I’ll be presenting about one possible framework for the library architecture/implementation.

It is up to date on the Wiki and WG Meeting Document, but just in case, we have changed the link for our meeting space to be the following one:

(Seng Chan You) #64

I really wanted to attend the meeting, but I couldn’t stay up yesterday.
I’m sorry, @apotvien .

I saw the slides you posted. Though I might not understand the whole thing, it is really really fascinating!

I think I can participate as a secondary librarian and validator for a while (of course can be a user).
After setting up the environment, the algorithmic evaluation (e.g., PheValuator) can automatically validate the phenotype and produce metadata for the validation results.
The JSON file for the phenotype should be comprehensive enough to cover the two phenotyping tools, ATLAS (rule-based) and APHRODITE (computable).

(Aaron Potvien) #65

No worries, @SCYou! We all understand that a midnight meeting time is not exactly optimal. :slight_smile:

Thanks for reviewing the slides. I’m still thinking about this at a high level. I used PheValuator as an example, but the framework is flexible enough to encompass manual chart review too, and in principle, whatever “Gold Standard” design/evaluation practices this group ultimately adopts. Likewise, I also used JSON as an example export, but that may not be appropriate for exporting computable phenotypes. Nonetheless, if the implementation can be represented as a single file (even if it’s an archive of multiple files), it can still be hashed; then, all of the metadata that pertains to that phenotype (definition, validation, user experiences, etc.) can always be linked to that hash without any ambiguity about what definition was being referred to.
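To make the hashing idea concrete, here is a minimal sketch, not the library's actual implementation. It assumes SHA-256 as the hash (the post does not name one), and the names `file_sha256`, `attach_metadata`, and `metadata_by_hash` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical in-memory registry: file hash -> list of metadata records.
metadata_by_hash: dict[str, list[dict]] = {}


def attach_metadata(path: Path, record: dict) -> str:
    """Link a metadata record (validation, user experience, ...) to the
    exact implementation file it describes, keyed by the file's hash."""
    key = file_sha256(path)
    metadata_by_hash.setdefault(key, []).append(record)
    return key
```

Because the key is derived from the file's bytes, any change to the definition produces a different hash, so older validation records stay attached to the version they actually describe.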

The group had some great suggestions yesterday that I’m taking back to the drawing board! We’ll need a way to incorporate user experiences so we can track what worked or didn’t work over time, as people try the phenotype out; this doesn’t necessarily have to be as formal as a validation set. We’ll also want a way to search and filter results according to phenotype performance and also the CDM elements used in its construction. I’ll also need to start filling in more of the data elements in greater detail. I’ll be in touch later on with updates!
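As a rough illustration of the search/filter idea, the sketch below filters entries by performance thresholds and by the CDM tables they use. Everything here is an assumption for discussion: the `PhenotypeEntry` fields, the `search` function, and the numbers, which are made-up placeholder data rather than real validation results.

```python
from dataclasses import dataclass, field


@dataclass
class PhenotypeEntry:
    # Hypothetical fields; the real library entry may look quite different.
    name: str
    sensitivity: float
    ppv: float
    cdm_tables: set[str] = field(default_factory=set)


def search(entries, min_ppv=0.0, min_sensitivity=0.0, required_tables=()):
    """Return entries meeting the performance thresholds that also use
    every CDM table listed in required_tables."""
    required = set(required_tables)
    return [
        e for e in entries
        if e.ppv >= min_ppv
        and e.sensitivity >= min_sensitivity
        and required <= e.cdm_tables
    ]


# Made-up placeholder entries, for illustration only.
entries = [
    PhenotypeEntry("Example A", 0.80, 0.92, {"condition_occurrence", "drug_exposure"}),
    PhenotypeEntry("Example B", 0.95, 0.70, {"condition_occurrence"}),
    PhenotypeEntry("Example C", 0.60, 0.95, {"condition_occurrence", "measurement"}),
]

high_ppv = search(entries, min_ppv=0.9)
high_ppv_with_labs = search(entries, min_ppv=0.9, required_tables={"measurement"})
```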

(Seng Chan You) #66

I agree, @apotvien

The JSON you chose for the export format and the hashing module are the most fascinating parts of your slides. It was really brilliant! :slight_smile:

Also, I think we don’t need redundant phenotyping work in the community. I’m looking forward to what you’ll update!

(Aaron Potvien) #67

Good morning (or evening :slight_smile: ) folks,

For our meeting this coming Tuesday, I’m looking forward to sharing a prototype Shiny app I’ve been developing. Moving in the direction of the Shiny/GitHub framework that was discussed last week, the app hopefully gets us closer to an interface which connects users to the library phenotype entries.

I hope that this will help to generate discussion about additional data elements we would like to see, how/where they should be displayed, features we wish it had (or perhaps didn’t have), how it should look, etc. The emphasis of work so far has been on the UI portion, so there’s plenty of room for adaptations at this stage. Thank you, and I’m looking forward to your feedback!

(Seng Chan You) #68

Great @apotvien
I cannot find the Shiny app in the Phenotype GitHub. Do you mean you’ll make the Shiny app public at the next meeting?

(Aaron Potvien) #69

Hi @SCYou, that’s correct that it isn’t there. I haven’t pushed anything out just yet, as it’s not quite ready, but I should be able to do so soon after our meeting.

(Seng Chan You) #70

@apotvien Ok, I’m looking forward to it!

(Aaron Potvien) #71

All, please find the link below to use for tomorrow’s meeting (at 10am ET):

(Aaron Potvien) #72

Folks, thanks for the great discussion this past Tuesday! As a follow-up, I’m looking forward to teaming up to create the perfect templates for:

  1. An Author (Submitter) of a phenotype (Link Here)
  2. A Validator of a phenotype (Link Here)

What are the key data elements that should be included? Which elements are mandatory vs. optional, and can the elements be kept general enough to be reasonably expected to be filled in each time yet specific enough to be meaningful?

To help with brainstorming, I’ve started a separate Google Doc for each one of these, starting with the template I used in the Shiny app demo. However, the template I used was largely intended to be illustrative, so now is a good time to dive deeper into the details. Please contribute your ideas to help build the perfect templates, and thank you in advance for your contributions!

From an implementation perspective, I think our templates can eventually exist as Markdown (or R Markdown) documents, which can be flexibly rendered inside a viewer program such as the Shiny app and directly on the phenotype’s GitHub page. However, for ease of editing, we should be able to develop them as Google Docs for the time being.
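On the mandatory vs. optional question, one way to picture it is a small check that a submitted template has every mandatory element filled in. This is only a sketch for brainstorming: the field names in `REQUIRED_FIELDS` and `OPTIONAL_FIELDS` are hypothetical placeholders, not the elements this group has agreed on.

```python
# Hypothetical element names; the real lists are what the Google Docs are for.
REQUIRED_FIELDS = {"title", "author", "description"}
OPTIONAL_FIELDS = {"lookback_days", "references"}


def check_submission(submission: dict) -> dict:
    """Report mandatory elements that are missing or empty, plus any
    elements the template does not recognize."""
    filled = {k for k, v in submission.items() if v not in (None, "", [])}
    return {
        "missing": sorted(REQUIRED_FIELDS - filled),
        "unknown": sorted(set(submission) - REQUIRED_FIELDS - OPTIONAL_FIELDS),
    }


report = check_submission(
    {"title": "Example phenotype", "author": "", "notes": "free text"}
)
```

Here `report` would flag `author` and `description` as missing and `notes` as unrecognized, which is the kind of feedback an author could get before a phenotype is accepted into the library.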

(Aaron Potvien) #73

Good morning,

Today at 10am ET, we’ll be reviewing and working through developing the documents in the post above: The Authorship and Validation templates to identify the key data elements required of each entry.

Please find the link to today’s meeting below:

(Aaron Potvien) #74

Hello everyone,

It’s been a while since the phenotype viewer application was first introduced at one of our biweekly meetings. Since then, I have been pursuing a way to make this application and underlying source code more functional and available to everyone. After working through some nuances of locally hosting the app versus running it in a deployed setting, I’m happy to say that a version stable enough to share is finally available here:


The repo for the code is here:

A special thank you goes out to @lee_evans and @schuemie for making the hosting of this app possible!

I invite you to play around with the application and to provide any feedback (Yes, really – any feedback… I can take it! :smirk:). Please keep in mind that this is far from a finished product since we’re still at a conceptual stage; a lot of features are not “hooked up” in the sense that they all refer to the same templates, and when data are used, they are randomly generated.

Nonetheless, I strongly believe it’s helpful to have something concrete to look at in order to help generate discussion and ideas. What do/don’t you like? What else should be included or tweaked? Thanks for your attention, and I hope this initial launch acts as a stepping stone on the path to an improved library!

(Aaron Potvien) #75

Hello everyone,

For tomorrow’s meeting, @juan_banda will be talking to us a bit about FAIR definitions and how they pertain to phenotyping.

As time permits, I’ll also walk through a status update and overview of upcoming objectives with respect to the architecture/implementation piece of the library.

The meeting link is the same one as before, here:

(Aaron Potvien) #76

Good morning folks,

For tomorrow’s meeting, I’d like to talk about how the phenotype and validation data are being organized in the repository (link) and how that data can be assembled for use in the viewer application.

I’ll attempt to walk through a live example so we can see the process starting to take shape from entering a new phenotype with validation sets into the library to seeing that data reflected in the viewer application.

For reference, the meeting link is below (10-11am ET):

(Joel N. Swerdel) #77

If we have time I’d like to go through the data elements our group thinks should be included in the phenotype library.

(Aaron Potvien) #78

Hello all,

Please find the link to tomorrow’s meeting below (10-11am ET):

I’d like to reattempt the walkthrough I had planned for last time; I apologize for the technical difficulties preventing me from doing so two weeks ago. I anticipate this walkthrough will automatically generate further discussion about the data elements of the library and feedback about how they are currently stored and organized in the repository.

(Aaron Potvien) #79

Hello everyone,

Thanks for the energizing discussion this morning. One item we wanted to ask this community about is which aspects of a phenotype are not currently captured by the phenotype JSON template (example attached, populated with dummy data).

Specifically, the “Purpose and Intended Use” section might be better broken down into smaller components. These components could be themes that are commonly characteristic of phenotypes. For example, definitions often critically rely on a lookback period, can be intended to target incident or prevalent cases, or might be best suited for use in a comparative effectiveness study.
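As a starting point for discussion, the three themes above could be captured as structured fields next to a free-text note. This is a sketch under assumptions: the `IntendedUse` class, its field names, and the example values are all hypothetical and are not part of the current JSON template.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CaseType(Enum):
    INCIDENT = "incident"
    PREVALENT = "prevalent"


@dataclass
class IntendedUse:
    # Hypothetical breakdown of the "Purpose and Intended Use" section.
    case_type: CaseType
    lookback_days: Optional[int] = None      # None when no lookback is needed
    study_designs: tuple[str, ...] = ()      # e.g. "comparative effectiveness"
    free_text: str = ""                      # anything the fields do not cover

    def to_dict(self) -> dict:
        """Convert to plain types so the result can be written as JSON."""
        return {
            "case_type": self.case_type.value,
            "lookback_days": self.lookback_days,
            "study_designs": list(self.study_designs),
            "free_text": self.free_text,
        }


use = IntendedUse(
    case_type=CaseType.INCIDENT,
    lookback_days=365,
    study_designs=("comparative effectiveness",),
)
use_json = json.dumps(use.to_dict(), indent=2)
```

Keeping a `free_text` field alongside the structured ones means nothing is lost for phenotypes that do not fit the themes, while the structured fields stay searchable.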

Do others have ideas for themes along these lines that we could draw from to better broadly characterize phenotypes in a standardized way?


(Ray King) #80

Children versus adults
Individual versus population, e.g., in reviewing diabetes phenotypes for a project, one from Mayo was overly sensitive in detecting potential diabetics prior to surgery versus one that might balance metrics to characterize a population.

Thanks for letting me lurk on your call today.