Requirements Development for the OHDSI Gold Standard Phenotype Library

(Aaron Potvien) #1

Greetings all,

Here at Georgia Tech, we’ve put together an initial requirements development document to help envision how an OHDSI Gold Standard Phenotype Library would function. To make the requirements more tangible, we put together a series of potential personas (representing different types of OHDSI stakeholders and collaborators), as well as use cases that capture the essence of what these users are trying to accomplish. The phenotype library is intended to be a home for validated, high-quality cohort phenotypes that can be generated using the OMOP CDM.

The full document can be found here on Google Docs. We would greatly appreciate your input on additional personas and/or use cases that typify other characteristics you believe should be represented in order to make the OHDSI Gold Standard Phenotype Library a valuable new resource for the entire community.


P.S. Here is an example persona:

Persona: Tom, PhD Health Services Researcher
Tom recently completed a PhD in population health sciences with a minor in biostatistics. He is now a postdoctoral researcher at a large research university, with the goal of becoming a professor after he completes his postdoctoral studies. His primary research interest is in improving health outcomes for patients with diabetes mellitus. Tom is well-versed in medical terminologies and has a solid statistical background. He often works with clinicians as part of his research. Tom’s lab runs an Atlas instance, which he sometimes uses, but he also does direct SQL and R-based analyses with OMOP.

(Seng Chan You) #2

Thank you for leading this great work, @apotvien.
I added my Persona to the document.

(Aaron Potvien) #3

Thank you kindly for the additional persona, @SCYou.


(Seng Chan You) #4

Hello, colleagues,

While building an analytic package with the OHDSI ecosystem, I was reminded again of the need for an OHDSI Gold Standard Phenotype Library.

Has anyone developed a ‘gold standard phenotype library’ for cardiovascular outcomes, such as stroke, myocardial infarction, or sudden cardiac death?

If not, I want to build a phenotype library for cardiovascular outcomes.

There are many papers validating these outcomes in the ICD-9 or ICD-10 code systems (e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135834). Although it would not be perfect, I want to leverage previous papers to build a phenotype library for OHDSI. If anyone has other opinions, please let me know.

(Martijn Schuemie) #5

@jon_duke: I think you held a ‘rally the troops’ talk about the Phenotype Library at the community call recently (I was unfortunately unable to attend).

I will not ask you to dial into the PLE Eastern Hemisphere meeting for a repeat performance (although you’re welcome to, it’s at 2am your time), but if you can share your talking points / slides I’d be more than happy to keep folks updated in the eastern hemisphere.

(Pavel Grafkin) #6

@SCYou, we are going to kick off development of the Design (assets) repository quite soon. It would be great if you could check whether the proposal satisfies all your needs and provide feedback!

(Nataly Patino) #7

I have created a code library for those outcomes, validated with literature, EncoderPro code search, and basic text search in code descriptions. I’d be happy to help. How do I reply with that document?

(Jon Duke) #8

Glad to see the topic heating up! @schuemie, due to holidays, @apotvien and I will not be presenting at the community meeting until January. But this topic is clearly a high priority. I like @pavgra’s thinking around the digital assets more broadly, but I would agree with @schuemie that I’m not totally sure whether Athena should be home for the actual assets.

For the moment, @apotvien and I have posted a few demonstration entries on the OHDSI Wiki for people to review and provide feedback on what contents should be included. Find the links to the main page and a sample page here. The idea is that the actual content lives on GitHub or Atlas, while the wiki contains the metadata.

The larger issue is we’ve got to get all of these efforts pulled together! @pavgra Are you available to join our WG call at 10am ET this Weds? Anyone interested in this topic, if possible, would be great to have you join to discuss.



(Jon Duke) #9

I have updated the invite on the page, thanks for the heads up @schuemie. Would also be great if @Ajit_Londhe, @SCYou, @Juan_Banda, @jswerdel, @Nataly_Patino and others who are interested in the topic are able to join.

(Anthony Sena) #10

@jon_duke, after speaking with members of our working group, we decided to cancel the Atlas/WebAPI working group meeting for 11/14 so that we can attend this discussion as well. Looking forward to it!

(Juan M. Banda) #11

Is the call today, November 13th, or Wednesday, November 14th? I am a bit confused about this. Thanks!

(Jon Duke) #12

Updated, sorry. Weds 11/14.

(Martijn Schuemie) #13

Just adding to the frenzy of discussions on this topic, here are my thoughts (that nobody asked for :wink: ) on the requirements for the Phenotype Library. This is mainly inspired by @SCYou’s recent work on creating standard phenotypes for cardiovascular disease:

Types of phenotype definitions to support:

  • Rule-based phenotypes
  • Computational (probabilistic) phenotypes

We should be able to have multiple definitions per phenotype (e.g. ‘stroke broad’, ‘stroke narrow’).

Phenotype definitions should be clearly versioned.

Operating characteristics
For each definition we need to know its operating characteristics (sensitivity, specificity, PPV, NPV).

There are multiple ways to compute these characteristics, e.g.:

  • Manual chart review
  • Joel’s algorithm

Note: there’s even a need to compute operating characteristics per subgroup (e.g. within exposure groups) to quantify differential misclassification. Joel’s algorithm can do that (we tried), but I’m not sure if and where we need to store it in the Phenotype Library.
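As a rough illustration of the operating characteristics listed above, here is a minimal Python sketch that derives sensitivity, specificity, PPV, and NPV from confusion-matrix counts, once overall-style and once per exposure subgroup. The class and function names and all counts are hypothetical, made up for the example; this is not Joel’s algorithm, only the arithmetic that any validation method would feed into.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a phenotype definition compared against a reference standard."""
    tp: int  # flagged by the definition, truly a case
    fp: int  # flagged by the definition, not a case
    fn: int  # missed by the definition, truly a case
    tn: int  # not flagged, not a case


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None instead of raising when a metric is undefined.
    return numerator / denominator if denominator else None


def operating_characteristics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts for two exposure groups (illustrative only).
by_subgroup = {
    "exposed": ConfusionCounts(tp=40, fp=10, fn=10, tn=940),
    "unexposed": ConfusionCounts(tp=30, fp=10, fn=20, tn=940),
}

for group, counts in by_subgroup.items():
    print(group, operating_characteristics(counts))
```

In this made-up example the sensitivity differs between the exposed and unexposed groups, which is the kind of differential misclassification the per-subgroup numbers are meant to expose.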

Metadata
Each definition should have metadata, such as literature references, study references (which definition was used in which study), the rationale for the definition, and copy-paste descriptions of the definition to use in the protocol and paper.

Easy to add and maintain definitions, and request evaluations across the OHDSI network.

Persistence + security: Some way to make sure definitions aren’t changed without notice by unauthorized persons.

I would like an API for getting things out. Ideally, I would want to be able to plug one or more definitions directly into my study package, so it can be executed at each study site.
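To make these requirements a bit more concrete, here is a minimal in-memory Python sketch of what such an API could look like: several named definitions per phenotype, explicit versions, attached metadata, and a content fingerprint so that silent changes can be detected. Every name here (`PhenotypeDefinition`, `PhenotypeLibrary`, the `logic` layout) is an assumption made for illustration and does not describe any existing OHDSI tool.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeDefinition:
    phenotype: str           # e.g. "stroke"
    name: str                # e.g. "stroke narrow"
    version: str             # explicit version label, e.g. "1.0.0"
    kind: str                # "rule-based" or "probabilistic"
    logic: dict              # hypothetical serialized definition
    references: tuple = ()   # literature or study references (metadata)

    def fingerprint(self) -> str:
        """SHA-256 over the definition content, to detect unannounced changes."""
        payload = json.dumps(
            {
                "phenotype": self.phenotype,
                "name": self.name,
                "version": self.version,
                "kind": self.kind,
                "logic": self.logic,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PhenotypeLibrary:
    def __init__(self) -> None:
        self._entries = {}

    def add(self, definition: PhenotypeDefinition) -> None:
        key = (definition.phenotype, definition.name, definition.version)
        if key in self._entries:
            # Published versions are immutable; changes require a new version.
            raise ValueError(f"{key} already exists; publish a new version instead")
        self._entries[key] = definition

    def get(self, phenotype: str, name: str, version: str) -> PhenotypeDefinition:
        return self._entries[(phenotype, name, version)]

    def definitions_for(self, phenotype: str) -> list:
        """All (name, version) pairs registered for one phenotype."""
        return sorted(k[1:] for k in self._entries if k[0] == phenotype)


library = PhenotypeLibrary()
library.add(PhenotypeDefinition(
    "stroke", "stroke broad", "1.0.0", "rule-based",
    logic={"ancestor_concept_ids": [4043731, 443454]},  # toy content
))
library.add(PhenotypeDefinition(
    "stroke", "stroke narrow", "1.0.0", "rule-based",
    logic={"ancestor_concept_ids": [443454], "inpatient_only": True},  # toy content
))
print(library.definitions_for("stroke"))
```

A study package could record the fingerprint of each definition it pulls in, so that every site can confirm it is executing exactly the same definition.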

(Frank DeFalco) #14

My $.02. Digital signing, encryption options, integration with ORCID.

(Christian Reich) #15

Does that exist somewhere, @jswerdel?

(Joel N. Swerdel) #16

Right now it only exists locally. We’re working on putting the package together.

(Seng Chan You) #17

I’ve made a first draft of an ischemic stroke cohort in the public ATLAS.
I used the ancestor concept IDs 4043731 (infarction-precerebral) and 443454 (Cerebral infarction) to define stroke, and added specifiers: inpatient, primary or secondary diagnosis, or primary diagnosis only.
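For readers less familiar with how an ancestor concept expands into a cohort, here is a small self-contained Python sketch using an in-memory SQLite database. The table and column names follow the OMOP CDM (`concept_ancestor`, `condition_occurrence`), but the tables are reduced to a few columns, the descendant concept IDs (9000001, 9000002, 9000099) and all patient rows are hypothetical, and the inpatient and primary/secondary diagnosis specifiers are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (
    ancestor_concept_id   INTEGER,
    descendant_concept_id INTEGER
);
CREATE TABLE condition_occurrence (
    person_id            INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")

# In the CDM, every concept is also listed as its own ancestor.
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)", [
    (4043731, 4043731),
    (443454, 443454),
    (443454, 9000001),    # hypothetical descendant of 443454
    (4043731, 9000002),   # hypothetical descendant of 4043731
])

# Hypothetical patient records.
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", [
    (1, 9000001, "2018-01-05"),
    (2, 443454, "2018-02-10"),
    (3, 9000099, "2018-03-01"),   # unrelated condition, should not qualify
    (1, 9000002, "2018-06-01"),
])

STROKE_ANCESTORS = (4043731, 443454)


def first_stroke_per_person(connection):
    """Return (person_id, earliest qualifying condition_start_date) pairs."""
    return connection.execute(
        """
        SELECT co.person_id, MIN(co.condition_start_date)
        FROM condition_occurrence AS co
        JOIN concept_ancestor AS ca
          ON ca.descendant_concept_id = co.condition_concept_id
        WHERE ca.ancestor_concept_id IN (?, ?)
        GROUP BY co.person_id
        ORDER BY co.person_id
        """,
        STROKE_ANCESTORS,
    ).fetchall()


print(first_stroke_per_person(conn))
```

Persons 1 and 2 qualify through descendant or self rows of the two ancestor concepts, while person 3 does not.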

You can see the result of this cohort in our database.

I didn’t add ‘excluding migraine on the same day’ or ‘imaging study of brain’, because excluding migraine on the same day seems contrived, and I don’t think the procedure vocabulary is fully standardized across the OHDSI network.

Do you have any comments on this, @schuemie @Rijnbeek @Christian_Reich @Patrick_Ryan?

I’ll start validating the PPV for each level of specifier using discharge summaries from the EHR. Could you help validate this cohort in the ICD-9 system, @hripcsa @rchen?

@jswerdel, can you validate this cohort using PheValuator?

(Jon Duke) #18

Meeting to continue the discussion on the gold standard library today at 10am ET.

Invite Here

Minutes from last meeting (11/14)

(Jon Duke) #19

Trouble for some dialing in today. We are switching to WebEx.

OHDSI Gold Standard Library
Hosted by Jon Duke

Wednesday 10:00 am | 1 hour | (UTC-05:00) Eastern Time (US & Canada)
Occurs every 2 week(s) on Wednesday effective 11/28/2018 from 10:00 AM to 11:00 AM, (UTC-05:00) Eastern Time (US & Canada)
Meeting number: 739 830 648


Join by video system
Dial 739830648@gtriconf.webex.com
You can also dial and enter your meeting number.

Join by phone
1-240-454-0879 USA Toll
Access code: 739 830 648

(Nataly Patino) #20

I just saw the post that we are having meetings on this topic. I have added the WebEx invite to my Outlook calendar and will join next time.