Requirements Development for the OHDSI Gold Standard Phenotype Library

Christian_Reich · April 24, 2019, 9:04am

What about:

New intervention (could be drug, device, procedure)
First intervention
Incident intervention
Prevalent intervention
New onset of condition
First onset of condition
Incident onset of condition
Prevalent condition

The Development_Methodology also needs categories, in my opinion. Otherwise people will indeed write self-promoting sentences like “This phenotype was developed by a group of 3 expert endocrinologists…”

Sorry to come in late here. Do you have a controlled vocabulary for the other ones as well? Like Modality, Provenance_Reason? What’s Provenance?

Also, “Uses_Labs” should be “Uses_Measurements”. We should use standard OMOP Domains.

apotvien · April 25, 2019, 3:10pm

Hi @Christian_Reich,

Thank you for the suggestions and feedback… especially for providing it free of self-promoting sentences.

With respect to Modality, we were alluding to differentiating between rule-based (heuristic) phenotypes versus computable (algorithmic) phenotypes.

Provenance is intended to capture how phenotype definitions evolve over time. A simple example is versioning. Supposing we have a phenotype with versions 1, 2, and 3, we would have the provenance capture that version 3 came from version 2, and version 2 came from version 1. This doesn’t have to be one-to-one though. By having each phenotype identify its ancestor(s), we could navigate a graph to show how a given phenotype developed.
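To make the graph idea concrete, here is a minimal sketch of walking from one phenotype back through its ancestors. The phenotype IDs and the `ANCESTORS` mapping are made up for illustration; the library's real storage format is not settled.

```python
from collections import deque

# Hypothetical data: each phenotype ID maps to the IDs of its direct ancestors.
ANCESTORS = {
    "pheno_v1": [],
    "pheno_v2": ["pheno_v1"],
    "pheno_v3": ["pheno_v2"],
    "other_v1": [],
    "pheno_merged": ["pheno_v3", "other_v1"],  # not one-to-one: two ancestors
}

def lineage(phenotype_id, ancestors=ANCESTORS):
    """Return every direct or indirect ancestor, nearest first (breadth-first)."""
    seen = []
    queue = deque(ancestors.get(phenotype_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(ancestors.get(current, []))
    return seen

print(lineage("pheno_v3"))      # ['pheno_v2', 'pheno_v1']
print(lineage("pheno_merged"))  # ['pheno_v3', 'other_v1', 'pheno_v2', 'pheno_v1']
```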

This has implications when it comes to validation. With each change in version comes a potential change in the algorithm’s performance. Accordingly, we’ve decided to anchor the validation sets to whatever version the validation refers to. The validation sets do not “carry forward” to protect the user from automatically assuming, for instance, that if version 2 performed well, then version 3 must perform the same.

Now, it could be that that’s true if the change in version was minor, but what constitutes such a “minor change” is difficult if not impossible to establish in a general framework and would likely have to be considered on a case by case basis. However, all of the information would be available to the user to do so.
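As a rough sketch of what “anchored to a version” could mean in practice, the lookup below matches validation records on the exact version only. The record layout, the phenotype name, and the metric value are all placeholders, not real results.

```python
# Hypothetical validation records, each tied to exactly one phenotype version.
# The metric value is a placeholder, not a real result.
VALIDATIONS = [
    {"phenotype": "example_phenotype", "version": 2, "ppv": 0.9},
]

def validations_for(phenotype, version, records=VALIDATIONS):
    """Return only the validations recorded against this exact version."""
    return [
        r for r in records
        if r["phenotype"] == phenotype and r["version"] == version
    ]

# Version 2 has a validation set; version 3 does not inherit it.
print(len(validations_for("example_phenotype", 2)))  # 1
print(len(validations_for("example_phenotype", 3)))  # 0
```

A user who judges the change from version 2 to version 3 to be minor can still look up version 2's records explicitly; nothing is inherited implicitly.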

Christian_Reich · April 26, 2019, 12:12am

Don’t try me!!!

Got the Modality, got the Provenance (even though if you mean version you may just call it Version, and then like in Wikipedia introduce “predecessors”. You know more about it, but I am not sure if this kind of pedigree is really clean. I think people just open a phenotype and then start futzing around. And before they know it they created something else, without really caring for the evolutionary path. I may be wrong, though.)

But what is Provenance Reason?

apotvien · April 29, 2019, 3:29pm

Yes, I think “Version” is an example of “Provenance”, but “Provenance” doesn’t necessarily always mean “Version”. A phenotype might be directly derived from another, or it might borrow concepts from another, or it might just be “inspired by” another (like a “See Also” situation). I invite others with clinical experience to give other examples of the types of Provenance that could be documented. In our current line of thinking, it very much aligns with the idea of a “Predecessor” in the sense of being connected via a directed graph.

That can be one motivating point for the library’s existence. If a phenotype is truly created with the “gold standard” practices, then at minimum, the author will have to fill out these elements causing them to consider and document what it is they are creating and why. The job of the librarians would be to verify these elements are documented before accepting its addition into the library.

The Provenance Reason and Provenance Hash were intended to act as parallel arrays. In the picture, the first hash identifies the phenotype, and the first reason corresponds to the provenance concept. It’s similar with the second hash and second definition. Admittedly, there’s probably a more “JSONic” way to document them as being paired together in a single object, so that’s certainly subject to change. That’s related to another point brought up at the last meeting, which is how these JSONs will come to be. We’re still working on that, but one idea is to have a form that can be filled out which automatically creates this object in the necessary format.
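For illustration, here is one way the parallel arrays could be folded into paired objects. The key names `Provenance_Hash` and `Provenance_Reason` mirror the elements discussed here; the hash strings and the `hash`/`reason` keys in the output are invented for the example.

```python
import json

# Hypothetical entry using the parallel-array layout.
entry = {
    "Provenance_Hash": ["abc123", "def456"],
    "Provenance_Reason": ["Version", "See Also"],
}

def pair_provenance(entry):
    """Zip the two parallel arrays into one list of {hash, reason} objects."""
    hashes = entry["Provenance_Hash"]
    reasons = entry["Provenance_Reason"]
    if len(hashes) != len(reasons):
        raise ValueError("Provenance_Hash and Provenance_Reason differ in length")
    return [{"hash": h, "reason": r} for h, r in zip(hashes, reasons)]

print(json.dumps(pair_provenance(entry), indent=2))
```

Pairing them in a single object removes the risk of the two arrays drifting out of sync when an entry is edited by hand.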

apotvien · May 6, 2019, 3:45pm

Hello everyone,

I’m looking forward to continuing our discussion tomorrow. I’ll share a brief update about the current state of visualizing provenance in the viewer application.

By having each entry track its descendants, it’s possible to construct a graph that shows where the currently selected phenotype falls within its full evolutionary path. More specifically, it’s possible to affix a cluster ID to each entry based on the connected components of the graph (all ancestors/descendants that ever had a connection to the phenotype, directly or indirectly) and plot that cluster.

This provides for a rather interesting opportunity to convey a lot of information visually when plotting the graph cluster. For instance, nodes/edges can have shapes/colors/sizes taken to mean different things. I’m hoping the group can help come up with ideas about how to best structure this and comment on what features would be useful.
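The cluster assignment itself can be sketched with a small union-find over the provenance edges. This is only an illustration of the connected-components idea, not the viewer's actual code, and the node names are invented.

```python
def cluster_ids(nodes, edges):
    """Assign each node a cluster ID based on connected components.

    Edge direction is ignored: ancestors and descendants land in one cluster.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    # Number clusters in order of first appearance in `nodes`.
    roots = {}
    return {n: roots.setdefault(find(n), len(roots)) for n in nodes}

# Hypothetical entries: (child, ancestor) pairs.
nodes = ["a_v1", "a_v2", "a_v3", "b_v1", "b_v2"]
edges = [("a_v2", "a_v1"), ("a_v3", "a_v2"), ("b_v2", "b_v1")]
print(cluster_ids(nodes, edges))
# {'a_v1': 0, 'a_v2': 0, 'a_v3': 0, 'b_v1': 1, 'b_v2': 1}
```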

As is always the case, if others have agenda items they would like to see for this upcoming meeting or any of our future meetings, please don’t hesitate to share! Thank you very much!

Link to tomorrow’s meeting below (10-11am ET):
https://gatech.webex.com/gatech/j.php?MTID=mdd4af3e9b84212fc7df3eb0150703df5

apotvien · June 3, 2019, 3:27pm

Hello everyone,

For our meeting tomorrow, I would appreciate this working group’s input to review an updated version of the Common Data Elements template – the data elements required for an entry to be accepted into the library. There are currently two types of entries: 1) A phenotype algorithm; 2) a validation set (metrics regarding a phenotype algorithm’s performance). We’ll require different data elements for each type of entry.

Defining the characteristics that constitute a library entry is a substantial part of defining what we mean by a “Gold Standard Process”.
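As a sketch of how a librarian (or a submission form) might check an entry against the template, the snippet below keeps a separate set of required elements per entry type. The element names in `REQUIRED` are placeholders; the actual required elements are still being worked out in the document.

```python
# Hypothetical required elements per entry type; the real lists are undecided.
REQUIRED = {
    "phenotype_algorithm": {"Title", "Modality", "Development_Methodology"},
    "validation_set": {"Phenotype_Hash", "Data_Description"},
}

def missing_elements(entry_type, entry):
    """Return the required element names that are absent or empty, sorted."""
    required = REQUIRED[entry_type]
    return sorted(
        name for name in required
        if entry.get(name) in (None, "", [])
    )

submission = {"Data_Description": "Chart review of sampled cases"}
print(missing_elements("validation_set", submission))  # ['Phenotype_Hash']
```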

apotvien · June 4, 2019, 1:27pm

For reference, the working document is the following:

https://docs.google.com/document/d/1H_fG94uGhRsY2-aC4j18PTYeIBCXF-tSOSP8m_B4SVE/edit

And the meeting link for this morning’s meeting is below (10-11am ET):

https://gatech.webex.com/gatech/j.php?MTID=mdd4af3e9b84212fc7df3eb0150703df5

lrasmussen · June 4, 2019, 1:37pm

@apotvien - unfortunately I have a conflict for today’s meeting. What’s the best way to get you feedback - is it okay to just add comments in the Google doc, or do you want to track feedback in this thread?

apotvien · June 4, 2019, 3:29pm

@lrasmussen, thanks in advance for your input! Feel free to leave any comments you have in the document. We’ll need to continue reviewing it next time, especially the parts in red that still need to be worked out.

apotvien · June 4, 2019, 3:42pm

All,

Since there are so many aspects of the library that need to be discussed, we’ve found that meeting for an hour every other week has not really been sufficient in getting through enough material. We’ve decided to ramp up the WG meetings a bit, in terms of duration and frequency, at least for the near future.

Please note that our next set of meetings will be as follows:

6/11, 9-11 ET
6/18, 9-11 ET
6/25, 9-11 ET
7/2, No Meeting

As always, all are welcome to attend to help shape the library’s future. Even if it’s only possible to join for a portion of the two-hour time slot, please feel free to do so.

apotvien · June 10, 2019, 2:24pm

Hello everyone,

As I mentioned in the last post, we’ll be starting our WG meeting tomorrow morning an hour earlier than usual. I’m hoping we can establish a working set of data elements required when submitting a cohort definition (chapter) into the library. When you look at a brand new cohort definition you haven’t heard of or seen before, what would you expect to see documented? What would you like to know about it? Feel free to post here or add comments to the document.

Our work-in-progress documentation is here:

https://docs.google.com/document/d/1H_fG94uGhRsY2-aC4j18PTYeIBCXF-tSOSP8m_B4SVE/edit

The meeting link for tomorrow is below:

https://gatech.webex.com/gatech/j.php?MTID=mdd4af3e9b84212fc7df3eb0150703df5

apotvien · June 28, 2019, 11:41am

Hi All,

This is a reminder that there will be no WG meeting next week. I believe this is our first cancellation since we’ve kicked off these meetings in January, so kudos to this group for maintaining such great momentum!

The following weeks, we’ll modify our meeting schedule to have weekly meetings for an hour (instead of 2). So, starting July 9th, we’ll meet weekly from 10-11am ET (Webex link below).

https://gatech.webex.com/gatech/j.php?MTID=mdd4af3e9b84212fc7df3eb0150703df5

Juan_Banda · July 9, 2019, 3:01pm

Hello,

May I suggest having a mailing list, with calendar invites? I unfortunately keep missing these meetings because the forum reminders always get routed to my spam folder.

apotvien · July 9, 2019, 4:09pm

Hi @Juan_Banda,

Yes, absolutely. If anyone else wants the calendar invite, please message me, and I’ll add your e-mail to the list.

Christian_Reich · July 9, 2019, 6:27pm

Friends. Don’t! Hard for folks to get in when this is a private email list. Put the Webex information into the WG Wiki page. And @Juan_Banda: You know better than that. White list ohdsi.org.

apotvien · July 11, 2019, 6:00pm

I would like to dispel any perception that this WG is operating as a secret society.

The e-mail list is only for WebEx reminders for those who want them; it is not an invitation to attend the meeting. The meetings have always remained open to anyone who wants to join in, and the WebEx info has been hosted on the WG’s Wiki page since this group’s inception in January.

Andrew · August 7, 2019, 10:38am

Sorry, I think they may have fallen off my calendar also.

apotvien · August 9, 2019, 2:48pm

There will be no WG meetings on 8/13 or 8/27. However, on 8/27, I am looking forward to speaking on the community call to review the state of the library and to do a live walkthrough of the application prototype.

We can meet the week before, on 8/20, to prepare for this call. In particular, I’d like to make sure that any discussion points WG members would like me to raise at the community call are covered.

Thanks Team!

apotvien · August 20, 2019, 2:53pm

Good morning,

I’m pleased to release the OHDSI Gold Standard Phenotype Library Requirements Document.

This document represents most of what our WG has discussed throughout our meetings this year and captures the requirements we are envisioning for the library. I’m proud of the ideas and contributions from our WG members, and I’m excited for the library that is starting to take shape.

However, this document is not yet the “final draft”. We’re reaching out to the community for feedback! If the Gold Standard Phenotype Library will be relevant to your work in any way, please help by reviewing these requirements and letting us know what you think. Thank you!

SCYou · August 20, 2019, 11:14pm

I appreciate your tremendous achievement, @apotvien
I think it would be better to mark each column in the submission data as required or not (i.e., nullable).
Don’t we also need the number of validated cases? For example, I usually validate 100~200 randomly sampled cases by chart review. Should I store that information in ‘Data_Description’?

If I want to submit my own results, what should I do? Should I make a table in Excel in the format of the validation submission data?

Thank you again. Wonderful!