
Requirements Development for the OHDSI Gold Standard Phenotype Library

(Christian Reich) #81

What about:

  • New intervention (could be drug, device, procedure)
  • First intervention
  • Incident intervention
  • Prevalent intervention
  • New onset of condition
  • First onset of condition
  • Incident onset of condition
  • Prevalent condition

The Development_Methodology field also needs categories, in my opinion. Otherwise people will indeed write self-promoting sentences like “This phenotype was developed by a group of 3 expert endocrinologists…”

Sorry to come in late here. Do you have a controlled vocabulary for the other ones as well? Like Modality, Provenance_Reason? What’s Provenance?

Also, “Uses_Labs” should be “Uses_Measurements”. We should use standard OMOP Domains.

(Aaron Potvien) #82

Hi @Christian_Reich,

Thank you for the suggestions and feedback… especially for providing it free of self-promoting sentences. :wink:

With respect to Modality, we were alluding to differentiating between rule-based (heuristic) phenotypes versus computable (algorithmic) phenotypes.

Provenance is intended to capture how phenotype definitions evolve over time. A simple example is versioning. Supposing we have a phenotype with versions 1, 2, and 3, we would have the provenance capture that version 3 came from version 2, and version 2 came from version 1. This doesn’t have to be one-to-one though. By having each phenotype identify its ancestor(s), we could navigate a graph to show how a given phenotype developed.
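As a rough illustration of navigating that graph, here is a minimal Python sketch. It assumes each phenotype record simply lists the identifiers of its direct ancestors; the identifiers (`v1`, `v2`, `merged`, ...) and the `lineage` helper are hypothetical and not part of any agreed library format.

```python
from collections import deque

# Hypothetical records: each phenotype maps to the IDs of its direct ancestors.
# "merged" shows that provenance does not have to be one-to-one.
PHENOTYPES = {
    "v1": [],
    "v2": ["v1"],
    "v3": ["v2"],
    "other": [],
    "merged": ["v3", "other"],
}

def lineage(phenotype_id, records):
    """Return every direct or indirect ancestor, nearest ancestors first."""
    seen = []
    queue = deque(records.get(phenotype_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(records.get(current, []))
    return seen

print(lineage("v3", PHENOTYPES))      # ['v2', 'v1']
print(lineage("merged", PHENOTYPES))  # ['v3', 'other', 'v2', 'v1']
```

A breadth-first walk is used so that the closest ancestors come first, which seems like the natural order for showing how a phenotype developed.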

This has implications when it comes to validation. With each change in version comes a potential change in the algorithm’s performance. Accordingly, we’ve decided to anchor each validation set to the version it was run against. The validation sets do not “carry forward”, which protects the user from automatically assuming, for instance, that if version 2 performed well, then version 3 must perform the same.
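One way to picture the anchoring is a lookup keyed on both the phenotype and its version, so that nothing is inherited implicitly. The sketch below is only an assumption about how this could be stored; the phenotype name, the store layout, and the `validations_for` helper are all hypothetical, and the entries are placeholders rather than real results.

```python
# Hypothetical store: validation sets are keyed by (phenotype name, version).
VALIDATIONS = {
    ("example_phenotype", 2): [
        {"site": "site_a", "note": "placeholder validation record"},
    ],
}

def validations_for(name, version, store):
    """Return only the validation sets recorded against this exact version."""
    return list(store.get((name, version), []))

# Version 2 has a validation set attached.
print(len(validations_for("example_phenotype", 2, VALIDATIONS)))  # 1

# Version 3 starts with none, even though it descends from version 2.
print(validations_for("example_phenotype", 3, VALIDATIONS))  # []
```

Because the lookup never falls back to an earlier version, a user who wants to reuse version 2's validation for version 3 has to make that call deliberately.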

Now, that could well be true if the change in version was minor, but what constitutes a “minor change” is difficult, if not impossible, to establish in a general framework and would likely have to be considered case by case. Either way, all of the information would be available for the user to make that judgment.

(Christian Reich) #83

Don’t try me!!!

Got the Modality, got the Provenance (though if you mean version, you may just call it Version and then, like in Wikipedia, introduce “predecessors”). You know more about it, but I am not sure this kind of pedigree is really clean. I think people just open a phenotype and start futzing around, and before they know it they have created something else, without really caring about the evolutionary path. I may be wrong, though.

But what is Provenance Reason?

(Aaron Potvien) #84

Yes, I think “Version” is an example of “Provenance”, but “Provenance” doesn’t necessarily always mean “Version”. A phenotype might be directly derived from another, or it might borrow concepts from another, or it might just be “inspired by” another (like a “See Also” situation). I invite others with clinical experience to give other examples of the types of Provenance that could be documented. In our current line of thinking, it very much aligns with the idea of a “Predecessor” in the sense of being connected via a directed graph.

That can be one motivating point for the library’s existence. If a phenotype is truly created with the “gold standard” practices, then at minimum, the author will have to fill out these elements, which makes them consider and document what it is they are creating and why. The job of the librarians would be to verify that these elements are documented before accepting the phenotype into the library.

The Provenance Reason and Provenance Hash were intended to act as parallel arrays. In the picture, the first hash identifies the phenotype, and the first reason corresponds to the provenance concept. It’s similar with the second hash and second reason. Admittedly, there’s probably a more “JSONic” way to document them as being paired together in a single object, so that’s certainly subject to change. That’s related to another point brought up at the last meeting, which is how these JSONs will come to be. We’re still working on that, but one idea is to have a form that can be filled out which automatically creates this object in the necessary format.
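To make the parallel-array idea and the paired alternative concrete, here is a small Python sketch. The key names `Provenance_Hash`, `Provenance_Reason`, `Provenance`, `Hash`, and `Reason`, the sample values, and the `pair_provenance` helper are assumptions for illustration only; the actual JSON layout is still undecided.

```python
import json

def pair_provenance(entry):
    """Convert parallel hash/reason arrays into a list of paired objects."""
    hashes = entry.get("Provenance_Hash", [])
    reasons = entry.get("Provenance_Reason", [])
    if len(hashes) != len(reasons):
        # Parallel arrays are only meaningful when they line up one-to-one.
        raise ValueError("Provenance_Hash and Provenance_Reason differ in length")
    # Copy every other field unchanged.
    paired = {
        key: value
        for key, value in entry.items()
        if key not in ("Provenance_Hash", "Provenance_Reason")
    }
    paired["Provenance"] = [
        {"Hash": h, "Reason": r} for h, r in zip(hashes, reasons)
    ]
    return paired

# Hypothetical entry using the parallel-array layout.
entry = {
    "Title": "Example phenotype",
    "Provenance_Hash": ["abc123", "def456"],
    "Provenance_Reason": ["New version", "Borrowed concepts"],
}

print(json.dumps(pair_provenance(entry), indent=2))
```

Pairing each hash with its reason in one object removes the risk of the two arrays drifting out of step, which is the main weakness of the parallel layout.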

(Aaron Potvien) #85

Hello everyone,

I’m looking forward to continuing our discussion tomorrow. I’ll share a brief update about the current state of visualizing provenance in the viewer application.

By having each entry track its descendants, it’s possible to construct a graph that represents where the currently selected phenotype falls within the context of its full evolutionary path. More specifically, it’s possible to affix a cluster ID to each entry based on the connected components of the graph (all ancestors/descendants that ever had a connection to the phenotype, directly or indirectly) and plot that cluster.
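For anyone curious how the cluster IDs could be assigned, here is a minimal sketch of connected-component labelling in plain Python. It is not the viewer application's actual code; the `cluster_ids` helper, the edge list, and the node names are hypothetical, and provenance links are treated as undirected because ancestors and descendants both belong to the same cluster.

```python
from collections import defaultdict, deque

def cluster_ids(nodes, edges):
    """Label each node with the ID of its connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        # Direction is ignored: a link in either direction joins the cluster.
        neighbours[a].add(b)
        neighbours[b].add(a)

    cluster = {}
    next_id = 0
    for start in nodes:
        if start in cluster:
            continue  # already reached from an earlier node
        cluster[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in cluster:
                    cluster[other] = next_id
                    queue.append(other)
        next_id += 1
    return cluster

# Hypothetical library: one three-version chain, one pair, one standalone entry.
nodes = ["v1", "v2", "v3", "x1", "x2", "solo"]
edges = [("v1", "v2"), ("v2", "v3"), ("x1", "x2")]

print(cluster_ids(nodes, edges))
# {'v1': 0, 'v2': 0, 'v3': 0, 'x1': 1, 'x2': 1, 'solo': 2}
```

Selecting a phenotype then amounts to looking up its cluster ID and plotting every entry that shares it.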

This provides for a rather interesting opportunity to convey a lot of information visually when plotting the graph cluster. For instance, nodes/edges can have shapes/colors/sizes taken to mean different things. I’m hoping the group can help come up with ideas about how to best structure this and comment on what features would be useful.

As is always the case, if others have agenda items they would like to see for this upcoming meeting or any of our future meetings, please don’t hesitate to share! Thank you very much!

Link to tomorrow’s meeting below (10-11am ET):