Implementing the FAIR principles in the OHDSI approach and tools

keesvanbochove · April 7, 2020, 4:22pm

Dear OHDSI friends,

This is perhaps a strange time to start a discussion on the FAIR principles, with much of our urgent attention focused on dealing with the COVID-19 crisis and providing reliable medical evidence for it. However, it is also very important that our approach and the evidence we generate find their way into the hands of the people around the world who need them: medical doctors, researchers, regulatory agencies, citizens and patients. To achieve that, we need communication (and people like @CraigSachson, @MauraBeaton and many others are working tirelessly on that), but there are also technical aspects to improving the Findability, Accessibility, Interoperability and Reusability of OHDSI artefacts (such as protocols, databases, study results, vocabularies and software libraries; any digital resource could be in scope of FAIR). A very preliminary discussion can be found in the Book of OHDSI.

The Hyve has a task in the EHDEN project to work on this. Our original plan was to use the OHDSI Europe symposium to gather feedback from the community on where we should focus our efforts, and then, based on that, to work with those of you who are interested to gradually improve FAIRness and identify which other standards and initiatives we should align ourselves with. However, things have changed due to the COVID-19 pandemic, which is why we are now publishing the poster and opening this forum topic to gather feedback on what you think we should focus on.

So, to make it simple: if you could take a moment to score, from 1 to 5, how important it is to improve the FAIRness of each of these broad categories of digital resources (1 = don’t waste your time, 2 = not important, 3 = neutral, 4 = important, 5 = critical), that would greatly help us! Of course, any feedback on this topic is welcome.

EDIT: it looks like the numerical poll doesn’t work, so I’m switching to multiple choice for now: please mark the 1 or 2 most important items we should focus on.

Studies (e.g. study protocols, study results, study publications, study authors etc.)
Databases (database metadata including type, domains included, number of patient years and follow-up, inclusion/exclusion triggers etc., database snapshot versions, database reports such as Achilles and DQ)
Data model (CDM versions and definition including domains, fields, constraints, and the vocabularies and vocabulary versions)
Software (analysis packages, visualization tools, ETL tools, ATLAS etc.)
Discourse (protocol discussions, CDM choices, forum posts, WG materials from wiki, papers etc.)


CRoeder · May 4, 2020, 3:54pm

Kees,

I think they all rate highly, but because it tends to come up early in a project, I’ve worked on the data model build. It took some digging to find the DDL: where it lives, how it gets created and how to assemble it. I’d be happy to share my experience and my work so far.

-Chris