Our WG hasn’t met in a while. Here are some relevant developments in this space and related spaces we should keep abreast of along with a couple of suggestions we might ponder until Shawn resumes calls. Hopefully that will be soon!
The NIH Collaboratory is promoting clinical trial data sharing to meet evolving journal and research funding requirements and to increase the value of trial data through reuse. Our WG might consider reaching out to them to help meet the needs they are seeking to address.
They are educating researchers and research organizations about data sharing requirements and data sharing plans, and demoing solutions that help meet related needs. The NIH data commons is a prominent example of research sponsors’ aspirations to leverage research data in ways that foster reuse. My sense is that after years of faltering attempts to promote data sharing, there is now enough momentum, technical maturity, and calibrated understanding of researchers’ incentives to tip the scales toward widespread adoption over the next couple of years if things go well.
An SDTM-to-OMOP ETL solution produced by the OHDSI CT WG that allows semantic interoperability of shared trial data and use of OHDSI tooling on data aggregated across trials would be an enormous benefit to this effort. It would also get more researchers involved in using OMOP and OHDSI tools. At least some of those researchers would then be likely to join OHDSI more fully and expand work in this space.
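To make the ETL idea a bit more concrete, here is a minimal sketch of what one small step might look like: mapping a row from the SDTM Demographics (DM) domain to an OMOP person record. It is not the WG's design. The function name `dm_to_person` and the dict-based row format are hypothetical, and it assumes the standard DM variables USUBJID, SEX, and BRTHDTC plus the OMOP gender concept IDs 8507 (male) and 8532 (female). A real ETL would use vocabulary lookups and cover many more domains.

```python
# Hypothetical sketch: map one SDTM DM row to an OMOP person record.
# Assumes DM variables USUBJID, SEX, BRTHDTC (ISO 8601 date string).

# OMOP standard concept IDs for gender; 0 means "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def dm_to_person(dm_row: dict, person_id: int) -> dict:
    """Return a partial OMOP person record built from an SDTM DM row."""
    birth = dm_row.get("BRTHDTC") or ""
    # BRTHDTC may be missing or partial; keep only a usable 4-digit year.
    year = int(birth[:4]) if len(birth) >= 4 and birth[:4].isdigit() else None
    sex = dm_row.get("SEX")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        "year_of_birth": year,
        # Keep the source values so the mapping stays traceable.
        "person_source_value": dm_row.get("USUBJID"),
        "gender_source_value": sex,
    }


if __name__ == "__main__":
    example = {"USUBJID": "STUDY01-001", "SEX": "F", "BRTHDTC": "1970-05-14"}
    print(dm_to_person(example, person_id=1))
```

Keeping the source values alongside the mapped concepts is one way to make the transformation auditable once data from several trials are aggregated.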
One solution featured in the NIH Collaboratory’s Sept 27, 2019 videoconference on trial data sharing was Vivli. This and similar platforms enable researchers to share data, search for shared data, assign credit to researchers and organizations who share their data, track users of shared data, and track publications that use shared data.
Vivli uses DataCite and DOIs to assign credit and track data use. DOIs are Digital Object Identifiers. A DOI is an alphanumeric string assigned to uniquely identify an object. It is tied to a metadata description of the object and to a URL for accessing important details about the object. DOIs can be generated for data sets, software, images, and other research materials.
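For anyone who has not worked with DOIs directly, the structure described above can be illustrated with a short sketch. A DOI has a registrant prefix beginning with "10." and a suffix, separated by the first slash, and it resolves through https://doi.org/. The helper name `parse_doi` and the example DOI are hypothetical, and this only handles the string itself; it does not fetch any metadata.

```python
from urllib.parse import quote


def parse_doi(doi: str) -> dict:
    """Split a DOI into prefix and suffix and build its resolver URL."""
    doi = doi.strip()
    # Accept common ways a DOI is written and reduce them to the bare DOI.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    # The prefix identifies the registrant; the suffix identifies the object.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return {
        "prefix": prefix,
        "suffix": suffix,
        # Percent-encode unusual characters but keep slashes intact.
        "url": "https://doi.org/" + quote(doi, safe="/"),
    }


if __name__ == "__main__":
    # Hypothetical DOI used purely for illustration.
    print(parse_doi("doi:10.1234/example-trial-dataset"))
```

The point of the resolver URL is that the identifier stays stable even if the data set's hosting location changes, which is what makes it usable for long-term credit and usage tracking.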
Though this effort is focused on trial data, these sharing, tracking, and reuse issues will be relevant to any data-commons-type use of the ETL and will increasingly pertain to the data collected in routine care that are the bread and butter of OHDSI research. They are worth bringing to this group’s attention, in part, because data from routine care will increasingly be integrated with trial data.
Here’s why I think that’s true and why we should be aware of the implications for our WG. In 2015, the FDA sponsored pilot projects examining integration of data capture for trials into routine care through EHRs. Last year they issued guidance based on what was learned from those pilots. Their industry guidance covered both the use of electronic source data in general and EHR data in particular in clinical investigations, i.e. in trials.
Health care organizations are adopting solutions offered by the major EHR and CTMS vendors (e.g. Cerner’s PowerTrials, Epic’s BEACON, AllScripts/Veradigm’s partnership with Microsoft, Velos/REDCap integration with EHRs, etc.) that are responsive to the FDA guidance. These solutions will integrate electronic case report forms into the EHR, allow clinicians to see research-related data in the patient’s chart, and support screening, recruitment, and other research workflows using the same systems that support routine care.
As health care organizations adopt these solutions, trial data can be collected and linked to records of the usual care received by the same patient. This will open up important new possibilities for analyses that more directly evaluate and compare data and evidence generated from trials and from routine care. Such analyses will help advance the amazing work presented by Patrick Ryan and George Hripcsak at this year’s OHDSI Symposium on trial and observational evidence for regulatory decisions.
Apart from trial data, open observational data repositories such as the MIMIC-CXR database hosted on PhysioNet are a fantastic resource that OHDSI should seek to promote and use. Obviously, these efforts are only relevant to the CT WG insofar as the group’s work is designed with respect to the needs of a data sharing solution that encompasses both trial and observational data.
Since the CT work group’s work is being driven in part by Odysseus and the Hyve, it seems relevant to explore whether that should be a design consideration. The OHDSI community would benefit greatly from a strategy for the kind of functions supported by Vivli. I don’t know whether it is the best platform for this sort of thing, whether there are better alternatives, or whether it would be better for OHDSI to develop its own. But I think that there would be much wider study participation and more studies would get done if we were to support or participate in a centralized data sharing platform and incentivize its use through a well-designed credit attribution system. Supporting this for both observational data sets and OMOPed data from trials would be ideal.
I feel especially strongly that we would benefit from a credit attribution system that uses a global persistent unique identifier like a DOI to give credit to community members. The OHDSI community is doing amazing things without many people getting much credit for their contributions. Think of what would happen if their contributions became matters of public record that they could put on CVs, show to superiors, etc.
This isn’t a central issue the CT WG is trying to address either, but it too is related enough to warrant mentioning here. Efforts to implement the ETL from trial data sets (SDTM) to OMOP are likely to be motivated by some effort to promote FAIR principles and hence will require a good credit attribution system. Attribution for the data should be done in concert with attribution for other research products since they are often handled using the same attribution strategy.
In addition to DataCite and ORCID, here are some other relevant efforts in this space worth considering if we want to include a robust strategy for using trial data set identifiers and trial data set contributors in our work:
National Center for Data to Health (CD2H) is a consortium of academic health centers in the US. Its efforts in this space include a promising solution called InvenioRDM. It is being developed:
in partnership with the European Organization for Nuclear Research (CERN), birthplace of the World Wide Web and developers of the Zenodo RDM for the European community
This work is:
driven by user needs and informed by best practices and standards, including those that help define Next Generation Repositories as a foundation for a distributed, globally-networked infrastructure for scholarly communication, discovery, and innovation.
… to build a wide range of features that can help power biomedical research and support data sharing, innovation, knowledge dissemination, and interdisciplinary collaboration.
Another CD2H effort is working on
development of a contribution role ontology (built on CRedIT through the CRedIT ontology) to support modeling of the significant ways in which the translational workforce contributes to research; better understanding of the types of research objects generated; and mining of acknowledgements section of publications to harvest existing contributor roles to serve as a data source to drive additional development
I know it’s a pain reading long posts. I packed a lot into this one because efforts like the NIH Collaboratory, CD2H, PhysioNet, and MCBK are developing robust methods and infrastructure to support open science. There are so many challenges to doing this kind of work well that I think this WG and many others in OHDSI should be aware of what’s relevant in these other networks so we can decide what we might want to contribute to and benefit from.
Here’s my bottom-line suggestion for the CT WG: consider defining our motivating use case through engaging with the NIH Collaboratory or with Vivli or another widely used platform. A successful product of that engagement would ensure that the broader research community benefits from the CT WG’s solutions for munging SDTM data to OMOP so they can enjoy the meticulous effort OHDSI has given to the CDM and the inspired OHDSI tools. It might also provide funding if they are sufficiently interested. Engagement with them might simply consist of repeating the prior demonstrations by Odysseus & Medidata and the Hyve.
If Odysseus or the Hyve or another group plans to develop an OHDSI data sharing platform, we might just investigate what’s being done by others in this space to learn what’s needed for more OHDSI-specific purposes.