OHDSI Data Use Agreement? Discussion on necessity and possible structure

Hi all-

Many collaboratives have data use agreement forms that are used when data is exchanged across institutions. These forms are useful both for the researcher conducting the study and for the data owner/holder.
My open question for the OHDSI community is: should research studies conducted within the OHDSI collaborative have data use agreement forms? This could be especially important when obtaining data from institutions both within the USA and abroad, where data use criteria and laws may differ.

My two main discussion questions for OHDSI are:

  1. Necessity of a data use agreement (Is it necessary? Is it helpful? Would it make you, as a data owner, more comfortable sharing data?)
  2. Structure of a data use agreement

Input greatly appreciated!!!

I think this is a good idea (although there’s always the risk of too much paperwork getting in the way of real research). @Mary_Regina_Boland, would you happen to have an example data use agreement you can share?

One thing I observed in another data research network I've been involved in is that it is important that data only be used for the purpose for which it is provided. This matters especially because we tend to generate rather large data sets (Achilles, for example, but also the data shared for the treatment pathway study) that could be used for many purposes. As a data holder, you should always have the last word on what happens to your data, even if it is no longer physically under your control.

I agree, @schuemie, that data holders should have the right to set the terms for how the data are to be used. However, I think we should also encourage as much open and unrestricted use of data as possible, both within the community and outside it (e.g., when we publish papers). There will, of course, be limits on how far this can go because of our institutional restrictions. All of this underscores the importance of data use agreements: they bring everything into the open, so both data holders and analysts know what the data can and cannot be used for.

I agree overall that paperwork should be minimized. However, a data use agreement is important for data holders to have when writing their IRB protocols. For example: Are they agreeing to release the data for one study or for multiple studies? How long will the data be stored? Who will handle the data in the future if the researcher leaves their current institution?

I think this is important because, in theory, a researcher could obtain data for one study and then, five years later, publish another paper using the same data. I am nervous about what could happen if the data holders or generators disagree with how the data are used later on. Also, depending on the IRB, there could be stipulations about how the data are handled. I believe it's better to be upfront about these issues.

If we have a formalized 'Data Use Agreement' as a community, researchers could adapt it as needed alongside their IRB protocols. That way, everyone knows the restrictions on the data (or the lack thereof) a priori. In the long term, this will prevent a lot of problems.
Many large consortia also have such agreements in place for joining or obtaining data. SEER-Medicare, for example, has a proposal instruction form and a data use agreement form that we could perhaps learn from as a community:

http://healthcaredelivery.cancer.gov/seermedicare/obtain/seerdua.docx

I don't think we need to be as restrictive. Comments would be very helpful!

There are also data use agreement forms in other domains, e.g., NASA's for its satellite data.

This is actually a great document on data use in non-health domains that do not involve PHI.

Yes, I agree. For the governance boards I have experience with, in the EU-ADR network and also in the EMIF project, this is an important issue. The ENCePP Code of Conduct might be good to review: http://www.encepp.eu/code_of_conduct/index.shtml

Most studies done in these networks are really focused studies based on a limited set of data elements. However, we want to move toward data-driven, hypothesis-generating studies. It will be a challenge to convince the boards to accept that kind of use of the data.

My experience is that sharing data within a protected remote research environment is accepted much more readily than transferring files by email, as in the treatment pathway study. You want to keep control over the data.