The process for proposing and defining a network study

schuemie · June 19, 2018, 7:39am

In OHDSI, everyone can propose and lead a network study. Lately there have been some changes to how that works. In the past, you’d start by creating a page in the Research Studies Wiki and proceed from there, and this is still how the process is documented. Now it seems that the Wiki page will soon be retired, leaving many people confused about how to move ahead with their studies.

There are some other issues with the current process as well: the GitHub repo called ‘StudyProtocols’ often doesn’t contain the study protocol, but does contain the R package for executing the study. There’s a StudyProtocolSandbox repo and a StudyProtocols repo, and how the two are related is often unclear.

If I may, I’d like to propose the following process:

(Optional) Announce your interest in doing a study in the Researchers section of the forums. This would be a way to help find collaborators.
Create a study repository in your own GitHub space. For example, I’ve created an example study here. The most important thing to start with is a README.md file, which is displayed when someone opens the repo and can contain links, for example to a Google Doc containing the protocol under development. You can create the study repo from scratch, or by forking an existing repo, for example the Comparative Effectiveness Study Package Skeleton.
Once your study package is complete (and has been tested at two or more sites), we can import it as a Git submodule into a repo of ‘vetted’ network studies. We could call this repo ‘NetworkStudies’ instead of the current ‘StudyProtocols’ to be more precise. The advantage of using submodules is that the complete history of the study development is preserved. The advantage of corralling all these studies in a single repo is that there’s a single place to find them, and you have a nice URL to point to in the methods section of your paper (which is best practice). To see how a submodule works, check out how @msuchard has imported the SkeletonComparativeEffectStudy repo into StudyProtocolSandbox.
The study can then also be advertised on the OHDSI website.
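For anyone who hasn’t used submodules yet: the import step above boils down to a couple of git commands run in a clone of the central repo. The sketch below only assembles those commands in Python (it runs nothing); the study name and URL are hypothetical placeholders.

```python
from typing import List


def submodule_import_commands(study_name: str, study_url: str) -> List[List[str]]:
    """Git commands (as argument lists) that register a study repo as a
    submodule of the central studies repo.

    Meant to be run from the root of a clone of the central repo; nothing
    is executed here (pass each list to subprocess.run to do that).
    """
    return [
        # Clones the study into ./<study_name>, records its URL in .gitmodules,
        # and stages .gitmodules plus a pointer to the study's current commit.
        ["git", "submodule", "add", study_url, study_name],
        # The central repo's commit holds only .gitmodules and that pointer,
        # not a copy of the study's files or history.
        ["git", "commit", "-m", f"Add {study_name} as a submodule"],
    ]


if __name__ == "__main__":
    # Hypothetical study repo in someone's personal GitHub space.
    for cmd in submodule_import_commands(
        "ExampleStudy", "https://github.com/example-user/ExampleStudy.git"
    ):
        print(" ".join(cmd))
```

Sites that want the code can then clone the central repo with `git clone --recurse-submodules <central repo URL>`, which also checks out each study at its recorded commit.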

What do people think?

Looping in @Christian_Reich, @MauraBeaton, and @Rijnbeek.

Rijnbeek · June 19, 2018, 8:27am

Hi Martijn,

I agree we can improve on this. I think it is important that we keep everything together in one place (not called StudyProtocols, indeed), i.e. the study protocol and the full open-source analytical pipeline. I would propose putting the final document in the repo itself rather than keeping it in a Google Doc, though.

I personally never liked the wiki page solution much, since it is very “far” away, and the network studies (our ultimate goal) deserve a much more prominent place and need to be promoted on ohdsi.org itself (with a link to GitHub). When I go to ohdsi.org I would like to be immediately pointed to all the great research that is ongoing or has been done in the past.

I had a look at the submodule solution that Marc proposed and I think it is a great solution (see also https://git-scm.com/book/en/v2/Git-Tools-Submodules). We may need to help less experienced people who would like to start a network study a bit, but that will work just fine.

So, yes, I fully support this.

Peter

schuemie · December 10, 2018, 3:18pm

Continuing this conversation, and looping in @Chris_Knoll:

As the StudyProtocolSandbox repo is closing in on 1,000 commits, with 62 subfolders worth over 600MB, and already one force push incident, perhaps we should get serious about an alternative way to share our study packages.

As described in the post above, one solution is to create a new repo, for example called ‘OhdsiStudies’, and use Git submodules. The idea is to create a repo in your own GitHub space, like my method evaluation study package here. Once the developer feels like sharing the study with the OHDSI community, the repo can be linked as a submodule to OhdsiStudies. This will effectively add a snapshot of the current state of the study repo to the OhdsiStudies repo. If the study package is updated, the submodule link needs to be updated as well.
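A toy model of that snapshot behaviour (plain Python, not git itself): the central repo stores one pinned commit per study, and new commits in the study repo stay invisible to it until someone updates the link. All names and commit IDs are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudyRepo:
    """A study repo in a personal GitHub space, reduced to its list of commits."""
    name: str
    commits: List[str] = field(default_factory=list)  # oldest first

    def commit(self, sha: str) -> None:
        self.commits.append(sha)

    @property
    def head(self) -> str:
        return self.commits[-1]


@dataclass
class CentralRepo:
    """The central repo: for each study it stores one pinned commit, nothing more."""
    pins: Dict[str, str] = field(default_factory=dict)

    def add_submodule(self, study: StudyRepo) -> None:
        # Snapshot: pin whatever commit the study is at right now.
        self.pins[study.name] = study.head

    def update_link(self, study: StudyRepo) -> None:
        # Roughly `git submodule update --remote <name>` followed by a commit
        # in the central repo: move an existing pin to the study's latest commit.
        if study.name not in self.pins:
            raise KeyError(f"{study.name} is not a submodule yet")
        self.pins[study.name] = study.head


study = StudyRepo("ExampleStudy", ["a1b2c3"])
central = CentralRepo()
central.add_submodule(study)
study.commit("d4e5f6")               # the developer keeps working in their own repo
print(central.pins["ExampleStudy"])  # a1b2c3: still the snapshot
central.update_link(study)
print(central.pins["ExampleStudy"])  # d4e5f6
```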

Or we could just leave the study package in the personal Github space. You could perhaps advertise your study in a central repository, for example by posting a README that links to your personal Github space.

Let me know what you think.

Chris_Knoll · December 10, 2018, 4:20pm

Thanks for looping me in, @schuemie. I’m sorry I missed this thread the first time.

Submodules look good. They do add a bit of complexity; however, if we follow a model where everyone manages their own study repository directly, and people who want the latest code for a study use git submodule update --remote {studyName} to pull its latest changes, that will work.

The only commits on the ‘parent’ repo will be those that add new studies (i.e., submodules) to the repo.

I do have a network study that is getting polished up for sharing with the community. Would it be OK if I added the study to StudyProtocols as a submodule (via a pull request)? That would only commit the .gitmodules file to the repo, and you won’t see any commits related to the submodule (the study) if we make any changes.
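The `.gitmodules` file is what makes each study discoverable from the parent repo: a small git-config-style file with one section per submodule. As a sketch, Python’s standard `configparser` is good enough to list the linked studies from a simple file like this (it is not a complete git-config parser); the study names and URLs are hypothetical.

```python
import configparser
from typing import Dict


def list_studies(gitmodules_text: str) -> Dict[str, str]:
    """Map each submodule's path to its repository URL."""
    # interpolation=None so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    studies = {}
    for section in parser.sections():
        # Section headers in .gitmodules look like: [submodule "StudyName"]
        if section.startswith("submodule "):
            studies[parser[section]["path"]] = parser[section]["url"]
    return studies


EXAMPLE = """\
[submodule "StudyA"]
\tpath = StudyA
\turl = https://github.com/user-a/StudyA.git
[submodule "StudyB"]
\tpath = StudyB
\turl = https://github.com/user-b/StudyB.git
"""

print(list_studies(EXAMPLE))
```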

Would that be acceptable?

schuemie · December 10, 2018, 4:36pm

Yes, sure. I did also want to take this opportunity to pick a different name. ‘StudyProtocols’ is incorrect, since it’s both the protocols and the code.

Any preferences? Here are some ideas:

Studies
OhdsiStudies
StudyPackages
NetworkStudies

Chris_Knoll · December 10, 2018, 5:06pm

No preference. OhdsiStudies or NetworkStudies is nice because it’s at least a little more specific than ‘Studies’. But your call.

When you say ‘pick a different name’: I was asking about adding to the existing repos. Did you want to start a fresh repo for this activity?

SCYou · December 11, 2018, 1:55am

I agree with your opinion, @schuemie. I almost ruined the StudyProtocolSandbox with a force push.

I prefer OhdsiStudies for the name.

schuemie · December 11, 2018, 8:38am

Ok, let’s give this a try.

I’ve created the OhdsiStudies repository, and have added some instructions to the README on how to add (and update) submodules. I have granted write access to the developers team, which includes the folks on this post.
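The README instructions themselves aren’t reproduced in this thread. As a rough sketch of what the update side typically involves (standard git commands, not a quote of the README), here is the sequence a maintainer would run from the root of an OhdsiStudies clone to move one study’s link to its latest commit, assembled in Python with an optional execute step; the study name is a placeholder.

```python
import shlex
import subprocess
from typing import List


def update_link_commands(study_name: str) -> List[List[str]]:
    """Git commands that move one study's submodule link to its newest commit."""
    return [
        # Fetch the study repo and check out the latest commit of the
        # remote branch the submodule tracks.
        ["git", "submodule", "update", "--remote", study_name],
        # Stage the new pointer; the parent repo records only the commit ID.
        ["git", "add", study_name],
        ["git", "commit", "-m", f"Update {study_name} to its latest commit"],
    ]


def run_commands(commands: List[List[str]], repo_dir: str, dry_run: bool = True) -> List[str]:
    """Return the commands as shell-quoted strings; execute them in repo_dir
    only when dry_run is False."""
    lines = []
    for cmd in commands:
        lines.append(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing git command.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return lines


for line in run_commands(update_link_commands("ExampleStudy"), repo_dir="."):
    print(line)
```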

@Chris_Knoll: Whenever you’re ready, could you try and use this to share the study package you’re currently working on?

(Note: I’m not touching the StudyProtocols repository, since several papers have already been published with references to studies in that repo).

Christian_Reich · December 11, 2018, 12:11pm

@schuemie et al.:

Totally support the idea that we need to figure this out. But we may want to step back for a moment. You guys immediately jumped to a technical solution that will work for the very few folks (essentially the participants in this forum debate, minus @Christian_Reich) who are capable of and willing to create study packages in R. But we need a repository of all studies on the OHDSI network:

Feasibility counts written in SQL.
Other counts written in SQL (e.g. @Vojtech_Huser’s lab unit and @Dymshyts’s obsolete NDC queries)
Feasibility counts designed as cohorts in ATLAS
Cohort characterizations designed in ATLAS
PLP and PLE studies designed in ATLAS
Any of 2-4 designed in a different tool but compliant with the JSON format (we don’t care about non-compliant proprietary solutions).
Stuff written in something other than SQL and R (Python comes to mind)

So, I think the components we need to build are:

1. Repository with all studies: some kind of ID, title and initiator (owner, investigator), some stats (e.g. the number of participating databases), and a link to the detail page
2. Detail page for each study:
   Abstract (intent, aim, purpose)
   Status (ready, in dev)
   Link to protocol
   Link to code and instructions (GitHub sounds like a good place)
   Participants, their databases and results
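To make the two components above a bit more concrete, here is one possible shape for a study record as a Python sketch. The fields mirror the list, but the schema itself is hypothetical (it is not ARACHNE’s data model), as are the example values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    IN_DEV = "in dev"
    READY = "ready"


@dataclass
class Participant:
    site: str
    database: str
    results_url: Optional[str] = None  # set once the site shares results


@dataclass
class Study:
    # Detail-page fields
    study_id: str
    title: str
    owner: str
    abstract: str
    status: Status
    protocol_url: str
    code_url: str  # e.g. a GitHub repo with execution instructions
    participants: List[Participant] = field(default_factory=list)

    def summary_row(self) -> Dict[str, object]:
        """The short entry shown in the repository-wide list of studies."""
        return {
            "id": self.study_id,
            "title": self.title,
            "owner": self.owner,
            # Count distinct (site, database) pairs that have joined the study.
            "databases": len({(p.site, p.database) for p in self.participants}),
            "status": self.status.value,
        }


example = Study(
    study_id="S001",
    title="Example characterization study",
    owner="example-user",
    abstract="Hypothetical study used only to illustrate the record.",
    status=Status.IN_DEV,
    protocol_url="https://example.org/protocol",
    code_url="https://github.com/example-user/ExampleStudy",
    participants=[Participant("Site A", "CDM_A"), Participant("Site B", "CDM_B")],
)
print(example.summary_row())
```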

ARACHNE was supposed to be doing this. I keep having debates with @gregk about the right approach to get there. In my opinion, we should have a manual way (discussed here), and then @gregk and his team should “pave the cow paths” so to speak. It should be easy and straightforward to do this manually, but we will need automation if we want to scale in future.

Thoughts?

schuemie · December 11, 2018, 1:00pm

Just a fun history lesson: Here’s the first ever shared OHDSI study: some SQL files to execute the treatment pathway study. Here’s the commit two days later that wrapped the SQL files in an R package because the SQL scripts were unworkable in a network with many different database platforms, and where each SQL file produces an output that needs to be saved to a CSV file.
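For anyone wondering what such a wrapper has to do: besides translating the SQL for each database platform (in OHDSI’s R packages that job is usually done by SqlRender), it runs each query and saves its output to a file. Below is a minimal Python sketch of the second part only, using an in-memory SQLite database as a stand-in for a site’s CDM; the table and query are made up.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Dict


def run_queries_to_csv(conn: sqlite3.Connection, queries: Dict[str, str], out_dir: Path) -> Dict[str, Path]:
    """Run each named query and write its result set to <out_dir>/<name>.csv."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, sql in queries.items():
        cursor = conn.execute(sql)
        header = [col[0] for col in cursor.description]
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor.fetchall())
        written[name] = path
    return written


if __name__ == "__main__":
    import tempfile

    # Hypothetical toy table standing in for a site's data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER, gender TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")])
    queries = {"gender_counts": "SELECT gender, COUNT(*) AS n FROM person GROUP BY gender ORDER BY gender"}
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in run_queries_to_csv(conn, queries, Path(tmp)).items():
            print(name, path.read_text().splitlines())
```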

I’m not saying we couldn’t have different solutions to study code sharing, just that the current de facto standard in OHDSI is R packages. With the new features in ATLAS for generating PLP and PLE study packages, many more people should be able to create those. Note that you can share SQL, Python, and JSON via OhdsiStudies as well, but I think you’ll find fewer sites willing to run those, and you’ll often run into the same issues as the treatment pathway study.

We were sharing the study code via StudyProtocols and StudyProtocolSandbox, but those became unwieldy, so we now have a new solution in OhdsiStudies using submodules. Maybe we’ll run into different problems with this new approach. I think this is just a next step in the evolution.

As for keeping tabs on ongoing studies, I thought we had a really nice solution through our Wiki, but since then this has been moved to the OHDSI website. That should cover your points 1 and 2.

Patrick_Ryan · December 11, 2018, 1:01pm

Thanks @Christian_Reich, definitely agree. We need to be discussing more broadly the capabilities to collaborate for network studies, and I am hopeful that evolves into an approach that involves a shared solution, like ARACHNE, that everyone can work with.

schuemie · December 11, 2018, 1:40pm

Just to be clear: I would love a solution where in ATLAS or wherever there would be a button next to my cohort characterization or other study design saying ‘run in OHDSI network’, which would then share it with everyone, who would get a pop-up ‘would you like to run Martijn’s study?’ where they can click ‘yes’ to execute the study and send results back, and I could collate the results from all the sites. My only concerns with that are:

It doesn’t exist yet
It would cover about 80% of use cases (those that conform to some template)
My main research interest is the other 20%

SCYou · December 11, 2018, 1:46pm

@Christian_Reich yes, I agree with your opinion. That should be our future.
Still, as @schuemie said, we don’t have the perfect platform yet. I hope we will some day, as you say; then we can open the next chapter of OHDSI as a scalable research network across the world.

Vojtech_Huser · December 11, 2018, 6:43pm

Some responses and comments:

Do I email Maura to get my ThemisMeasurement study added to https://www.ohdsi.org/network-research-studies ?

I also think that Arachne should be version 2. We all have a login there anyway, thanks to the Athena download site. In Arachne, people can update metadata about their local dataset; GitHub will not provide that. To look competitive, we need an OHDSI PopMedNet (a.k.a. Arachne). It already has a totally private dataset option built in.

Chris_Knoll · December 12, 2018, 12:37am

I’ll add the reference to the study as a sub-module, @schuemie, and let you know how it goes. If there’s any trouble following the instructions, I’ll let you know.

Thanks for putting this together.

-Chris

Christian_Reich · December 12, 2018, 4:14am

Sorry guys, I am going to be the bad cop here:

I’d assert the de facto OHDSI standard is ATLAS. We have 1,769,736 cohorts in the public version, and an unknown number in the dozens of ATLAS installations behind firewalls. This is a phenomenal success. All of these cohorts were built to do at least a feasibility study, if not a cohort characterization. Some were used in PLPs and PLEs.

I know you have been taking R to an extreme end, beyond just computational methods. Nothing wrong with that. We are getting a lot of traction out of it. And we are proud we have all that.

It will work technically, but not socially. The overwhelming majority of the community will not be able to do things like that.

Again, I am sorry, but I don’t understand why replacing one GitHub solution with a slightly different one will solve any problem.

We need a solution that works for the community. As I said before, different levels of complexity, different technology choices. This solution, in my mind, needs to solve the following problems:

Repository of studies. That should be dynamic. A webpage or Wiki ain’t working, because it inevitably becomes stale (like the ones you linked).
Execution engine for making things work in different technology stacks, including error handling and package processing behind firewalls.
Management of databases.
Secure network processing, including governance.
Development of trusted entities that can be agreed upon and exchanged. Running some R spaghetti code coming from the outside is a no-no for most organizations.

That’s exactly where I would want things to go. And have it organized in an Open way, so other applications can also utilize the same functionality. If I am not mistaken the team is working on it.

It’s called ARACHNE. I am not a friend of the current UI, and Greg gets an earful from me all the time, and we need to make it more of a community effort to define the functionality and drive it forward.

That sounds like a good thing to me.

And we need to cover these as well.

schuemie · December 12, 2018, 4:41am

@Christian_Reich, I don’t think we’re actually disagreeing on anything. Those are all nice features to have, but we’re just trying to get work done while we think about how good things could be.

It sounds like Arachne is the solution to everything. Great! I’m not sure my organization has Arachne running. I’m looking to see if I can install it myself, but I can’t find any documentation on how to install it on the Wiki or in any of the 6 Arachne GitHub repos. Is there a public Arachne instance I can try?

SCYou · December 12, 2018, 4:56am

Oh dear, you’re not a bad cop here.

I’ve been helping Korean researchers learn and use OHDSI and the OHDSI tools, and learn how to build and execute OHDSI studies. Still, none of them except me is building a whole study protocol using ATLAS or launching a network study… Of course, that’s mostly down to my laziness… But I am feeling exhausted now…

Again, I totally agree with your opinion, @Christian_Reich. I really hope that OHDSI will have those kinds of solutions.

Currently, however, I believe that @schuemie’s proposal is a practical alternative for network studies, for now.

Vojtech_Huser · December 13, 2018, 10:02pm

I think the public instance would be here: https://www.arachnenetwork.com

At least, this is where I created the ThemisMeas study.
The login credentials are the same as for the athena vocab download site.

gregk · December 13, 2018, 10:43pm

@schuemie, @Vojtech_Huser, @Christian_Reich

Hey guys, we are rebuilding the OHDSI ARACHNE environment as we speak. Please give it a few more days and it will be accessible again at http://arachne.ohdsi.org. For us to really try ARACHNE at OHDSI, we need to connect some data nodes beyond the usual suspect, SynPUF (it is almost useless to try to prove the network idea by running example code on a single node with SynPUF). This is something that we have also discussed, and the current plan is to at least list the different organizations and related data sets available across the network.

Guys, for us to create a successful network solution that works, let’s work together to share ideas on what works, what does not, and what it should be. The existing ARACHNE works, and there are tons of useful (and some maybe not so useful) features in there, but I have no doubt that it can be improved and made better. As a first step, maybe we pick an example study and test-drive it through this platform all together?

@schuemie - btw, Janssen does have an internal ARACHNE POC instance that you could also use for testing.