
Using renv to handle R package dependencies


(Gregory Klebanov) #7

Friends,

There is another requirement that we need to consider. In many (too many) healthcare, pharma, and payer settings, all internet traffic from R environments is blocked by IT security because it is considered a security threat. Downloading packages on the fly simply would not work, so we need to ensure that all required packages are pre-packaged.

So, in my mind there are two things here:

  1. Packaging required dependent R libraries into studies for distribution
  2. A pre-built and tested execution environment
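For need #1 at locked-down sites, one option is to build a local CRAN-like repository on a machine that does have internet access, then copy it behind the firewall. The following is a minimal R sketch under that assumption. The function names `build_offline_repo()` and `local_repo_url()` are hypothetical. The sketch covers only CRAN packages, so OHDSI packages installed from GitHub would need separate handling.

```r
# Hypothetical helpers (not part of any OHDSI package): build an offline,
# CRAN-like repository of source packages that can be copied to a site
# where R has no internet access.

build_offline_repo <- function(pkgs, repo_dir,
                               cran = "https://cloud.r-project.org") {
  contrib_dir <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib_dir, recursive = TRUE, showWarnings = FALSE)

  # Look up source packages on CRAN and resolve the full dependency tree.
  available <- utils::available.packages(repos = cran, type = "source")
  deps <- tools::package_dependencies(pkgs, db = available,
                                      which = c("Depends", "Imports", "LinkingTo"),
                                      recursive = TRUE)
  all_pkgs <- unique(c(pkgs, unlist(deps, use.names = FALSE)))

  # Base packages ship with R itself and are not on CRAN, so skip them.
  base_pkgs <- rownames(utils::installed.packages(priority = "base"))
  all_pkgs <- setdiff(all_pkgs, c("R", base_pkgs))

  utils::download.packages(all_pkgs, destdir = contrib_dir,
                           available = available, repos = cran, type = "source")

  # Write the PACKAGES index so install.packages() can use the folder as a repo.
  tools::write_PACKAGES(contrib_dir, type = "source")
  invisible(all_pkgs)
}

# Turn a local folder into a file:// URL usable as the `repos` argument.
local_repo_url <- function(repo_dir) {
  path <- normalizePath(repo_dir, winslash = "/", mustWork = FALSE)
  # Windows paths start with a drive letter ("C:/..."), so they need one more slash.
  if (startsWith(path, "/")) paste0("file://", path) else paste0("file:///", path)
}
```

Behind the firewall, the packages could then be installed with something like `install.packages(pkgs, repos = local_repo_url("offline-repo"), type = "source")`, with no outbound traffic.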

For #2, Odysseus has developed the ARACHNE Execution Engine (EE). This component is already distributed as a Docker image that bundles all relevant OHDSI packages, tested against the latest working versions of PLE (Population-Level Estimation), PLP (Patient-Level Prediction), and some other core libraries. In addition, the ARACHNE EE can create a clean execution environment for each run. The ARACHNE Execution Engine can be installed together with the ARACHNE Data Node, which allows users to submit R or SQL executions and receive the results back. My hope is that we will also standardize how studies and results are packaged. The ARACHNE EE can also be installed with ATLAS.

Should we consider this as a part of a solution here as well?


(Jose Posada) #8

Hi @gregk,

Some follow-up questions:

Can ARACHNE be made self-contained, so that the study itself is incorporated into the same Docker image?
What will the end-user experience be while developing and testing with ARACHNE?
How will someone use RStudio or PyCharm with ARACHNE while developing the study package?

This last question is where @schuemie's solution is strongest, because it fits into the usual development workflow.

Also, to criticize my own proposal: the technical details of building a Docker image may be beyond the skill set of the people developing study packages. That is why I keep pointing to the DREAM challenge, where there is a sandbox environment for testing. The premise is that if the package runs against the sandbox, it will run in everybody's environment.

Thoughts?


(Gregory Klebanov) #9

The idea is that ARACHNE Data Node/EE is an execution component.

My proposal was that study creators can use it to test packages before they are distributed. Then, on the data provider side, packages are uploaded and executed, and the results are returned from the ARACHNE Data Node.

Exactly, ARACHNE Data Node/EE is that “sandbox”, or rather a QC/testing environment. It already exists and would be a great starting point.

RStudio or PyCharm is fine for code development, but you do not want to use RStudio or PyCharm for testing; those are local IDEs. Testing should occur in an isolated, clean, self-contained execution environment outside of local workstations and even outside shared R servers.


(Chris Knoll) #10

Can we have the best of both?

Using @schuemie’s approach for initializing an environment can be paired with initializing a Docker environment with the correct versions of the dependencies. Those who do not have direct internet connectivity can pull the Docker image over a secured connection for internal consumption. ARACHNE EE can do similar work to set up the execution context.


(Jose Posada) #11

Exactly. However, while in RStudio or PyCharm you will write

library(foo)
foo::someFunction()

in your code. You would want to run that piece of code using exactly what ARACHNE has. That is why, in the past, I have used a Docker container inside PyCharm while developing. That way, I can import from the container itself and not rely on my local machine. The environment on the local machine and the environment for the real test must be the same to ensure everything runs correctly.

How do we bring those two worlds together, as @Chris_Knoll was suggesting? My idea is to have a Docker image that serves as your environment; I do not know whether ARACHNE can fill that gap.
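One lightweight way to confirm that a development session matches the reference environment is to compare installed package versions against pinned ones, such as the versions baked into a Docker image. Below is a hedged R sketch. `check_versions()` is a hypothetical helper. The `required` vector would come from whatever pins the versions, such as a lock file or the image's build script.

```r
# Hypothetical helper: report, for each pinned package, whether the installed
# version matches the version expected in the reference environment.
check_versions <- function(required,
                           installed = utils::installed.packages()[, "Version"]) {
  # required:  named character vector, e.g. c(somePackage = "1.2.3")
  # installed: named character vector of installed versions (names = packages)
  got <- unname(installed[names(required)])  # NA when a package is missing
  data.frame(package   = names(required),
             required  = unname(required),
             installed = got,
             # Exact string match, as lock files pin exact version strings.
             ok        = !is.na(got) & got == unname(required),
             stringsAsFactors = FALSE)
}
```

Running the same check in the IDE session and inside the container, an empty `subset(check_versions(required), !ok)` in both places would indicate that the two environments agree on those packages.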

By the way, this is my longest thread in the forums :slight_smile:


(Gregory Klebanov) #12

I think the 3+ of us are saying the same thing. These are two Lego blocks of the same solution: #1 is how you package dependencies, and #2 is how someone first tests and then executes in the same consistent environment. These are not contradictory; they are complementary.

For those who have experience with Java, this would not be new: you package your code, including its dependencies, and you test and distribute it for a specific version of the JVM.

“What has been will be again, what has been done will be done again; there is nothing new under the sun.”


(Seng Chan You) #13

@jposada Previously, I made a Docker image for OHDSI tools (CohortMethod / PatientLevelPrediction version 3).


(Vojtech Huser) #14

Replying to Martijn initial post:

Can renv and lock files handle installing only the 64-bit versions of all packages? For example, I typically install only the 64-bit version of R. If the lock file lists both 32-bit and 64-bit and tries to use both, will it not fail?


(Jose Posada) #15

Thank you @SCYou! This looks great. Did you ever push it to Docker Hub? I will certainly look at this.


(Jose Posada) #16

@gregk you are on point here. In that line of thought it will be:

  1. A docker for having all dependencies in a single place
  2. ARACHNE as a way to distribute and execute the study packages

ARACHNE and the Docker image will need to be kept in sync somehow, right?
Am I summarizing correctly?


(Jose Posada) #17

Just one more piece in favor of Dockerization. NC3 is looking to do the same


(Martijn Schuemie) #18

Responding to some of the comments:

  • No, renv will not solve the problem of 32-bit R versus 64-bit Java. It will still be necessary to install the 64-bit version of R only.

  • We did test renv on macOS and Linux, and it seemed to work fine (as long as you don’t use a 3-year-old Xcode install).
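Because renv does not address the 32-bit/64-bit mismatch, a study script could fail fast with a clear message when it is started from 32-bit R. The following is a minimal sketch. `assert_64bit_r()` is a hypothetical name. It relies only on base R's `.Machine$sizeof.pointer`, which is 8 bytes on 64-bit builds.

```r
# Hypothetical guard for the top of a study script: rJava can only load a Java
# runtime built for the same architecture as R, so stop early in 32-bit R
# when the installed Java is 64-bit.
assert_64bit_r <- function() {
  bits <- 8L * .Machine$sizeof.pointer
  if (bits != 64L) {
    stop("This study requires 64-bit R, but this session is ", bits,
         "-bit (arch: ", R.version$arch, ").", call. = FALSE)
  }
  invisible(TRUE)
}
```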

I’m all open to Docker, but would appreciate some instructions on how to get it running on my machine (I failed to get it to run on Windows in the past). Could anyone point me to instructions for installing Docker on Windows and, for example, running @SCYou’s study?


(Anna Karenina) #19

Hello @schuemie,

Installing Docker on Windows has become much easier recently. Docs: https://docs.docker.com/docker-for-windows/install/

The old way was to install Docker Machine (Docker Engine + a Linux VM), which didn’t work on Windows Home at all and sometimes required secret shaman rituals to succeed. With WSL2, Docker can be installed on the Home edition as well.


(Chris Knoll) #20

I would like some help on this too (like @schuemie). I can get my head around attaching to a Docker image and getting Docker installed on Windows (and can confirm that a recent Windows update made the BIOS settings for enabling virtualization, and other hurdles, obsolete). What I can’t get my head around is configuring the R environment for new projects or distributing a configured Docker image. Do we host a Docker repository? I don’t think we just check an image into GitHub (although I think I understand that there’s a Dockerfile that provides the instructions for initializing the context). I apologize that my knowledge of the topic is a bit limited, but I am trying to understand whether there is a redundancy between renv and Docker. My vision of the workflow here is:

  1. Develop your OHDSI study (outside of Docker, in a normal R session).
  2. Capture your dependencies with renv (or use @schuemie’s script to build the renv.lock file).

Development done.

  3. Get a clean Docker image for R.
  4. Use renv to initialize the R environment from the lock file.
  5. Capture the Docker state and make it available for others to instantiate.

Deployment done.

Here’s where the gap in my understanding of Docker is: do you bundle the dependent assets that get executed in the Docker image with the image itself? Or is Docker just a series of commands executed via a Dockerfile, where that file performs all the startup work to get the assets installed? I’m basing this question on what I see here.


(Jose Posada) #21

Hi @Chris_Knoll,

Below are some answers to your questions:

We can host a Docker repository here. Odysseus has already been uploading pre-built images there. From there, people will do a docker pull to get the image and run it.

The answer is yes. Almost everything, if not all of it, can be inside the image, and you run the study by executing a command with command-line arguments. Here is an example from AllenNLP that may give you an idea of how Docker images are run for scientific packages.
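To illustrate running a study through command-line arguments, a study image could carry a small R entry-point script. This is a hedged sketch. The script name `run_study.R`, the image name, and the argument layout are hypothetical. The actual call into the study package is left as a comment because it is study-specific.

```r
# run_study.R: hypothetical entry point copied into a study Docker image and
# invoked as, for example:
#   docker run --rm -v /data/results:/out my-study-image Rscript run_study.R /out 4

# Parse <outputFolder> [maxCores] from the command line.
parse_study_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) {
    stop("Usage: Rscript run_study.R <outputFolder> [maxCores]", call. = FALSE)
  }
  max_cores <- if (length(args) >= 2) suppressWarnings(as.integer(args[[2]])) else 1L
  if (is.na(max_cores) || max_cores < 1L) {
    stop("maxCores must be a positive integer", call. = FALSE)
  }
  list(outputFolder = args[[1]], maxCores = max_cores)
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- parse_study_args(args)
  dir.create(opts$outputFolder, recursive = TRUE, showWarnings = FALSE)
  message("Running study with ", opts$maxCores, " core(s); results go to ",
          opts$outputFolder)
  # The study package's own execution function would be called here; its
  # name and arguments depend on the study.
  invisible(opts)
}

# The last line of run_study.R would simply be: main()
```

Keeping the argument parsing separate from `main()` lets the same script be tested in an IDE without starting a container.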

The workflow you proposed is very close to what we should do; however, as with anything, we will need to test it.


(Chris Knoll) #22

Thanks for the links, though I am not sure what I am seeing: none of the repositories I looked at (I viewed about 5) had any overview information. Is odysseus/r-env different from the renv that @schuemie was referring to? There’s an r-java Docker image, but isn’t r-java just a package inside R, and not something you consume stand-alone? Do Docker images ‘merge together’ to assemble a single R environment from a set of smaller images, or is the primary use of Docker to produce a sort of ‘process image’ for specified functionality (such as a web server or a J2EE WAR container)? Same with AllenNLP: I’m not sure what I’m looking at there.

But I think the core of my two questions was answered: someone needs to host the images (we can use Docker Hub for that), and someone needs to load in the assets if we want a completely self-contained Docker image.

So I feel like the workflow I laid out above works for both audiences, the study designers and the study distributors. Martijn doesn’t really need to know anything about Docker in order to produce a new study implementation (less burden on the developer). He can load up all the necessary dependencies and then capture their versions via renv or by building a custom .lock file. For distribution, people can either initialize their environment using the .lock file or pull down an image from a Docker repository. Someone must have built and published that image, and the .lock file makes this very easy.


(Gowtham Rao) #23


renv is now an option in the New Project wizard of recent RStudio versions!


(Martijn Schuemie) #24

Hi @Gowtham_Rao. Yes, it is integrated, but I see two issues with using the built-in functionality:

  1. By default, renv tries to infer which packages need to be included from the code in your RStudio project, rather than from what is explicitly listed in the DESCRIPTION file of your study package. I found that this tends to include a lot of packages that aren’t needed for running the study, such as those used in my PackageMaintenance.R, like OhdsiRTools (with its many dependencies), pkgdown, and ROhdsiWebApi, as well as knitr, rmarkdown, etc. So the lock file becomes very ‘heavy’.

  2. renv doesn’t always get the installation details for the OHDSI R packages that come from GitHub correct.

I would therefore prefer to use the function I added to OhdsiRTools which solves these issues for you.
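For anyone using plain renv instead, its `"explicit"` snapshot type records only the packages declared in the project's DESCRIPTION file (and their dependencies) rather than scanning every R script. That addresses issue 1 but not issue 2. The following is a minimal sketch. `snapshot_declared_deps()` is a hypothetical wrapper and is not the OhdsiRTools function referred to above.

```r
# Hypothetical wrapper: write renv.lock from the dependencies declared in the
# study package's DESCRIPTION, ignoring packages that are only used by
# maintenance scripts such as PackageMaintenance.R.
snapshot_declared_deps <- function(project = ".",
                                   lockfile = file.path(project, "renv.lock")) {
  if (!file.exists(file.path(project, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in ", project, call. = FALSE)
  }
  renv::snapshot(project = project, lockfile = lockfile,
                 type = "explicit", prompt = FALSE)
}
```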


(Martijn Schuemie) #25

So the workflows would be:

When using ATLAS to design a study:

  1. Export study package from ATLAS. This would already have the appropriate lock file for all the dependencies. If we assume the study package name will be the same as the ohdsi-studies repo name, ATLAS / Hydra can already include the reference to that repo to install the study package itself.
  2. Post the study package on our ohdsi-studies GitHub.

When designing the study package in R:

  1. Develop the package.
  2. Run the new function in OhdsiRTools to generate the lock file.
  3. Post on ohdsi-studies.

From there the lock file can be used to reconstruct the R environment, for example in a Docker image.


(Konstantin Yaroshovets) #26

@schuemie

I can help with Docker and R environment.
Today we already have a pre-built R environment with all OHDSI packages, available in the Execution Engine and as a separate image. We can use it as a base and extend it with renv or OhdsiRTools updates.

R-environment Docker image:
https://hub.docker.com/r/odysseusinc/r-env
Execution Engine Docker image (on top of R-environment): https://hub.docker.com/r/odysseusinc/execution_engine

You can find R environment build scripts here:

Currently, the Execution Engine can be used in combination with:

  • ATLAS
    This integration allows users to create and execute Prediction and Estimation studies directly from ATLAS, but it is limited to these types of analysis. Execution results are available for download directly in ATLAS.

  • ARACHNE DataNode
    This integration allows users to execute ANY type of analysis, which makes it suitable for the ohdsi-studies repo. DataNode provides a UI for data source configuration and analysis execution for users who are not experienced with the technical details of R package installation.
