Using renv to handle R package dependencies

jposada · May 22, 2020, 3:12pm

Below are some answers to your questions:

We can host a docker repository here. Odysseus has been uploading already compiled images there. From there people will do a docker pull to get the image and run it.

The answer is yes. Almost everything, if not all, can be inside the image, and you will run the study by executing a command-line argument. Here is an example from AllenNlp that may give you an idea of how Docker images are run for scientific packages.

The workflow you proposed is very close to what we should do. However, as with anything, we will need to test it.

Chris_Knoll · May 22, 2020, 7:52pm

Thanks for the links, though I am not sure what I am seeing: none of the repositories I looked at (I viewed about 5) had any overview information. Is odysseus/r-env different from the renv that @schuemie was referring to? There’s an r-java docker, but isn’t r-java just an inner package of R, and not something you consume stand-alone? Do dockers ‘merge together’ to assemble a single R environment from a set of smaller dockers, or is the primary use of dockers to produce a sort of ‘process image’ for specified functionality (such as a web server or J2EE WAR container)? Same with AllenNlp: I’m not sure what I’m looking at there.

But I think the core of my 2 questions was answered: someone needs to host the images (for which we can use Docker Hub) and someone needs to load in the assets if we want a completely self-contained docker image.

So, I feel like the workflow I laid out above works for both contexts, the study designers and the study distributors: Martijn doesn’t really need to know anything about docker in order to produce a new study implementation (less burden on the dev). He can load up all the necessary dependencies and then capture the versions of those dependencies via renv or by building a custom .lock file. For distribution, people have the choice of either initializing their environment using the .lock file, or pulling down an image from a docker repo (someone must have built and published that image, and the .lock file makes this very easy).

Gowtham_Rao · May 23, 2020, 11:36am

renv is enabled in recent versions of the RStudio wizard!

schuemie · May 25, 2020, 9:11am

Hi @Gowtham_Rao. Yes, it is integrated, but I see two issues with using the built-in functionality:

1. By default, renv tries to infer what packages need to be included from the code in your RStudio project, rather than from what is explicitly listed in the DESCRIPTION file of your study package. I found that this tends to include a lot of packages that aren’t needed for running the study, such as things I have in my PackageMaintenance.R, like OhdsiRTools (with its many dependencies), pkgdown, and ROhdsiWebApi, as well as knitr, rmarkdown, etc. So the lock file becomes very ‘heavy’.
2. renv doesn’t always get the installation details correct for the OHDSI R packages that come from GitHub.

I would therefore prefer to use the function I added to OhdsiRTools which solves these issues for you.
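As a rough illustration of the first issue, the sketch below reads only the packages that a study package declares in its DESCRIPTION file, which is the set a 'light' lock file would need to cover. It uses base R only. `listDeclaredDependencies` is a hypothetical helper name, not the OhdsiRTools function mentioned above, and restricting it to the `Depends` and `Imports` fields is an assumption.

```r
# Hypothetical helper (not part of renv or OhdsiRTools): list the packages
# that a study package explicitly declares in its DESCRIPTION file.
listDeclaredDependencies <- function(descriptionPath,
                                     fields = c("Depends", "Imports")) {
  # read.dcf() returns a character matrix with one column per requested field;
  # fields that are absent from the file come back as NA.
  desc <- read.dcf(descriptionPath, fields = fields)
  entries <- unlist(strsplit(desc[!is.na(desc)], ",", fixed = TRUE))
  # Drop version constraints such as "(>= 2.2.0)" and surrounding whitespace.
  packages <- trimws(gsub("\\([^)]*\\)", "", entries))
  # "R" itself is a version requirement, not an installable package.
  packages <- packages[packages != "" & packages != "R"]
  sort(unique(packages))
}
```

Calling `listDeclaredDependencies("DESCRIPTION")` in a study package directory would give the declared set to compare against what a default snapshot picks up. Recent renv versions also offer `renv::snapshot(type = "explicit")`, which, as far as I know, restricts the snapshot to packages listed in DESCRIPTION; whether that covers the GitHub issue as well is not something this sketch addresses.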

schuemie · May 25, 2020, 9:22am

So the workflows would be:

When using ATLAS to design a study:

Export study package from ATLAS. This would already have the appropriate lock file for all the dependencies. If we assume the study package name will be the same as the ohdsi-studies repo name, ATLAS / Hydra can already include the reference to that repo to install the study package itself.
Post the study package on our ohdsi-studies GitHub.

When designing the study package in R:

Develop the package.
Run the new function in OhdsiRTools to generate the lock file.
Post on ohdsi-studies.

From there the lock file can be used to reconstruct the R environment, for example in a Docker image.
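The last step, reconstructing the environment from the lock file, could look roughly like the sketch below. `restoreStudyEnvironment` is a hypothetical wrapper name, and the sketch assumes the renv package is already installed (for example in a base Docker image) and that the lock file is called `renv.lock`.

```r
# Hypothetical wrapper: rebuild the R library for a study from its lock file.
# Assumes the renv package is already installed (for example in the base image).
restoreStudyEnvironment <- function(lockfile = "renv.lock") {
  if (!file.exists(lockfile)) {
    stop("Lock file not found: ", lockfile)
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("The renv package must be installed before restoring.")
  }
  # prompt = FALSE skips the interactive confirmation, which matters when
  # this runs unattended (e.g. during a Docker build).
  renv::restore(lockfile = lockfile, prompt = FALSE)
}
```

In a Docker build this would typically be invoked through `Rscript` after copying the lock file into the image. Someone running the study in a local R session could call the same function from the study directory.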

Konstantin_Yaroshove · May 25, 2020, 6:06pm

@schuemie

I can help with Docker and R environment.
Today we already have a pre-built R environment with all OHDSI packages, available in Execution Engine and as a separate image. We can use it as a base and extend it with “renv” or OhdsiRTools updates.

R-environment Docker image:
https://hub.docker.com/r/odysseusinc/r-env
Execution Engine Docker image (on top of R-environment): https://hub.docker.com/r/odysseusinc/execution_engine

You can find R environment build scripts here:
https://github.com/OHDSI/ArachneExecutionEngine/tree/develop/src/main/dist

Currently, Execution Engine can be used in combination with:

ATLAS
This integration allows you to create and execute Prediction and Estimation studies directly from ATLAS, but we are limited to these types of analysis here. Results of execution are available for download directly in ATLAS.
ARACHNE DataNode
This integration allows you to execute ANY type of analysis, which makes it suitable for the ohdsi-studies repo. DataNode provides a UI for datasource configuration and analysis execution for users who are not experienced with all the technical details of R package installation.

ATLAS screenshot:

DataNode screenshot:

Chris_Knoll · June 8, 2020, 1:36pm

I said that the docker ‘managers’ would grab the dependencies via .lock file and package up a context for use by people that use docker. I was trying to avoid forcing people to engage in another technology (docker) when it’s not necessary:

I was trying to say here that the work between ‘development done’ and ‘deployment done’ doesn’t need to be done by the study author. The author can focus on the tools to build the study, and the deployer can focus on the tools that deploy.

I’m pretty sure we don’t want to only deploy docker images for people to execute: people should be able to get the study and run it directly in their own R session. Am I wrong about that assertion?

Adam_Black · June 8, 2020, 2:04pm

Great discussion. I want to help with this! I agree with @Chris_Knoll that we can use both renv and docker for dependency management, testing, and study execution. However, I think the need to use OHDSI tools and run studies from a local R install or R server will remain even in the presence of a docker based OHDSI study distribution system.

A couple comments in response to @schuemie.

Using renv seems like a great idea particularly if different studies require different OHDSI R package versions.

You can create a .renvignore file (with entries of the same format as a standard .gitignore file) to tell renv which files to ignore within a directory. ref

I’m not sure what the issue is exactly but it might not be a problem with renv. It might be a problem with using install_github as the means of package distribution. See Why could install_github be wrong? from the drat FAQ.

The renv package provides a simple and clean interface for saving and restoring the state of an R project:

renv::init()
renv::snapshot()
renv::restore()

The problem with using a custom function to create the renv lock file is that we are maintaining our own interface to functionality provided by renv. If renv changes the format of the lock file the above interface will still work but the custom snapshot function (createRenvLockFile) might no longer work. If issue #2 truly is a problem with renv then it seems preferable for the issue to be fixed in the renv package unless it is an OHDSI specific problem. In this case it doesn’t seem like a big deal but we should think critically about interfaces between the OHDSI tools and dependencies.

jposada · June 8, 2020, 9:50pm

Hi @Adam_Black,

Great points! Have you ever used this?

https://ropenscilabs.github.io/r-docker-tutorial/

or this

https://www.rocker-project.org/

Adam_Black · June 8, 2020, 10:30pm

I’ve heard of rocker but not the ropenscilabs tutorial. I’m very much a docker novice but definitely see the value for reproducible research. I’m still getting up to speed with Odysseus’ docker repo and Arachne. I imagine if we added RStudio Server to the docker execution_engine it might make a nice reproducible development and execution environment for R programmers. I’ll look at those links. Thanks!

schuemie · June 9, 2020, 4:47am

@Adam_Black: yes, this brings us to another topic that needs discussing: install_github vs. drat.

We had intended for drat to be our package-deployment mechanism. And we do in fact add an entry to drat for every package release. But in practice we always use install_github, for the following reasons:

You can only install from drat that which is in drat. This currently precludes study packages. We can think of adding those to drat as well, but that would require handling of push rights to the drat repo, and basically mean a lot of management overhead.
Installing specific versions of packages from drat is pretty awful. Using install_github we can type install_github("ohdsi/DatabaseConnector", ref = "v2.2.0"), using drat we need to type install.packages("https://github.com/OHDSI/drat/raw/gh-pages/src/contrib/DatabaseConnector_2.2.0.tar.gz", repos = NULL). I can easily remember the first command. I always need to look up how to do the second (and then still make mistakes in the URL).
install_github and drat do not work together. So if my study package is not in drat, I’m stuck with install_github.
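To make the second point less error-prone, the drat tarball URL can be assembled by a small helper instead of typed by hand. This is only a sketch: `dratTarballUrl` is a hypothetical name, and the URL layout is copied from the DatabaseConnector example above, so treat it as an assumption for other packages.

```r
# Hypothetical helper: build the URL of a specific package version in the
# OHDSI drat repository. The layout is taken from the DatabaseConnector
# example in this thread and is an assumption for other packages.
dratTarballUrl <- function(package, version,
                           dratBase = "https://github.com/OHDSI/drat/raw/gh-pages/src/contrib") {
  sprintf("%s/%s_%s.tar.gz", dratBase, package, version)
}

url <- dratTarballUrl("DatabaseConnector", "2.2.0")

# Not run here, since both calls need network access:
# install.packages(url, repos = NULL)
# remotes::install_github("ohdsi/DatabaseConnector", ref = "v2.2.0")
```

This does not help with the first and third points: a study package that is not in drat still has to come from GitHub.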

schuemie · June 9, 2020, 6:21am

Just realized another problem with drat:

I don’t think you can use drat in renv

@Adam_Black: the issue with renv and our GitHub packages is that renv is unaware of our convention of tagging versions in GitHub. So if, for example, I install an OHDSI package today using install_github("ohdsi/SqlRender"), that will install the current version: 1.6.6. renv does not know that the way to reproduce that in the future is to use install_github("ohdsi/SqlRender", ref="v1.6.6"), but my function does.
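The tagging convention itself is simple to express in code. The sketch below is not the OhdsiRTools implementation; `pinnedGithubRemote` is a hypothetical helper that assumes every release is tagged `v<version>` under the `ohdsi` organisation, as in the SqlRender example.

```r
# Hypothetical helper: turn a package version into a pinned GitHub reference,
# assuming releases are tagged "v<version>" under the "ohdsi" organisation.
pinnedGithubRemote <- function(package, version, org = "ohdsi") {
  sprintf("%s/%s@v%s", org, package, as.character(version))
}

remote <- pinnedGithubRemote("SqlRender", "1.6.6")

# In a live session the version could come from the installed package, and
# the result be passed to remotes::install_github() (not run here):
# remote <- pinnedGithubRemote("SqlRender", utils::packageVersion("SqlRender"))
# remotes::install_github(remote)
```

The `repo@ref` form is the shorthand that remotes accepts in place of a separate `ref` argument. If a package was installed from an untagged development commit, this mapping would point at a tag that does not match, so it only holds for released versions.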

keesvanbochove · September 20, 2021, 9:00pm

Just replying on the Docker topic in this thread: for the Pioneer study package, I’m experimenting with using Docker to distribute the study binaries (as you suggested last week @jposada). I’ve discussed it today with @lee_evans and @Adam_Black and committed the code directly to the repository here: https://github.com/ohdsi-studies/PioneerWatchfulWaiting/commit/8e40ba3525109c01ea429df1d6e888b67388b864

schuemie · September 21, 2021, 6:46am

Thanks @keesvanbochove! Could you provide some insights into how complicated it was to create this? Did you have any issues installing all the required R packages?

jposada · September 22, 2021, 9:00am

This is awesome @keesvanbochove!

Thanks a lot!

@tseto please check this out

keesvanbochove · September 23, 2021, 12:06pm

Hello Martijn, the Dockerfile I used is quite straightforward, there isn’t much else going on other than updating the OS a little bit (mainly installing Java), and then installing a bunch of R packages. I did copy some of it from our internal infra at The Hyve but I don’t think it is a ton of extra work to set this up and maintain it beyond what Rocker already provides. Of course as discussed with @lee_evans and @Adam_Black ideally we would have a regularly updated official OHDSI docker image that includes most of this as a basis, on which then the study package itself could just be a thin layer (right now the image adds over 1GB on top of Rocker base, see e.g. Docker Hub)

Adam_Black · September 23, 2021, 1:11pm

I’m going to look into creating a github action on a study package github repo to generate the dockerfile and push study specific docker images to dockerhub.

One comment about renv for package dependencies though:
renv is designed to capture and restore dependencies in an R project/analysis. R packages already have a way to specify their dependencies using the DESCRIPTION file. We are using both the DESCRIPTION file and renv.lock and kind of blurring the line between an R package (Software that is installed and contains internal logic intended to be hidden from the end user) and R project (Analysis code meant to be read and executed by an end user ~codeToRun.R).

schuemie · September 23, 2021, 1:49pm

Interesting point @Adam_Black ! I think our study packages depart from the usual R world where all R packages are created equally and each can therefore not demand too much, towards a more software application world where the application (study package) dictates very precise dependencies.

It isn’t even necessary for our study packages to be R packages, since they are (almost) never dependencies of other packages (I dislike the one exception I know), have no unit tests, etc.

SCYou · September 23, 2021, 1:59pm

@Adam_Black I’ve also created a dockerfile for reproducibility in my previous study (https://github.com/ohdsi-studies/TicagrelorVsClopidogrel/blob/master/Dockerfile)

jposada · September 23, 2021, 2:11pm

hi @Adam_Black ,

Another issue with renv is how to properly define which package is loaded. I usually run the studies from the command line and not from RStudio, as they take longer to execute. In doing so, if .libPaths() is not properly configured to use the renv cache as the first option, you get the wrong version of the package loaded at execution time. If a Dockerfile per study is produced, we do not really need renv, since we can just install the required version in the default R package directory. We could explicitly continue using the DESCRIPTION file as a way to enumerate the proper versions for every package needed.
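A quick guard at the top of a command-line run can catch the library-path problem before any study code executes. This is a sketch only: `isRenvLibraryFirst` is a hypothetical helper, and it assumes renv's default project library location, `<project>/renv/library`, which can be changed through renv's configuration.

```r
# Hypothetical check: is the first library on the search path inside the
# project's renv library? Assumes renv's default location,
# <project>/renv/library.
isRenvLibraryFirst <- function(projectDir = getwd(), libPaths = .libPaths()) {
  normalize <- function(path) {
    normalizePath(path, winslash = "/", mustWork = FALSE)
  }
  renvLibrary <- normalize(file.path(projectDir, "renv", "library"))
  startsWith(normalize(libPaths[1]), paste0(renvLibrary, "/"))
}

if (!isRenvLibraryFirst()) {
  message("renv library is not first in .libPaths(): ", .libPaths()[1])
}
```

When the check fails, starting R from the project directory (so the project's .Rprofile activates renv) or calling `renv::load()` before running the study should put the project library first.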

By the way, the newest version of rocker, used in the latest Odysseus Docker image, is presenting some issues with rlang for which I have not been able to find the source or a fix. @Konstantin_Yaroshove