The current standard for distributing studies across the OHDSI network is through study R packages. These packages typically require other packages to be installed, both OHDSI packages and packages on CRAN.
One challenge we’re facing more and more is that these requirements may differ per study. Older studies may require an older version of a package, newer studies need a new version. Sometimes people are working on several studies at the same time, and need to switch from version to version. It is not always clear what versions are required to rerun a previously executed study, and from a reproducibility perspective that is bad.
In the past, to at least record the dependencies, I introduced the OhdsiRTools::insertEnvironmentSnapshotInPackage function, which would add a CSV file documenting all versions installed at the time of the study execution. However, we had no easy way to restore an R environment from such a file.
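To illustrate the idea of recording installed versions, here is a minimal base-R sketch. It is not the OhdsiRTools implementation; the helper name `snapshotInstalledPackages` and the output file name are hypothetical and chosen only for this example.

```r
# Hypothetical helper (not part of OhdsiRTools): write the name and version
# of every package visible in the current library paths to a CSV file.
snapshotInstalledPackages <- function(fileName) {
  installed <- installed.packages()[, c("Package", "Version"), drop = FALSE]
  installed <- as.data.frame(installed, stringsAsFactors = FALSE)
  rownames(installed) <- NULL
  write.csv(installed, fileName, row.names = FALSE)
  invisible(installed)
}

# Example: store the snapshot in a temporary folder and look at the first rows
snapshotFile <- file.path(tempdir(), "rEnvironmentSnapshot.csv")
snapshot <- snapshotInstalledPackages(snapshotFile)
head(snapshot)
```

A file like this documents what was installed, but as noted it does not by itself say where each package came from, which is part of why restoring from it is hard.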
I therefore also experimented with packrat, a technology that looked promising, but that I never got working (on Windows I never was able to get a set of source packages that could be built without error).
An alternative that is still being explored is Docker, a technology that allows taking a snapshot of the entire working environment, including the operating system. I have not been able to run Docker images myself, but perhaps this will be our solution in the future.
In the meantime, we (@Chris_Knoll, @msuchard, and I) discovered a new technology that looks very promising: renv, the successor to packrat. It has some important features:
- Each study package can have its own R library. So different studies can be active simultaneously, each with different dependencies, and you can quickly switch between them without the need to reinstall the dependencies.
- An R library can be rebuilt from scratch based on a so-called ‘lock file’. This file describes all the packages that must be installed, which precise versions, and where to install them from.
- Most of the packages are installed from binaries, not from source, thus avoiding the issues I observed with packrat. It is also quite fast (rebuilding an environment using packrat could take many hours).
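To give a feel for what a lock file contains, here is a small sketch that parses a lock-file-like JSON string and lists the packages it pins. The version numbers and the `RemoteRef` value are made up for illustration and are not taken from any real study, and the sketch assumes the jsonlite package is installed.

```r
library(jsonlite)

# Illustrative lock file content; versions and refs are invented for this example
lockFileText <- '{
  "R": {
    "Version": "3.6.1",
    "Repositories": [{"Name": "CRAN", "URL": "https://cloud.r-project.org"}]
  },
  "Packages": {
    "SqlRender": {
      "Package": "SqlRender", "Version": "1.6.5",
      "Source": "Repository", "Repository": "CRAN"
    },
    "Covid19CohortEvaluation": {
      "Package": "Covid19CohortEvaluation", "Version": "0.0.1",
      "Source": "GitHub", "RemoteType": "github",
      "RemoteUsername": "ohdsi-studies", "RemoteRepo": "Covid19CohortEvaluation",
      "RemoteRef": "master"
    }
  }
}'

# Parse into nested lists and print one line per pinned package
lockFile <- fromJSON(lockFileText, simplifyVector = FALSE)
for (pkg in lockFile$Packages) {
  writeLines(sprintf("%s %s (from %s)", pkg$Package, pkg$Version, pkg$Source))
}
```

Each entry records the package name, the exact version, and the source, which is the information needed to rebuild the library.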
Although renv has its own functionality for creating lock files, I found it more convenient to write my own using a function I added to OhdsiRTools. The lock files are not that complicated, as you can see in the example file I generated.
I found that restoring the environment including the study package itself can be done by simply downloading the lock file. Here’s how one could install the Covid19CohortEvaluation study package including all of its dependencies:
# Install the latest version of renv:
install.packages("renv")
# Start a new project in RStudio (or when not using RStudio, create a new folder and
# set it as the current working directory). When asked if you want to use renv with the
# project, answer ‘no’.
# Download the lock file:
download.file("https://raw.githubusercontent.com/ohdsi-studies/Covid19CohortEvaluation/renv/renv.lock", "renv.lock")
# Build the local library:
renv::init()
# When not in RStudio, you'll need to restart R now
# And you’re done! The study package can now be loaded and used:
library(Covid19CohortEvaluation)
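If you want to double-check that the session is really using the project's own library after these steps, something along these lines could help. This is only a sketch: it assumes the working directory is the project folder containing renv.lock, and it skips the renv calls when that is not the case.

```r
# The first entry should point inside the project's renv folder
# once the project library is active
libraryPaths <- .libPaths()
print(libraryPaths)

# Only call renv when a lock file is present and renv is installed
if (file.exists("renv.lock") && requireNamespace("renv", quietly = TRUE)) {
  # Report differences between the lock file and the installed library
  renv::status()
  # If anything deviates, the library can be brought back in line with:
  # renv::restore()
}
```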
I propose we test this approach on a new OHDSI study, to see how well it works.
Thoughts?