We can host a Docker repository here. Odysseus has already been uploading pre-built images there. From there, people can do a `docker pull` to get an image and run it.
The answer is yes. Almost everything, if not all of it, can be inside the image, and you will run the study by executing a single command line. Here is an example from AllenNLP that may give you an idea of how Docker images are run for scientific packages.
The workflow you proposed is very close to what we should do; however, as with anything, we will need to test.
Thanks for the links, though I am not sure what I am seeing: none of the repositories I looked at (I viewed about 5) had any overview information on them. Is odysseus/r-env different from the renv that @schuemie was referring to? There’s an r-java Docker image, but isn’t that just the rJava package inside R, and not something you consume stand-alone? (Do Docker images ‘merge together’ to assemble a single R environment from a set of smaller images, or is the primary use of Docker to produce a sort of ‘process image’ for specified functionality, such as a web server or a J2EE WAR container?) Same with AllenNLP: not sure what I’m looking at there.
But I think the core of my 2 questions was answered: someone needs to host the images (for which we can use Docker Hub), and someone needs to load in the assets if we want a completely self-contained Docker image.
So, I feel like the workflow I laid out above works for both contexts, the study designers and the study distributors: Martijn doesn’t really need to know anything about Docker in order to produce a new study implementation (less burden on the dev). He can load up all the necessary dependencies and then capture the versions of those dependencies via renv or by building a custom .lock file. For distribution, people have the choice of either initializing their environment using the .lock file, or pulling down an image from a Docker repo (which someone must have initialized/published somehow, and the .lock file makes this very easy).
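To make the distributor side concrete, here is a hypothetical sketch of how a .lock file could be turned into a self-contained image. The base image `rocker/r-ver`, the study directory layout, and the `extras/CodeToRun.R` entry point are all assumptions for illustration, not an agreed-upon convention:

```dockerfile
# Sketch only: a distributor builds an image from a study repo whose
# dependencies were captured in renv.lock by the study author.
FROM rocker/r-ver:4.0.3

RUN R -e "install.packages('renv', repos = 'https://cloud.r-project.org')"

WORKDIR /study
COPY renv.lock renv.lock

# Recreate the exact package environment recorded in the lock file
RUN R -e "renv::restore(lockfile = 'renv.lock')"

COPY . .

# Hypothetical entry point; the actual script name would depend on the study
CMD ["Rscript", "extras/CodeToRun.R"]
```

The point is that the study author only produces the .lock file; everything Docker-specific lives in a file like this, maintained by whoever publishes the image.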
Hi @Gowtham_Rao. Yes, it is integrated, but I see two issues with using the built-in functionality:
By default, renv tries to infer which packages need to be included from the code in your RStudio project, rather than from what is explicitly listed in the DESCRIPTION file of your study package. I found that this tends to pull in a lot of packages that aren’t needed for running the study, such as things I have in my PackageMaintenance.R, like OhdsiRTools (with its many dependencies), pkgdown, and ROhdsiWebApi, as well as knitr, rmarkdown, etc. So the lock file becomes very ‘heavy’.
renv doesn’t always get the installation details for the OHDSI R packages that come from GitHub correct.
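For the first issue, renv does offer a setting that restricts the snapshot to the packages declared in DESCRIPTION rather than everything it can infer from the project’s code. A minimal sketch, assuming you are working inside the study package’s RStudio project:

```r
# Restrict renv to the packages explicitly declared in DESCRIPTION
# (Depends/Imports/Suggests), instead of scanning all project code.
renv::settings$snapshot.type("explicit")

# With snapshot.type = "explicit", only the declared dependencies
# (and their recursive dependencies) end up in renv.lock, which
# keeps the lock file much lighter.
renv::snapshot()
```

This doesn’t address the second issue (GitHub installation details), but it may reduce the need for a fully custom snapshot function.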
Export study package from ATLAS. This would already have the appropriate lock file for all the dependencies. If we assume the study package name will be the same as the ohdsi-studies repo name, ATLAS / Hydra can already include the reference to that repo to install the study package itself.
I can help with Docker and R environment.
Today we already have a pre-built R environment with all OHDSI packages, available in the Execution Engine and as a separate image. We can use it as a base and extend it with renv or OhdsiRTools updates.
Today the Execution Engine can be used in combination with:
This integration makes it possible to create and execute Prediction and Estimation studies directly from ATLAS, but we are limited to these types of analyses here. Results of execution are available for download directly in ATLAS.
This integration makes it possible to execute ANY type of analysis, which is suitable for the ohdsi-studies repo. DataNode provides a UI for data source configuration and analysis execution for users who are not experienced with all the technical details of R package installation.
I said that the Docker ‘managers’ would grab the dependencies via the .lock file and package up a context for use by people who use Docker. I was trying to avoid forcing people to engage with another technology (Docker) when it’s not necessary:
I was trying to say here that the work between ‘development done’ and ‘deployment done’ doesn’t need to be done by the study author. The author can focus on the tools to build the study, and the deployer can focus on the tools that deploy.
I’m pretty sure we don’t want to only deploy docker images for people to execute: people should be able to get the study and run it directly in their own R session. Am I wrong about that assertion?
Great discussion. I want to help with this! I agree with @Chris_Knoll that we can use both renv and Docker for dependency management, testing, and study execution. However, I think the need to use OHDSI tools and run studies from a local R install or R server will remain even in the presence of a Docker-based OHDSI study distribution system.
Using renv seems like a great idea particularly if different studies require different OHDSI R package versions.
You can create a .renvignore file (with entries of the same format as a standard .gitignore file) to tell renv which files to ignore within a directory. ref
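For example, a .renvignore at the project root could exclude the development-only files mentioned above from renv’s dependency scan. The specific paths here are just illustrations; they would need to match the actual project layout:

```
# Hypothetical .renvignore (gitignore-style patterns):
# keep maintenance scripts and documentation out of renv's scan
extras/PackageMaintenance.R
docs/
vignettes/
```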
I’m not sure what the issue is exactly but it might not be a problem with renv. It might be a problem with using install_github as the means of package distribution. See Why could install_github be wrong? from the drat FAQ.
The renv package provides a simple and clean interface for saving and restoring the state of an R project:
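At its core, that interface is just two verbs, which is part of its appeal:

```r
# Record the versions of all packages the project uses into renv.lock
renv::snapshot()

# Later, on any machine: reinstall exactly the versions in renv.lock
renv::restore()
```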
The problem with using a custom function to create the renv lock file is that we are maintaining our own interface to functionality provided by renv. If renv changes the format of the lock file the above interface will still work but the custom snapshot function (createRenvLockFile) might no longer work. If issue #2 truly is a problem with renv then it seems preferable for the issue to be fixed in the renv package unless it is an OHDSI specific problem. In this case it doesn’t seem like a big deal but we should think critically about interfaces between the OHDSI tools and dependencies.
I’ve heard of rocker but not the ropenscilabs tutorial. I’m very much a Docker novice but definitely see the value for reproducible research. I’m still getting up to speed with Odysseus’ Docker repo and Arachne. I imagine that if we added RStudio Server to the execution_engine Docker image, it might make a nice reproducible development and execution environment for R programmers. I’ll look at those links. Thanks!
@Adam_Black: yes, this brings us to another topic that needs discussing: install_github vs. drat.
We had intended for drat to be our package-deployment mechanism. And we do in fact add an entry to drat for every package release. But in practice we always use install_github, for the following reasons:
You can only install from drat that which is in drat. This currently precludes study packages. We can think of adding those to drat as well, but that would require handling of push rights to the drat repo, and basically mean a lot of management overhead.
Installing specific versions of packages from drat is pretty awful. Using install_github we can type install_github("ohdsi/DatabaseConnector", ref = "v2.2.0"), using drat we need to type install.packages("https://github.com/OHDSI/drat/raw/gh-pages/src/contrib/DatabaseConnector_2.2.0.tar.gz", repos = NULL). I can easily remember the first command. I always need to look up how to do the second (and then still make mistakes in the URL).
install_github and drat do not work together. So if my study package is not in drat, I’m stuck with install_github.
@Adam_Black: the issue with renv and our GitHub packages is that renv is unaware of our convention of tagging versions in GitHub. So if, for example, I install an OHDSI package today using install_github("ohdsi/SqlRender"), that will install the current version: 1.6.6. renv does not know that the way to reproduce that in the future is to use install_github("ohdsi/SqlRender", ref = "v1.6.6"), but my function does.
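The idea behind that convention-aware function can be sketched in a few lines. To be clear, `pinOhdsiPackage` is a made-up name for illustration, not the actual OhdsiRTools function; it just shows how the installed version maps to a “v”-prefixed GitHub tag:

```r
# Hypothetical sketch: turn the currently installed version of an
# OHDSI GitHub package into a reproducible install command, relying
# on the OHDSI convention that release tags are "v" + version.
pinOhdsiPackage <- function(package, user = "ohdsi") {
  version <- as.character(utils::packageVersion(package))
  sprintf('remotes::install_github("%s/%s", ref = "v%s")',
          user, package, version)
}

# e.g. with SqlRender 1.6.6 installed, pinOhdsiPackage("SqlRender")
# would yield: remotes::install_github("ohdsi/SqlRender", ref = "v1.6.6")
```

Plain renv has no way to know this tag convention, which is why a custom snapshot step (or a fix upstream in renv) is needed for OHDSI packages.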