I would like to propose that the OHDSI community discuss and establish some basic coding principles and best practices. It is very important that the R code produced today is written and built with re-use in mind. This is especially true for the many R-based analytical methods that are built to be distributed across multiple organizations.
I am not an expert in R, but based on some quick reading I have done, all of the points below apply to this language as well. Here are some ideas; please share your thoughts:
-
PARAMETERIZED MODULES: Today, in many instances, modules do not have a general entry point function with input/output parameters; instead it is assumed that someone will edit the “main” function and adjust the actual code in the module itself. Best practice should be to wrap the module in a “run” function that has a number of input and output parameters. Then write a separate, environment-specific module that references the re-usable component and passes instance-specific arguments into that function.
-
SELF-CONTAINED: If the module is expected to be distributed, use packrat to package the re-usable part into a single distributable. This ensures that the proper dependencies are bundled rather than dynamically resolved (dynamic resolution should be reserved for development environments). Then invoke it through the parameterized “run” function described under PARAMETERIZED MODULES.
-
VERSIONED: Version packaged modules using versioning best practices, e.g. a release.major.minor stamp.
-
ATOMIC: In code that is being distributed, treat execution as a single atomic transaction and do not assume or rely on caching that might not exist in another environment. Clean up temp objects after execution is completed so that the code can be run multiple times.
-
EXCEPTION HANDLING: Do proper exception handling with tryCatch. Check business conditions explicitly and raise business-meaningful exceptions, rather than letting system-generated exceptions surface when the code accesses objects that failed to be created. If an exception occurs, terminate gracefully and clean up any temp objects that might have been created.
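
To make the parameterized, atomic and exception handling ideas above a bit more concrete, here is a rough sketch in base R. It is only an illustration under my own assumptions, not an existing OHDSI convention: the names `run_analysis`, `business_error`, `insufficient_data_error`, `min_rows` and the `person_id` column are all made up for the example.

```r
# --- Re-usable module (hypothetical): nothing site-specific is hard-coded ---

# Helper that builds a classed condition, so callers can catch a
# business-level failure by name instead of parsing a system error message.
business_error <- function(message, class) {
  structure(
    class = c(class, "error", "condition"),
    list(message = message, call = NULL)
  )
}

# General entry point: all inputs and outputs are passed as arguments.
run_analysis <- function(input_data, output_dir, min_rows = 1L) {
  # Private scratch space for this run only; no reliance on earlier runs.
  work_dir <- tempfile(pattern = "run_")
  dir.create(work_dir)
  # Registered before any real work, so it fires on success and on failure.
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  # Business condition checked explicitly, with a meaningful message.
  if (!is.data.frame(input_data) || nrow(input_data) < min_rows) {
    stop(business_error(
      sprintf("Expected a data frame with at least %d row(s).", min_rows),
      "insufficient_data_error"
    ))
  }

  # Stage an intermediate file in the scratch space (stand-in for real work).
  staged_file <- file.path(work_dir, "staged.csv")
  utils::write.csv(input_data, staged_file, row.names = FALSE)
  staged_data <- utils::read.csv(staged_file)

  # Only final results are written to the caller-supplied location.
  if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)
  result <- list(n_rows = nrow(staged_data), work_dir = work_dir)
  saveRDS(result, file.path(output_dir, "result.rds"))
  result
}

# --- Environment-specific module (hypothetical): supplies the arguments ---

site_output <- file.path(tempdir(), "site_a_output")

outcome <- tryCatch(
  run_analysis(data.frame(person_id = 1:3), site_output),
  insufficient_data_error = function(e) {
    # Terminate gracefully with a message a site analyst can act on.
    message("Analysis skipped: ", conditionMessage(e))
    NULL
  }
)
```

The point of the split is that a site only ever edits the second part. Because the cleanup is registered with `on.exit`, the scratch directory is removed whether the run finishes or stops with an error, so the same call can be repeated safely.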