Thanks for this, Adam, and good to see you posting to the forums again! A few weeks ago I was looking into a best practice we might follow locally for this. I came across a great blog post covering the topic from lots of angles, and now I can’t find it. I didn’t see the RStudio links you reference here, though, so I’ll look into those. The blog did suggest that environment variables weren’t a great solution because they’d be exposed in logs unless scoped to prevent that. The author favored an approach that references files in inst, but I can’t remember the details. The secret R package also looks interesting. Here is the reference manual. It uses public key cryptography.
In any event, I think it’s very useful to try to build consensus around a best practice for this.
Thanks!
In Python I’ve been using the keyring library, which is cross-platform and interfaces with numerous key management systems, and it has proven very easy to use.
I agree we should propose best practices for this. I’ve always used environment variables in R, for example:
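library(DatabaseConnector)

# Read the connection settings from environment variables rather than hard-coding them
connectionDetails <- createConnectionDetails(dbms = "sql server",
                                             server = Sys.getenv("RDS_SERVER"),
                                             user = Sys.getenv("RDS_USER"),
                                             password = Sys.getenv("RDS_PW"))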
I would think the server and username are also things you wouldn’t want exposed. Using environment variables like this does have some downsides, such as the credentials being stored in a plain-text file on your computer, so I’ll definitely take a look at the keyring package.
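One thing worth noting about the environment-variable approach: Sys.getenv() returns an empty string for an unset variable, so a missing setting only surfaces later as a failed login. A minimal sketch of a guard is below. The helper name getRequiredEnv and the DEMO_* variable names are hypothetical and only for illustration; they are not part of DatabaseConnector.

```r
# Hypothetical helper (not part of DatabaseConnector): read a required
# environment variable and fail early if it is missing or empty.
getRequiredEnv <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

# Demo with throwaway variables; real values would typically be set
# outside the script, for example in a user-level .Renviron file.
Sys.setenv(DEMO_RDS_USER = "demo_user")
Sys.unsetenv("DEMO_RDS_NOT_SET")

user <- getRequiredEnv("DEMO_RDS_USER")

# A missing variable raises an error; capture its message for inspection.
missingResult <- tryCatch(
  getRequiredEnv("DEMO_RDS_NOT_SET"),
  error = function(e) conditionMessage(e)
)
```

This does not address the plain-text storage concern. It only makes a misconfigured machine fail with a clear message instead of attempting a connection with empty credentials.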
We could also modify DatabaseConnector::connect to accept data source names (DSN). If a string is passed in to connect as the first argument then it would be assumed to be a DSN.
Connecting with integrated security using a DSN would then look like
con <- DatabaseConnector::connect("myDSN")
which makes it more like the DBI::dbConnect interface. Just curious, why does OHDSI have its own package for connecting to databases instead of using DBI?
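To make the DSN proposal concrete, here is a rough sketch of how the first argument could be interpreted. This is not how DatabaseConnector::connect works today. The function name resolveConnectionTarget and the returned list structure are hypothetical, and a plain list stands in for a connectionDetails object.

```r
# Hypothetical sketch, not the actual DatabaseConnector API: decide how to
# interpret the first argument of a connect()-style function.
resolveConnectionTarget <- function(x) {
  if (is.character(x) && length(x) == 1 && !is.na(x) && nzchar(x)) {
    # A single non-empty string is assumed to be a data source name (DSN).
    list(type = "dsn", dsn = x)
  } else if (is.list(x)) {
    # Anything list-like is treated as connection details (stand-in only).
    list(type = "connectionDetails", details = x)
  } else {
    stop("Expected a DSN string or a connectionDetails-like list", call. = FALSE)
  }
}

fromDsn <- resolveConnectionTarget("myDSN")
fromDetails <- resolveConnectionTarget(
  list(dbms = "sql server", server = "myServer")
)
```

Keeping the dispatch in one small function like this would leave the existing connectionDetails path unchanged, so current scripts would presumably keep working.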
When R code is executed in ARACHNE, we currently require it to read connection variables that are set in the ARACHNE Data Node (stored encrypted), where the data source is configured.