Encourging best practices for securing database credentials

Adam_Black · June 3, 2020, 6:49pm

Throughout the OHDSI documentation and examples database credentials are exposed in plain text.

connectionDetails <- createConnectionDetails(dbms="postgresql",
                                         server="localhost/cdm",
                                         user="ohdsi",
                                         password="blah")

Perhaps we can consider encouraging best practices for securing credentials by modifying examples and template code. RStudio has some helpful guidance: https://db.rstudio.com/best-practices/managing-credentials/

Here is an idea for a template that encourages keeping passwords secure (i.e. not saved in .R or .Rhistory files)

keyring::key_set(service = "cdm", username = "ohdsi")
connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/cdm",
                                             user = "ohdsi",
                                             password = keyring::key_get("cdm", "ohdsi"))

Other options might be rstudioapi::askForPassword(), askpass::askpass(), or using environment variables.

Andrew · June 3, 2020, 7:39pm

Thanks for this Adam, and good to see you posting to the forums again! I was looking into a best practice we might follow locally for this a few weeks ago. I came across a great blog post covering the topic from lots of angles and now I can’t find it. I didn’t see the RStudio links you ref here, though. I’ll look into those. The blog did suggest that environmental vars weren’t a great solution because they’d be exposed in logs unless scoped to prevent that. The author was favoring an approach referencing files in inst. But I can’t remember the details. This secret R package also looks interesting. Here is the reference manual. It uses public key cryptography.

In any event, I think it’s a very useful to try to build consensus around a best practice for this.
Thanks!

gkennos · June 3, 2020, 10:26pm

In Python I’ve been using the keyring library, which is cross platform and interfaces with numerous key management systems and it’s proven very easy to use.

schuemie · June 4, 2020, 6:49am

I agree we should propose best practices for this. I’ve always used R environmental variables, like for example here:

connectionDetails <- createConnectionDetails(dbms = "sql server",
                                             server = Sys.getenv("RDS_SERVER"),
                                             user = Sys.getenv("RDS_USER"),
                                             password = Sys.getenv("RDS_PW"))

I would think the server and user name are also things you wouldn’t want exposed. Using environmental variables like this does have some downsides, like the credentials being stored in a plain-text file on your computer, so I’ll definitely take a look at the keyring package.

Adam_Black · June 4, 2020, 12:28pm

We could also modify DatabaseConnector::connect to accept data source names (DSN). If a string is passed in to connect as the first argument then it would be assumed to be a DSN.

Then connecting with integrated security using a DSN would look like
con <- DatabaseConnector::connect("myDSN")
making it more like the DBI::dbConnect interface. Just curious, why does OHDSI it’s own package for connecting to databases instead of using DBI?

gregk · June 4, 2020, 12:49pm

when R code is executed in ARACHNE, we currently require reading connection variables that are set in the ARACHNE Data Node (stored encrypted) where data source is configured

dbms <- Sys.getenv(“DBMS_TYPE”)
connectionString <- Sys.getenv(“CONNECTION_STRING”)
user <- Sys.getenv(“DBMS_USERNAME”)
pw <- Sys.getenv(“DBMS_PASSWORD”)
cdmSchema <- Sys.getenv(“DBMS_SCHEMA”)
resultsSchema <- Sys.getenv(“RESULT_SCHEMA”)