How to connect to Databricks using DbDiagnostics::executeDbProfile (and others)

greshje.gmail · August 8, 2023, 4:23pm

I’m trying to run DbDiagnostics::executeDbProfile using a Databricks CDM.

I’m not sure how to configure the connectionDetails.

The following gives the error shown below

Code

connectionDetails <- DatabaseConnector::createConnectionDetails (
  dbms = "spark",
  server = "nachc-databricks.cloud.databricks.com:443/default",
  user = "token",
  password = "my-token",
  pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\",
  extraSettings="transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3"
)

Error

Connecting using Spark ODBC driver
Error: nanodbc/nanodbc.cpp:1118: IM002: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified

How do I successfully connect to the Databricks instance?

I’m new(ish) to R, how can I see the code that’s being executed by executeDbProfile?

I’m assuming it is something similar to this:

odbc_con <- dbConnect(odbc::odbc(),
                  Driver = "SQLServer",
                  Server = server,
                  Database = test, 
                  Uid = 'username',
                  Pwd = 'password',
                  Port = 1433)

Thanks,
John

greshje.gmail · August 8, 2023, 5:06pm

I got a little bit further by just using the connectionString parameter:

connectionDetails <- DatabaseConnector::createConnectionDetails (
  dbms = "spark",
  connectionString = "jdbc:spark://nachc-databricks.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3;UID=token;PWD=MY-TOKEN",
  pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\",
)

But now I’m getting this error:

Connecting using Spark JDBC driver
Error in `findPathToJar()`:
! No drivers matching pattern '^DatabricksJDBC42\.jar$'found in folder 'D:\_YES_2023-05-28\workspace\SosExamples\_COVID\02-data-diagnostics\drivers\databricks\'. 
Please download the JDBC drivers for your database to the folder. See `?jdbcDrivers` for instructions on downloading the drivers.
Run `rlang::last_trace()` to see where the error occurred.

I’m using this driver:
SparkJDBC42.jar

greshje.gmail · August 8, 2023, 6:48pm

It looks like my only choice for driver is “spark” (i.e. there is no “databricks” option). I’m not sure why it’s looking for a driver with a name matching ‘^DatabricksJDBC42.jar$’?

@Ajit_Londhe, @clairblacketer: do you happen to know if the SparkJDBC42.jar should work here and what I need to do to get it recognzied, or is there a different driver I need to get from Databricks?

jon_duke · August 8, 2023, 6:58pm

@greshje.gmail I had the same problem and just renamed my Spark driver to match DatabricksJDBC42.jar. It seemed to work fine with databricks. Though probably worth posting an issue to GitHub for an update to DatabaseConnector.

greshje.gmail · August 8, 2023, 8:01pm

Thanks @jon_duke,

I get this error when I try that.

Connecting using Spark JDBC driver
Error in loadJdbcDriver(driverClass, classPath) : 
  java.lang.ClassNotFoundException
>

greshje.gmail · August 8, 2023, 10:00pm

Huzzah!!!

We were able to get the stars to align and connected to our Databricks instance.

A couple of things:

1.) Download the DatabricksJDBC42-2.6.33.1055.zip from JDBC Drivers Download – Databricks and use the jdbc driver jar file from there.

2.) This is what worked for connection parameters (note the jdbc:databricks in the url, not jdbc:spark, and dbms = “spark”).

connectionDetails <- DatabaseConnector::createConnectionDetails (
  dbms = "spark",
  connectionString = "jdbc:databricks://nachc-databricks.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3;UID=token;PWD=MY_TOKEN",
  pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\"
)

4.) There should probably be a UseNativeQuery=1 in the connectionString (but I think the DbDiagnostics packages adds it for you… but others might not).

3.) We’re running Java 1.8
system(“java -version”)
java version “1.8.0_261”
Java™ SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot™ 64-Bit Server VM (build 25.261-b12, mixed mode)

Ajit_Londhe · August 9, 2023, 6:12pm

We had called the dialect “spark” in SqlRender to try to keep it generic enough for various Spark SQL based methods, whether it be a regular open-source spark instance or a commercial Databricks instance.

However, in OHDSI tools, the names are a bit conflated, so unifying on one standard is a good goal.

@Brad_Rechkemmer thoughts?