I’m trying to run DbDiagnostics::executeDbProfile using a Databricks CDM.
I’m not sure how to configure the connectionDetails.
The following gives the error shown below
Code
connectionDetails <- DatabaseConnector::createConnectionDetails (
dbms = "spark",
server = "nachc-databricks.cloud.databricks.com:443/default",
user = "token",
password = "my-token",
pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\",
extraSettings="transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3"
)
Error
Connecting using Spark ODBC driver
Error: nanodbc/nanodbc.cpp:1118: IM002: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
How do I successfully connect to the Databricks instance?
I’m new(ish) to R, how can I see the code that’s being executed by executeDbProfile?
Connecting using Spark JDBC driver
Error in `findPathToJar()`:
! No drivers matching pattern '^DatabricksJDBC42\.jar$'found in folder 'D:\_YES_2023-05-28\workspace\SosExamples\_COVID\02-data-diagnostics\drivers\databricks\'.
Please download the JDBC drivers for your database to the folder. See `?jdbcDrivers` for instructions on downloading the drivers.
Run `rlang::last_trace()` to see where the error occurred.
It looks like my only choice for driver is “spark” (i.e. there is no “databricks” option). I’m not sure why it’s looking for a driver with a name matching ‘^DatabricksJDBC42.jar$’?
@Ajit_Londhe, @clairblacketer: do you happen to know if the SparkJDBC42.jar should work here and what I need to do to get it recognzied, or is there a different driver I need to get from Databricks?
@greshje.gmail I had the same problem and just renamed my Spark driver to match DatabricksJDBC42.jar. It seemed to work fine with databricks. Though probably worth posting an issue to GitHub for an update to DatabaseConnector.
4.) There should probably be a UseNativeQuery=1 in the connectionString (but I think the DbDiagnostics packages adds it for you… but others might not).
3.) We’re running Java 1.8
system(“java -version”)
java version “1.8.0_261”
Java™ SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot™ 64-Bit Server VM (build 25.261-b12, mixed mode)
We had called the dialect “spark” in SqlRender to try to keep it generic enough for various Spark SQL based methods, whether it be a regular open-source spark instance or a commercial Databricks instance.
However, in OHDSI tools, the names are a bit conflated, so unifying on one standard is a good goal.