
Connect to Databricks using Broadsea deployment

I am trying to do a Broadsea deployment that needs to connect to the CDM database in Databricks. I cloned the Broadsea repo, went through the steps to connect to the Postgres webapi schema, and ran the source daimon script -

INSERT INTO webapi.source( source_id, source_name, source_key, source_connection, source_dialect, is_cache_enabled) 
VALUES (7, 'ADB_OMOP_7', 'ADB_OMOP_7', 
  'jdbc:spark://<databricks_url>:443/default;transportMode=http;ssl=1;httpPath=<somePathHere>;AuthMech=3;UseNativeQuery=1;UID=token;PWD=<personalAccessToken>', 'spark', true); 
 
-- CDM daimon 
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (23, 7, 0, 'omopcdm_demo', 2); 
 
-- VOCABULARY daimon 
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (24, 7, 1, 'omopcdm_demo', 2); 
 
-- RESULTS daimon 
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (25, 7, 2, 'omopcdm_demo', 2); 
 
-- EVIDENCE daimon 
INSERT INTO webapi.source_daimon( source_daimon_id, source_id, daimon_type, table_qualifier, priority) VALUES (26, 7, 3, 'omopcdm_demo', 2); 
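
As a sanity check, a quick join can confirm the source and its daimons were registered (this query is just illustrative, using the same webapi tables as above) -

-- Optional sanity check: list the new source and its daimons
SELECT s.source_id, s.source_key, s.source_dialect,
       d.daimon_type, d.table_qualifier, d.priority
FROM webapi.source s
JOIN webapi.source_daimon d ON d.source_id = s.source_id
WHERE s.source_id = 7;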

I downloaded the Spark JDBC driver from Databricks and placed it in the root directory of the cloned Broadsea repository (where the docker-compose.yml is located).

However, when I run Broadsea and check the Docker logs for the ohdsi-webapi container, I see this error -

http-nio-8080-exec-7 com.odysseusinc.logging.LoggingService - [] - Could not get JDBC Connection; nested exception is java.sql.SQLException: No suitable driver found for jdbc:spark:/...

When I check the startup logs, I see that there are errors loading the JDBC drivers, including the Spark driver -

...
uery.jdbc42.Driver driver. com.simba.googlebigquery.jdbc42.Driver
ohdsi-webapi      | 2023-03-18 05:25:33.216 INFO localhost-startStop-1 org.ohdsi.webapi.DataAccessConfig - [] - error loading org.apache.hive.jdbc.HiveDriver driver. org.apache.hive.jdbc.HiveDriver
ohdsi-webapi      | 2023-03-18 05:25:33.217 INFO localhost-startStop-1 org.ohdsi.webapi.DataAccessConfig - [] - error loading com.simba.spark.jdbc.Driver driver. com.simba.spark.jdbc.Driver
ohdsi-webapi      | 2023-03-18 05:25:33.217 INFO localhost-startStop-1 org.ohdsi.webapi.DataAccessConfig - [] - error loading net.snowflake.client.jdbc.SnowflakeDriver driver. net.snowflake.client.jdbc.SnowflakeDriver
....

Any hints as to what I am doing wrong here?

The drivers won’t be found in the root folder of the repo. Instead, if you are able to navigate to the filesystem inside the Broadsea container, you can place the JDBC drivers under the Tomcat directory: webapps/WebAPI/WEB-INF/lib. If you restart the Tomcat service, it should reload the libs found there for WebAPI, and the driver should be found (you should see it loaded in your startup log).
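
For example, something along these lines should work from the host (a sketch - the container name and the Tomcat path inside the image are assumptions, so adjust them to your deployment):

# Copy the downloaded driver into the running WebAPI container's Tomcat lib
# (container name and path are assumptions; check with `docker ps` if yours differ)
docker cp SparkJDBC42.jar ohdsi-webapi:/usr/local/tomcat/webapps/WebAPI/WEB-INF/lib/
# Restart the container so Tomcat reloads the libraries on startup
docker restart ohdsi-webapi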

I think the ‘correct’ way is to have the WebAPI build include the webapi-snowflake profile in the Maven build command so that the drivers are downloaded and installed into the WEB-INF folder automatically. @Ajit_Londhe, do you have information on how this can work? I remember you were working on adding parameterization to the Docker configuration; I wonder if there’s a way to add additional properties for Databricks and Google BigQuery? Or maybe have the Docker build include those profiles by default?
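
Roughly, the Maven invocation would look like this (a sketch - the profile names come from the WebAPI pom.xml, and the exact flags may differ in your build):

# Build WebAPI with an extra driver profile so Maven pulls the JDBC jar into WEB-INF/lib
mvn clean package -P webapi-docker,webapi-snowflake -DskipTests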

Bringing a custom JDBC file into the WebAPI build isn’t currently in scope for me. I was focused on using WebAPI as-is, but with security, SSL, Ares, and Solr Vocab conveniences.

I’ll need to review bringing a different JDBC file.

Thanks @Chris_Knoll and @Ajit_Londhe for the responses!
@Chris_Knoll - what you said makes total sense. I was going through the Broadsea README, specifically this section, which gave me the impression that the non-shipped drivers need to be added to the host directory where the docker-compose.yml is located. As you suggested, I’ll try to build a container image using the webapi-spark profile and see if that works.
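
(For reference, my plan is something like the following, assuming the WebAPI Dockerfile exposes a MAVEN_PROFILE build argument - that argument name is my assumption from reading the repo, so verify it against your checkout:)

# From a clone of the OHDSI/WebAPI repository
docker build --build-arg MAVEN_PROFILE=webapi-docker,webapi-spark -t my-org/ohdsi-webapi:spark .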

@Ajit_Londhe - since you have contributed to the Databricks integration for WebAPI, I was wondering how WebAPI can connect to Databricks via JDBC. When I look at the WebAPI project POM, I see a profile called webapi-spark. However, when I use this to build a Docker image, the Maven build fails because the jar spark-2.6.22.1040.jar is not available in Maven Central. Is there a way of using WebAPI as-is to connect to Databricks?
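
One workaround I considered for the missing jar is installing the vendor-downloaded file into the local Maven repository before building (the coordinates below are illustrative; they must match whatever dependency the webapi-spark profile declares in the POM):

# Install the Databricks-downloaded driver jar into the local Maven repo
# so the webapi-spark profile can resolve it (coordinates are illustrative;
# match them to the dependency declared in WebAPI's pom.xml)
mvn install:install-file \
  -Dfile=SparkJDBC42.jar \
  -DgroupId=com.simba \
  -DartifactId=spark \
  -Dversion=2.6.22.1040 \
  -Dpackaging=jar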

As a test, I was able to get the setup working with Databricks by including the Spark JDBC driver downloaded from Databricks, tweaking the webapi-docker profile to include the Spark dependencies (which also needed to be tweaked), building a container image, and using that with the Broadsea docker-compose.yml.
I would love to understand if there is a ‘recommended’ way of doing this for Databricks.

Another observation - it looks like the Simba JDBC driver is only available from the Databricks site (not sure if this is a licensed bit of software), and newer versions of the Databricks driver no longer use the com.simba.* package namespace but com.databricks.* instead. This would need to be adapted in the WebAPI DataAccessConfig as well.
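
A quick way to see which driver class a given jar actually ships is to read its JDBC 4 service registration (the jar file name here is just whatever you downloaded from Databricks; adjust as needed):

# Print the driver class the jar registers via the JDBC 4 service loader
unzip -p DatabricksJDBC42.jar META-INF/services/java.sql.Driver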
