
Broadsea v3.0 Using Spark JDBC Connection

I am currently able to connect to a Spark source (Databricks) on Broadsea v2.0 by populating the webapi.source and webapi.source_daimon tables with the relevant connection details. When attempting to upgrade to Broadsea v3.0, I updated those same tables with the same details, but am now receiving an error from my Simba driver: “[Simba]SparkJDBCDriver Error initialized or created transport for authentication: problem accessing trust store.”
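For context, the table updates follow the standard WebAPI source registration pattern; a minimal sketch (the IDs, schema names, and token are placeholders, and I'm assuming 'spark' as the dialect key and the usual daimon_type mapping of 0/1/2 to CDM/Vocabulary/Results from the WebAPI documentation):

-- register the Databricks source
INSERT INTO webapi.source (source_id, source_name, source_key, source_connection, source_dialect)
VALUES (2, 'Databricks', 'DATABRICKS',
        'jdbc:spark://<MYDBHOST>:443/default;transportMode=http;ssl=1;httpPath=<MYSQLPATH>;AuthMech=3;UID=token;PWD=<MYTOKEN>',
        'spark');

-- point the CDM (0), vocabulary (1), and results (2) daimons at their schemas
INSERT INTO webapi.source_daimon (source_daimon_id, source_id, daimon_type, table_qualifier, priority)
VALUES (4, 2, 0, 'omop', 1),
       (5, 2, 1, 'omop', 1),
       (6, 2, 2, 'omop_results', 1);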

Looking this error up online, it appears to be a credentials issue, but I know the driver and connection details work because I can connect via DatabaseConnector from the HADES environment (in the Broadsea v3.0 container), just not when loading the Atlas tool. These are also the same credentials as in the previous version, so I believe my steps are still correct. I am wondering if this is due to the Broadsea upgrade and how it communicates with WebAPI.

One last FYI: in Broadsea v2.0, the Flyway password in the webapi section of the docker-compose.yml file needs to be updated for this to work. It looks like that password was changed to match the datasource password in the new release.
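For anyone hitting that v2.0 issue, the relevant bit is the webapi service's environment block; a sketch, assuming the stock variable names from the Broadsea v2.0 compose file (the two values must match):

# excerpt from docker-compose.yml; <DB_PASSWORD> is a placeholder
webapi:
  environment:
    - DATASOURCE_PASSWORD=<DB_PASSWORD>
    - FLYWAY_DATASOURCE_PASSWORD=<DB_PASSWORD>   # must match the datasource password above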

I'm having the exact same issue. I thought it might have to do with enabling HTTPS on the Traefik router; it is now configured with self-signed certs, but that did not fix the issue.

Are your JDBC connection strings requiring SSL?

That is for the Broadsea server only, as a proxy on top of the various containers.

The issue here is that the webapi container, based on your CDM connection string, is trying to initiate secure connections via JDBC to Databricks. If JDBC with SSL isn’t a requirement, disable it in your connection string. If it is, I think you’d need to mount a cacerts file so that the JVM keystore in the webapi container trusts the Databricks certificate.
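If SSL is needed, a sketch of one way to build that cacerts file (the databricks alias and file names are arbitrary, and changeit is the JVM's default keystore password):

# fetch the certificate the Databricks endpoint presents
openssl s_client -connect <MYDBHOST>:443 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > databricks.pem

# start from the JVM's default trust store ($JAVA_HOME/jre/lib/security/cacerts on JDK 8) and add the cert
cp "$JAVA_HOME/lib/security/cacerts" ./cacerts
keytool -importcert -noprompt -alias databricks -keystore ./cacerts -storepass changeit -file databricks.pem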

Tagging @greshje.gmail as he’s been working through all things Databricks.


Thanks for the quick response.

Based on the connection string, it appears Databricks is serving over port 443. My guess is that my Spark driver is not properly encrypting traffic on Broadsea because it isn’t registering any .jks trust store.

Update! I see the nice Databricks tutorial draft. I essentially arrived at the same workflow.

Unfortunately, when I set ssl=0 in the connection string, I still can’t connect to my cluster.

Hmm. What does the webapi container log show for that connection attempt?

With ssl=0:

2023-05-11 02:59:28.768 INFO http-nio-8080-exec-3 com.odysseusinc.logging.LoggingService - [] - Could not get JDBC Connection; nested exception is java.sql.SQLException: [Simba][SparkJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.simba.spark.jdbc42.internal.apache.http.NoHttpResponseException: host:443 failed to respond.

And for ssl=1:
2023-05-11 03:01:45.254 ERROR taskExecutor-1 org.ohdsi.webapi.cdmresults.service.CDMCacheService - [] - Failed to warm cache DATABRICKS. Exception: Could not get JDBC Connection; nested exception is java.sql.SQLException: [Simba][SparkJDBCDriver](500164) Error initialized or created transport for authentication: problem accessing trust store.

My connection string, from my Databricks cluster:

jdbc:spark://<MYDBHOST>:443/default;transportMode=http;ssl=0;httpPath=<MYSQLPATH>;AuthMech=3;UID=token;PWD=<MYTOKEN>
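The Simba driver docs also describe trust-store properties that can go directly on the connection string, which might sidestep replacing the JVM default; something like this may be worth trying (I'm assuming the SSLTrustStore and SSLTrustStorePwd property names from the driver documentation):

jdbc:spark://<MYDBHOST>:443/default;transportMode=http;ssl=1;httpPath=<MYSQLPATH>;AuthMech=3;UID=token;PWD=<MYTOKEN>;SSLTrustStore=/path/to/cacerts;SSLTrustStorePwd=changeit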

For my particular use case, I created an empty OMOP data model using the Spark DDLs and populated the person table with data. Can’t wait to see that data populate in ATLAS, but I have to solve this connection issue first.
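For anyone reproducing this, a minimal row satisfying the person table's required columns in CDM v5.x looks like the following (the omop schema name is a placeholder; 8507 is the standard concept ID for male gender):

INSERT INTO omop.person (person_id, gender_concept_id, year_of_birth, race_concept_id, ethnicity_concept_id)
VALUES (1, 8507, 1980, 0, 0);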

Seems like the Databricks cluster is requiring a secure connection, and rejecting insecure connections. Can you share the steps you used to build your cacerts file?

The docs on the Databricks working group page suggest certs are not necessary to demo this connection if you change the ssl flag, but I’m not sure that’s true, since it’s served over port 443.

I will have to dig and see how I created those certs, but I couldn’t get my Spark driver to load them as a trust store anyway.


Perhaps the cacerts file is generated by Databricks?

If so, you could grab that file, place it in the Broadsea root folder ("./cacerts"), and the compose file will then mount it into the webapi container.
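A sketch of what that mount could look like, assuming the compose file wires the JVM to the mounted file via the standard javax.net.ssl.trustStore system property (the exact target path and mechanism depend on the webapi image, so check the actual Broadsea docker-compose.yml):

# hypothetical excerpt from docker-compose.yml
webapi:
  volumes:
    - ./cacerts:/tmp/cacerts
  environment:
    - JAVA_OPTS=-Djavax.net.ssl.trustStore=/tmp/cacerts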

Thanks for the link. That covers encryption of traffic between cluster worker nodes, but I’m not sure it applies to a JDBC connection. Furthermore, there are a number of .jks and .pem files to choose from.
