
SqlRender and Spark


(Vojtech Huser) #1

The new CMS VRDC (Virtual Research Data Center) platform is based on Databricks. I understand it has a database layer (called Delta). To run SQL there, I think the flavor to use is Spark SQL.
See the link below (optional).


https://docs.databricks.com/delta/intro-notebooks.html#delta-lake-quickstart-sql-notebook

I have questions for the community:

Have you had to deal with the Spark SQL flavor at your site (in relation to OMOP CDM-shaped data)?

Have you tried adding Spark support to SqlRender, and with what results? How different is that flavor, and can it ever be supported?


(Ajit Londhe) #2

Hi @Vojtech_Huser,

This is the platform we use. I’ve created forks of both DatabaseConnector and SqlRender; they mostly work, but need some more validation.


A few items to note:

  1. I use Delta tables for all tables, so that all standard update/delete operations are allowed (standard tables do not allow this)
  2. There is an attempt at MPP bulk loading for insertTable(), using the Python library and DBFS
  3. Temp tables aren’t supported, so I’m using the oracleTempSchema convention to point to an actual schema that holds permanent tables, which are dropped after use
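
To make item 3 concrete, here is a minimal Python sketch of the general idea behind temp table emulation: rewrite each `#temp` name to a permanent table in a scratch schema, then drop those tables afterwards. This is only an illustration, not the code in the forks or SqlRender's actual implementation; the function name `emulate_temp_tables`, the `scratch` schema, and the prefix format are all hypothetical.

```python
import re
import uuid

# Matches SQL Server style temp table names such as #cohort.
# Limitation of this sketch: a '#' inside a string literal would also match.
TEMP_TABLE = re.compile(r"#(\w+)")


def emulate_temp_tables(sql, temp_schema, prefix=None):
    """Rewrite #temp names to permanent tables in temp_schema.

    Returns the rewritten SQL and the DROP statements to run afterwards.
    """
    # A per-session prefix keeps concurrent sessions from colliding.
    prefix = prefix or "t" + uuid.uuid4().hex[:8]
    created = []

    def rename(match):
        name = f"{temp_schema}.{prefix}_{match.group(1)}"
        if name not in created:
            created.append(name)
        return name

    rewritten = TEMP_TABLE.sub(rename, sql)
    drops = [f"DROP TABLE IF EXISTS {name};" for name in created]
    return rewritten, drops


if __name__ == "__main__":
    sql = (
        "SELECT person_id INTO #cohort FROM cdm.person; "
        "SELECT COUNT(*) FROM #cohort;"
    )
    rewritten, drops = emulate_temp_tables(sql, "scratch", prefix="s1")
    print(rewritten)
    print("\n".join(drops))
```

The sketch only renames tables. Turning `SELECT ... INTO` into a Spark-compatible `CREATE TABLE ... AS SELECT` is a separate translation step that it does not attempt.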

I’m trying to clean this work up for a PR into the master branches by the end of December.

