We initially used SQL Server for our OMOP instances, and we’re now exploring a full shift to Spark. However, we’re finding that standard Atlas navigation tasks (e.g., clicking on concepts to navigate the hierarchy, or selecting “related items”) are slower on Spark than on SQL Server, whereas compute-intensive CDM tasks are often faster on Spark.
I know it would not be simple to use one database engine for cdm and tmp and another for vocab and results, since queries often need to read from and write to all four schemas.
However, if there is a way to isolate the code for navigation queries that read only from the vocab tables, I’d like to have those queries connect to a non-Spark database.
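To make the idea concrete, here is a rough sketch of the routing rule I have in mind. It is not Atlas code, and every name in it (`VOCAB_SCHEMA`, `referenced_schemas`, `choose_engine`, the "sqlserver"/"spark" labels) is hypothetical. It assumes two-part `schema.table` names and uses a crude regex heuristic: a query that starts with SELECT/WITH and references only the vocab schema goes to the non-Spark engine; everything else stays on Spark.

```python
import re
from typing import Iterable

# Hypothetical schema name; a real deployment would use its own qualifier.
VOCAB_SCHEMA = "vocab"

# Matches the "schema" part of "schema.table". Crude heuristic: it also
# matches alias.column references (filtered out below by intersecting with
# known schema names) and does not handle three-part names like db.schema.table.
_QUALIFIER = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\.\s*[A-Za-z_]")


def referenced_schemas(sql: str, known_schemas: Iterable[str]) -> set[str]:
    """Return the known schema names that appear as qualifiers in `sql`."""
    known = {s.lower() for s in known_schemas}
    found = {m.group(1).lower() for m in _QUALIFIER.finditer(sql)}
    return found & known


def choose_engine(sql: str, known_schemas: Iterable[str]) -> str:
    """Route read-only, vocab-only queries to the non-Spark engine."""
    schemas = referenced_schemas(sql, known_schemas)
    # Naive read-only check: statement begins with SELECT or WITH.
    # (A WITH ... INSERT statement would slip through; a real version
    # would need a proper SQL parser.)
    is_read_only = sql.lstrip().lower().startswith(("select", "with"))
    if is_read_only and schemas == {VOCAB_SCHEMA}:
        return "sqlserver"
    return "spark"
```

For example, a concept lookup such as `SELECT c.concept_name FROM vocab.concept c WHERE c.concept_id = 201826` would be routed to SQL Server, while anything joining cdm.person to the vocabulary, or writing to results, would stay on Spark. The hard part, of course, is finding where in Atlas such a decision could be made.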
If anyone can point me to the right sections of the Atlas code, I’d be happy to fork the project and experiment with this.
If anyone has suggestions for alternate ways to get the navigation performance of a standard SQL database while also getting the compute performance of a Spark database, please share your ideas.