Atlas - separate database engines for cdm and vocab/results?

Thomas_White · June 1, 2022, 9:17pm

We initially used SQL Server for our OMOP instances, and we’re now exploring a full shift to Spark. However, we’re finding that standard Atlas navigation tasks (e.g. clicking on concepts to navigate the hierarchy, or selecting “related items”) are slower on Spark than on SQL Server, whereas compute-intensive CDM tasks are often faster on Spark.

I know that it would not be simple to use one database engine for cdm and tmp and another for vocab and results – since queries often need to read/write across all four schemas.

However, if there is a way to isolate the code for navigation queries that only read from the vocab tables, I’d like the ability to have those queries connect to a non-Spark database.

If anyone can point me to the right sections of the Atlas code, I’d be happy to fork the project and experiment with this.

If anyone has suggestions for alternate ways to get the navigation performance of a standard SQL database while also getting the compute performance of a Spark database, please share your ideas.

Chris_Knoll · June 2, 2022, 4:00am

You should be able to load the vocabulary tables into a SQL Server DB and register a SOURCE that has only a vocabulary in it. See the instructions on inserting records into SOURCE and SOURCE_DAIMON.

Set the priority of the vocab-only SOURCE_DAIMON to 1 and leave the others at 0. This will make the vocabulary source the default source when performing vocabulary searches in Atlas.
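
To illustrate the idea, here is a rough, self-contained sketch using an in-memory SQLite database in place of the WebAPI schema. The column names and the daimon type codes (0 = CDM, 1 = Vocabulary, 2 = Results) are written from memory of the WebAPI configuration docs, so treat them as assumptions and check them against your install. The source keys, connection strings, and table qualifiers are placeholders, and `default_source_key` is a hypothetical helper that only mimics "highest priority wins"; it is not the actual WebAPI lookup code.

```python
import sqlite3

# Daimon type codes as recalled from the WebAPI docs (assumption: verify locally).
DAIMON_CDM, DAIMON_VOCABULARY, DAIMON_RESULTS = 0, 1, 2


def build_demo_db():
    """Create simplified stand-ins for the SOURCE and SOURCE_DAIMON tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            source_name TEXT,
            source_key TEXT,
            source_connection TEXT,
            source_dialect TEXT
        );
        CREATE TABLE source_daimon (
            source_daimon_id INTEGER PRIMARY KEY,
            source_id INTEGER REFERENCES source(source_id),
            daimon_type INTEGER,
            table_qualifier TEXT,
            priority INTEGER
        );
        """
    )
    # Placeholder sources: the full Spark CDM and a vocabulary-only SQL Server DB.
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Spark CDM", "SPARK_CDM", "jdbc:spark://<placeholder>", "spark"),
            (2, "Vocab only", "VOCAB_SQLSERVER", "jdbc:sqlserver://<placeholder>", "sql server"),
        ],
    )
    # The Spark daimons keep priority 0; the vocab-only daimon gets priority 1.
    conn.executemany(
        "INSERT INTO source_daimon VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, DAIMON_CDM, "cdm", 0),
            (2, 1, DAIMON_VOCABULARY, "vocab", 0),
            (3, 1, DAIMON_RESULTS, "results", 0),
            (4, 2, DAIMON_VOCABULARY, "vocab.dbo", 1),
        ],
    )
    return conn


def default_source_key(conn, daimon_type):
    """Hypothetical helper: pick the source whose daimon of this type has the highest priority."""
    row = conn.execute(
        """
        SELECT s.source_key
        FROM source s
        JOIN source_daimon d ON d.source_id = s.source_id
        WHERE d.daimon_type = ?
        ORDER BY d.priority DESC
        LIMIT 1
        """,
        (daimon_type,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(default_source_key(conn, DAIMON_VOCABULARY))
    print(default_source_key(conn, DAIMON_CDM))
```

With these rows, the vocabulary lookup resolves to the SQL Server source, while the CDM lookup still resolves to Spark, since the vocab-only source has no CDM daimon at all.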

Thomas_White · June 3, 2022, 4:48am

Thanks much. I’ll give that a try.