Supporting DBMS platforms other than SQL Server, PostgreSQL, Oracle and RedShift

schuemie · June 15, 2016, 8:10pm

@hajarhomayouni: the error message is because you now need to start adding translation rules to SqlRender for the GBQ dialect. The starting point is always SQL Server dialect, which is what the insertTable function is now trying to use.

See the SqlRender vignette for a list of functions and patterns that are expected to be translated (or do not require translation).

@Vojtech_Huser: all methods in the library use the insertTable function.

hajarhomayouni · June 16, 2016, 4:03pm

Yes, it seems that GBQ is a readonly data warehouse. Aren’t all queries like create, insert, delete and update used in OHDSI for temp data, because for analytics purpose we only need to read from data. if so is it possible to read (SELECT) the data from GBQ and store temp data in CloudSQL (which supports all queries)? In other words, is it easy to separate these two?

Patrick_Ryan · June 16, 2016, 5:18pm

In general, the analytical approach we take in most of the current
analytics apps is that we are generating aggregate summary statistics from
patient-level data so create/insert of the summary stats after selecting
from the patient-level data in CDM is an expected workflow. We store the
results in the ‘results’ schema, and that’s also the schema that is used
when we produce temp tables, to accommodate the use case where the CDM is
read-only. But the logical constraint across the OHDSI architecture is
that the CDM and results schemas need to be physically on the same
source/server, because queries need to span across the CDM and results
schemas. In fact, @Frank is going through this exact infrastructure on the
OHDSI architecture call right now as I type! So, if in your proposed
structure, GBQ and CloudSQL are separate instances, potentially in
different environments, that wouldn’t work. If you are faced with the
option, rather than trying to re-write the OHDSI tools, I’d suggest you’d
be better off porting your data to a supported database environment (e.g.
Oracle, SQLServer, Postgres, Redshift, APS).

Vojtech_Huser · June 17, 2016, 4:58pm

@Vojtech_Huser: all methods in the library use the insertTable function.

OK. That is important.

Many db engines are optimized for some purpose. We recently hit a big problem for inserting data. RedShift takes 1-2seconds to add one row. If we run isertTable() on 5k rows - we are looking at 30 min wait time. (just for that insert).

The separation suggested by @hajarhomayouni is not a bad long term idea. (even for current RedShift user) (unless there is a fast insert method that we are missing (but probable don’t have privileges for it anyway))

schuemie · June 20, 2016, 8:27am

We are trying to do advanced analytics, and we can’t do it with our hands tied behind our backs. We sometimes just need to insert data back onto the server, and that data often needs to be joined with the original CDM data for further analyses.Having separate database platforms for holding the CDM and for holding working data is just not something I see as being feasible.

Already for something as ‘simple’ as Achilles this would mean a major redesign, where results now all of a sudden has to go through R instead of going directly from CDM to results tables. Other analyses, where we for instance want to instantiate cohorts and study them would be impossible.

hajarhomayouni · June 22, 2016, 6:52pm

Thank you Patrik, do you support MySql as well? I saw required code in DatabaseConnector, but didn’t see the supports in SqlRender, am I right?

schuemie · June 23, 2016, 6:56am

No, I’m afraid we don’t support MySQL. We looked at it, but found it to be too limited in many ways (first thing I remember is that a temp table can only be referred to once in a query, but there were other things as well). We chose to support PostgreSQL instead as an open source DBMS.

Patrick_Ryan · June 23, 2016, 10:03am

We do not support mysql. Mysql is missing several basic Sql functions that
we determined are essential for our analytics use cases, including
windowing functions and multiple use of temp tables within a given query.
The ohdsi stack was developed to support postgres, oracle, sql server, aps,
redshift.

mgkahn · July 27, 2016, 7:01pm

I thought I would close this thread with a final update of Hajar’s use of her summer time to try to get ATLAS running under Google BigQuery and Google CloudSQL (their cloud variant of MySQL). Bottom line is after multiple attempts to consider various work-arounds, hacks, and many very ugly “don’t look behind the curtains” ideas in both platforms, we pulled the plug on this work and redirected Hajar to much more productive activity, which I’ll post in a separate thread in this Forum per Patrick’s suggestion.