OHDSI Home | Forums | Wiki | Github

Supporting DBMS platforms other than SQL Server, PostgreSQL, Oracle and RedShift

New branches created and PRs submitted. The PR is not intended to indicate the code is ready for merging, just to allow everyone to easily review the code changes. I’ll continue to update the branches as comments and test results come back.

Aaron

Hi all,

This is Nigel Hartell with Seattle Children’s who Aaron mentions in the above thread. I’m pretty sure licensing should be a non-issue with the jdbc driver, but will follow up with IBM to confirm.

Seattle can be the guinea pig Netezza environment if that is required.

I can also take a look at the code to see if there are non-compliant coding issues as we port to Netezza from PostgreSQL.

Thanks,

Nigel

I merged the Netezza DatabaseConnector branch with master, because I didn’t want them to run too far out of sync and the code looked fine. Feel free to branch again if you want to make further changes.

Has any testing been done on Netezza? I’m expecting more changes will be needed on the SqlRender Netezza branch, so will keep that one separate.

Hi all,

I need to add support for Google BigQuery for OHDSI tools. Based on your comments, I know that I should change both SqlRender and DatabaseConnector. Any idea if it is possible at all, and where I should start?

Thank you,
Hajar

Hi Hajar,

Note that adding another DBMS to the list of platforms supported by OHDSI will be a major undertaking. My first guess is that just to get the OHDSI R packages working you’ll need a half person-year of an efficient Java + R + SQL programmer. Because there’s no JDBC driver for GBQ, adding GBQ to the WebApi will probably take an equal amount of effort. Make sure you are willing to commit the resources to do so before you proceed.

If you do want to go ahead, here are the steps:

First, support needs to be added to DatabaseConnector. This package relies on the JDBC architecture, but unfortunately there does not appear to be a (viable) JDBC driver for GBQ. This is not the end of the world, but it means that all functions of DatabaseConnector will have to be reimplemented for Google’s specific API, including these functions: connect, executeSql, querySql, querySql.ffdf, and insertTable.

Second, support needs to be added to SqlRender. This means rules need to be added to the translation table. Most importantly, these rules need to be tested with all the OHDSI packages.

Thank you Martijn, but what about starschema JDBC, isn’t it a viable JDBC driver for GBQ?

Oh, I hadn’t seen that one. Yes, that should make life a lot easier. First step would be to add that to the DatabaseConnector package.

Thank you, I have one more question, it seems that GBQ does not support some queries (I could not find any for insert and update!!). Is this a big issue? Is it resolvable?

Many Thanks,
Hajar

Yes, that’ll be a big issue. Across the OHDSI tools, we make extensive use
of various ANSI SQL functions, including CREATE, INSERT, SELECT, DELETE
operations, various functions including date comparisons and
partitioning/windowing functions, COUNT/COUNT_BIG/SUM, etc. If GBQ only
provides functionality for a subset of ANSI SQL, you will very likely run
into instances where your platform will not support the OHDSI analysis
capabilities.

I found a JDBC for GBQ that supports other queries. I added required code in DatabaseConnector.R and tested it executing select queries and it works correctly. However, when I try to insert a table using:

==============================================================
data <- data.frame(a = c(1, 2, 3), b = c(“a”, “b”, “c”), stringsAsFactors = FALSE)
insertTable(connection = conn,
tableName = “temp”,
data = data,
createTable = TRUE,
tempTable = TRUE)

I get the following error, am I missing something?

DBMS:
googlebigquery
Error:
execute JDBC update query failed in dbSendUpdate (executeUpdate does not accept SELECT statements. Use executeQuery instead.)
SQL:
IF OBJECT_ID(‘tempdb…#temp’, ‘U’) IS NOT NULL DROP TABLE #temp

I am glad to see interest in more dialects being supported.

I have not seen UPDATE sql command in many studies or tools. Mostly a table is dropped and re-created.

For INSERT - Achilles is using plain INSERT INTO VALUES.
But I have not seen studies using INSERT VALUES code.

As for the R function insertTable() - I also think it is not heavily used.

If you can suport CREATE table, SELECT, DELETE and temp tables in GBQ - the bulk use may be covered.

Trying to run Achilles on GBQ is a good test of functionality. Look at SQL files here to get an idea: https://github.com/OHDSI/Achilles/blob/master/inst/sql/sql_server/Achilles_v5.sql

@hajarhomayouni: the error message is because you now need to start adding translation rules to SqlRender for the GBQ dialect. The starting point is always SQL Server dialect, which is what the insertTable function is now trying to use.

See the SqlRender vignette for a list of functions and patterns that are expected to be translated (or do not require translation).

@Vojtech_Huser: all methods in the library use the insertTable function.

1 Like

Yes, it seems that GBQ is a readonly data warehouse. Aren’t all queries like create, insert, delete and update used in OHDSI for temp data, because for analytics purpose we only need to read from data. if so is it possible to read (SELECT) the data from GBQ and store temp data in CloudSQL (which supports all queries)? In other words, is it easy to separate these two?

1 Like

In general, the analytical approach we take in most of the current
analytics apps is that we are generating aggregate summary statistics from
patient-level data so create/insert of the summary stats after selecting
from the patient-level data in CDM is an expected workflow. We store the
results in the ‘results’ schema, and that’s also the schema that is used
when we produce temp tables, to accommodate the use case where the CDM is
read-only. But the logical constraint across the OHDSI architecture is
that the CDM and results schemas need to be physically on the same
source/server, because queries need to span across the CDM and results
schemas. In fact, @Frank is going through this exact infrastructure on the
OHDSI architecture call right now as I type! So, if in your proposed
structure, GBQ and CloudSQL are separate instances, potentially in
different environments, that wouldn’t work. If you are faced with the
option, rather than trying to re-write the OHDSI tools, I’d suggest you’d
be better off porting your data to a supported database environment (e.g.
Oracle, SQLServer, Postgres, Redshift, APS).

@Vojtech_Huser: all methods in the library use the insertTable function.

OK. That is important.

Many db engines are optimized for some purpose. We recently hit a big problem for inserting data. RedShift takes 1-2seconds to add one row. If we run isertTable() on 5k rows - we are looking at 30 min wait time. (just for that insert).

The separation suggested by @hajarhomayouni is not a bad long term idea. (even for current RedShift user) (unless there is a fast insert method that we are missing (but probable don’t have privileges for it anyway))

We are trying to do advanced analytics, and we can’t do it with our hands tied behind our backs. We sometimes just need to insert data back onto the server, and that data often needs to be joined with the original CDM data for further analyses.Having separate database platforms for holding the CDM and for holding working data is just not something I see as being feasible.

Already for something as ‘simple’ as Achilles this would mean a major redesign, where results now all of a sudden has to go through R instead of going directly from CDM to results tables. Other analyses, where we for instance want to instantiate cohorts and study them would be impossible.

Thank you Patrik, do you support MySql as well? I saw required code in DatabaseConnector, but didn’t see the supports in SqlRender, am I right?

No, I’m afraid we don’t support MySQL. We looked at it, but found it to be too limited in many ways (first thing I remember is that a temp table can only be referred to once in a query, but there were other things as well). We chose to support PostgreSQL instead as an open source DBMS.

We do not support mysql. Mysql is missing several basic Sql functions that
we determined are essential for our analytics use cases, including
windowing functions and multiple use of temp tables within a given query.
The ohdsi stack was developed to support postgres, oracle, sql server, aps,
redshift.

I thought I would close this thread with a final update of Hajar’s use of her summer time to try to get ATLAS running under Google BigQuery and Google CloudSQL (their cloud variant of MySQL). Bottom line is after multiple attempts to consider various work-arounds, hacks, and many very ugly “don’t look behind the curtains” ideas in both platforms, we pulled the plug on this work and redirected Hajar to much more productive activity, which I’ll post in a separate thread in this Forum per Patrick’s suggestion.

t