A stop on adding new database platform support

The OHDSI tools support a wide array of database platforms (see the current list of platforms supported by HADES). Most interactions with the database are rather complex (more than just SELECT FROM WHERE), offloading large data manipulations to the server. Because of this, adding support for a new database platform is non-trivial. It starts with adding some rules to SqlRender, but as most platforms have their own peculiarities (e.g. no temp tables, no ability to insert data into existing tables, no ability to delete rows in tables, limited number of inserts per minute, no support for quantiles), each platform often requires changes throughout the toolstack. Because there is no guarantee that code will work on a platform, we also need extensive unit tests against all platforms.
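To illustrate what "adding some rules" can mean, here is a minimal sketch of rule-based SQL dialect translation. It is a toy under stated assumptions: the rule table `TOY_RULES`, the function `translate`, and the two rewrite patterns are hypothetical and are not SqlRender's actual rule set or API. It only shows the general idea of rewriting one source dialect into a target dialect with ordered pattern replacements.

```python
import re

# Hypothetical, toy rewrite rules: (regex pattern, replacement) per target dialect.
# Illustrative only; these are NOT SqlRender's real rules.
TOY_RULES = {
    "sql server": [],  # treated here as the source dialect: nothing to rewrite
    "postgresql": [
        (r"\bGETDATE\(\)", "CURRENT_DATE"),
        (r"\bISNULL\(", "COALESCE("),
    ],
}


def translate(sql: str, target_dialect: str) -> str:
    """Apply each rewrite rule for the target dialect, in order."""
    if target_dialect not in TOY_RULES:
        raise ValueError(f"No rules defined for dialect: {target_dialect}")
    for pattern, replacement in TOY_RULES[target_dialect]:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql


if __name__ == "__main__":
    print(translate("SELECT ISNULL(x, 0), GETDATE() FROM t", "postgresql"))
```

Even in this toy, the peculiarities listed above (no temp tables, no row deletes, and so on) cannot be handled by a pattern rewrite alone, which is why a new platform tends to require changes beyond the rule table.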

At the start of the year we imposed a new rule, that only databases for which a testing server was available could continue to be included. Because of this new rule, some testing servers were stood up by the community, while other platforms have now been deprecated.

Even with the testing servers, maintaining support for all these platforms puts a large burden on the OHDSI open source software developers. Unit tests need to be written and maintained, and when a unit test fails, the code has to be debugged.

Now I’ve been contacted by a new database platform vendor, who would like us to add support for their platform. They are friendly, and willing to write pull requests where necessary. However, just guiding them through the process, and reviewing and amending their pull requests, will take work.

Which is why I’ve declared that, for now, no new database platforms will be added. I know this sucks, but I simply did not sign up to spend all my time managing support for database platforms. If the community feels it is important to continue to expand the set of supported databases, we’ll need a new person who can dedicate their full time to this. This person’s job would include the following tasks:

  • Develop and document a procedure by which new database platforms are expected to be added.
  • Enforce the procedure (e.g. harass vendors to provide testing servers).
  • Review and revise pull requests related to new database platforms.
  • Copy and adapt existing unit tests to the new platform.
  • Help debug (and reroute to the right person) issues arising out of specific database platforms.

This person’s skillset should probably include:

  • Some project management
  • Some R
  • SQL

Full disclosure: I work for InterSystems, a database vendor that’s particularly interested in supporting the OHDSI toolkit, as a sizeable number of our customers have requested this. We’re about to submit our first iteration of extensions to SqlRender and other key repos.

Martijn: are you suggesting that just reviewing PRs would be a full-time job? I absolutely understand this represents a non-negligible amount of work and am keen to learn how we as a vendor-contributor can minimize that work, as it’s our job and not the community’s to make sure things work on our platform. There’s no doubt that includes offering test infrastructure/licenses (ideally a free community license), contributing unit tests, and taking ownership of issues reported against our platform.

Thanks to how the OHDSI toolset is structured, with many of the platform-specific pieces encapsulated by SqlRender, I was under the impression this would be more of a procedure question: socializing that issues on a particular platform (other than pure OSS like Postgres) are first and foremost the vendor’s responsibility. Projects like dbt have a structure in which this is somewhat easier to “enforce”, as each platform adapter has its own repo (I’m not advocating for changing the OHDSI repo structure!), but clear documentation buys you a lot in setting the right expectations and quickly routing platform-specific issues to the right vendor-contributor.

I apologize if that’s naive :) but hopefully this illustrates our assumption that the work is on us, and our commitment to live up to it.

Thanks,
benjamin

To clarify, is this now the policy of OHDSI as a whole, or is it specific to HADES packages? I would assume that if this is an OHDSI-wide policy, it is an important topic for the technical advisory board to discuss, as it seems a rather consequential edict. @Paul_Nagy @lee_evans @Adam_Black?

To be clear, I completely appreciate why limiting support for alternative platforms is particularly important for HADES, given its approach of creating and maintaining a single unified way to connect to databases. However, there is other software, hopefully also considered part of the OHDSI tools, that takes a different approach: CDMConnector, which builds on top of DBI and dbplyr in R (disclosure: I’m involved in the development of this package), FunSQL for Julia (@TheCedarPrince @cce), and at least the potential to build something on top of Ibis in Python (@hspence).

Thanks @edburn. Yes, I’m only speaking for HADES, where currently the task of maintaining database platform support falls on me.

To clarify: reviewing a pull request may take a couple of days if it is a completely new database platform. Additionally there are lots of small administrative tasks, like making sure unit tests are running appropriately, e.g. whether credentials are available. On average, a database platform requires a few hours per week. That just adds up to a full-time job when as a community we choose to support >10 platforms.
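The arithmetic behind that estimate can be sketched as follows. The specific numbers are assumptions for illustration only: "a few hours per week" is taken as 4, and a full-time week as 40 hours; neither figure is stated in this thread.

```python
# Assumed values for illustration; the thread only says "a few hours per week".
HOURS_PER_PLATFORM_PER_WEEK = 4   # assumption standing in for "a few hours"
FULL_TIME_HOURS_PER_WEEK = 40     # assumption for one full-time week


def weekly_hours(n_platforms: int,
                 hours_per_platform: float = HOURS_PER_PLATFORM_PER_WEEK) -> float:
    """Total maintenance hours per week across all supported platforms."""
    return n_platforms * hours_per_platform


def full_time_equivalents(n_platforms: int) -> float:
    """Weekly maintenance load expressed as a fraction of a full-time job."""
    return weekly_hours(n_platforms) / FULL_TIME_HOURS_PER_WEEK


if __name__ == "__main__":
    for n in (5, 10, 12):
        print(n, weekly_hours(n), full_time_equivalents(n))
```

Under these assumed numbers, 10 platforms already fill one full-time week, and each additional platform adds another tenth of a full-time job.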

This isn’t about InterSystems, this is about whether we as an OHDSI community want to support a large number of platforms or not. If we do, then as a community we should make the resources available for that.

I completely agree with Martijn on this.
