OHDSI Home | Forums | Wiki | Github

Ideas for SqlRender 2

SqlRender has been around for a while now, and some of the original design decisions seem to make little sense in hindsight. I’d like to start the discussion on what we’d like to see in the new incarnation of SqlRender. Here’s what I currently have on my wish list:

  • A new name? (Maybe some critter from greek mythology? Anyone know a good acronym?)
  • Dropping the notion of specifying a source dialect. We’re stuck with (some variant of) SQL Server SQL as our source. We should drop the sourceDialect parameter altogether.
  • Adding markup for things that are not in SQL Server. One current example where we need this is defining the column used for hashing when creating a table in a parallel DBMSs such as RedShift and PDW.
  • Dropping the distinction between ‘rendering’ and ‘translating’. Currently these are two separate commands that are always used together, so why not make it a single command (as already done in the R command loadRenderTranslateSql).
  • Relabeling our source dialect as ‘OHDSql’. This should be easier to communicate than say it is ‘SQL Server SQL with custom markup’. I would argue that OHDSql should be a subset of SQL Server SQL where possible, and only add extensions where needed.

Let me know what you think should be in SqlRender 2. (@Chris_Knoll ? @anthonysena ? @Patrick_Ryan ?)

Name: how about ‘sirens’, they sang a song so beautiful that attracted all
from afar.

Syntax
In
Rendering
Expressions for
Network
Science

I’m ok with other proposals to drop sourcedialect and renaming the dialect
ohdsql. I can think of situations where it can be useful to have translate
and render separate, but don’t feel strongly about that. Given sql server
syntax is not native to everyone, it would be valuable to ensure the
community has access to resources about the basics and also can use the
cross database sql test environment that lee built for the community.

Cheers,

Patrick

@Patrick_Ryan, you have missed your calling! The name is great. (Are you working on the next greatest novel on the side??) As long as we don’t have to be tied to a mast while the ship navigates Scylla and Charybdis!

Is there a working group for this? It’s an interesting topic, and I would like to be involved a little more.

Don

Love the name, @Patrick_Ryan!

I’m also happy to see the package getting some TLC, and would be happy to chip in as able. I’m also a fan of dplyr – has there been any thought to approaching that group to see whether it’d make sense to merge?

We could also use recursive acronym. (like SPARQL) Recursive acronym - Wikipedia
Is there any chance we would extend the universe to cover Roman gods too - just in case we run out of good names.

It may be fun to also pick female names - to stay gender neutral :slight_smile:
One possible website to pick names could be:
http://www.theoi.com/Tree0.html

These are all optional ideas - I have no strong feelings about the name for it.

For Render+Translate - I agree to add a combined simpler function but having the components is also good. I am using it (just render) for populating parameters just within RedShift (although sprintf would be OK for that too). (I guess I was blindly hoping that I would one day find time to port my RedShift SQL to MS SQL [and release my OHDSI METIS package for advanced data exploration]).

I am also a fan of dplyr! I have never tried their db-linkage function. All my dplyr usage was for in-memory data. If it would work with redshift it would give us additional query notation other than SQL. Which sounds very appealing.

Some people also use Python libraries for SQL dialect translation. I wonder if those other libraries allow going in multiple directions (from RedShift into Netezza, from Oracle to Postgres).

The fully public SQL sandbox is a great suggestion that I totally second.

Personally, I never liked that MSSQL was the default dialect of SQL for SQLRender. I understand that we have to pick something, but picking a DB that requires a license seems awkward given that we’re a group that releases software under Apache 2 license. It would be great to use a more publicly available database and SQL implementation as a default, perhaps one that more closely follows the ANSI standard.

Bill

@wstephens, just to clarify one point: we someone ‘lucked’ into our
current situation, but the rationale for using the MSSQL dialect has
nothing to do with any particular affinity to MSSQL, but rather it turns
out that MSSQL is the most explicitly expressive of the SQL dialects,
making translation to the less-explicit dialects possible. I’ll use a
simple example to illustrate: in many of our observational analyses, we
need to compare dates (ex. determine if a patient has adequate observation
period prior to exposure). MSSQL uses explicit functions for date
comparisons, so syntax would look something like :
DATEDIFF(day,OBSERVATION_PERIOD_START_DATE,DRUG_EXPOSURE_START_DATE) to
compute the number of days within an interval. In Oracle and Postgres,
there’s an implicit comparison made using the syntax:
DRUG_EXPOSURE_START_DATE - OBSERVATION_PERIOD_END_DATE. It turns out its
possible, with Martijn’s magic, to translate from the more expressive to
the less expressive syntax, but not vice versa, because there’s nothing in
the Oracle/Postgres syntax to tell you you are subtracting dates and want
the unit to return to be in days. In SQLRender2, I think it’ll help to
refer to our starting-point syntax s OHDSql instead of MSSQL so people
don’t get confused, but it’ll still be the case that we need the expressive
syntax to make our tools work across all platforms.

I personally find it quite annoying that the different RDBMS have to
unnecessarily diverge is silly ways, but don’t have much control over
that. What i do have control over is how I’m trying to write my SQL for my
contribution to OHDSI apps, and there I agree and try to stick to ANSI
standards as much as possible. The main functions I use that deviate from
that are: date comparisons and windowing functions (e.g. ROW_NUMBER over
(PARTITION BY … ORDER BY…) and use of temporary tables (which are
frustratingly handled differently across the board). I’m avoiding all
analytic functions and stored procedures, even though sometimes they’d make
my life easier, because we haven’t been able to make them work across
platforms. Right now, SQLRender gives me what I need to be able to support
analytical use cases, I haven’t run into something that stops me, but
perhaps I haven’t gotten really creative yet.

Would we be willing to call it OHDISql instead of OHDSql…first sounds like ‘oh-DI-Sql’, while second sounds like ‘Odd Sql’…and who would want to use ‘Odd Sql’?

Nevermind! It’s a hard ‘D’…OHDSql is oh-D-Sql. Perfect!..moving on…moving on…
-Chris

I really like SqlRender and use it fairly regularly.

No preference on the name.

I have to agree with @wstephens that something more closely adhering to ANSI SQL-92 would be ideal, although @Patrick_Ryan perfectly illustrates the issue with this for the DATEDIFF func.

Might it be helpful to set up a matrix of db-engine vs. edge cases like DATEDIFF in order to suss out whether this issue exists between other engines as well?

Otherwise, it is only a minor annoyance to have to think back to my MSSQL days (no license in current stack).

Unfortunately, there is no definitive guide to compliance with the ANSI standard. I found this, but it’s not great:
http://troels.arvin.dk/db/rdbms/

Like you, I don’t have a MSSQL license. I think that will reduce our contribution of the initial SQL. We’ll get to validate the conversions to other platforms instead.

Bill

After being dormant for over a year, creating SqlRender 2.0 has now become unavoidable since CDM v5.1.0 introduces DATETIME fields, and most of our date operations like DATEDIFF fail at least on some platforms when applied to DATETIME.

I’ve set up a version 2.0 milestone to which issues will be added. The fundamental change in 2.0 will be the departure from SQL Server SQL as the starting dialect. Instead, the start will be OhdsiSql, which happens to start out exactly as SQL Server SQL so we’ll be backwards compatible. We’ll aim to make OhdsiSql ANSI SQL compatible as much as possible, except where more specificity is needed to support the translation. (Remember: unlike the SQL parser on your server, SqlRender will not know what type a field is when parsing and translating).

Some things I’m still contemplating:

  1. Replacing the current partial parsing engine with a full parser. Full parsing will give us greater flexibility, and of course will make us more formal. We can have a BNF specification of OhdsiSql drive the parser that others might build on. We can probably reuse the current translation rules, and can certainly reuse our unit tests.
  2. Adding an interactive tool for editing OhdsiSql, and translating it on the fly. This could be implementated as a Shiny app in the SqlRender package.
  3. Integration with dplyr, for people that only want to write R and not SQL. The R code could easily generate the OhdsiSql, which can subsequently be rendered to any supported dialect.

Oh, I decided against changing the name. ‘SqlRender’ is fancy enough :wink:

Looping in @shawndolley and @JasonPoovey who might also be interested in this.Let me know if you have any thoughts.

1 Like

I agree with the use of a full-fledged parser. I’m most familiar with flex/bison for building parsers, but that isn’t very portable. Interactive tools sounds good, but I also think there is use for a command line tool. We have already essentially made one of those that takes SQLRender v1.0 and translates i (simple Java application that calls the Render method). Also 100% with integration with dplyr. Selfishly I would also like python hooks, but doesn’t need to be a priority.

One other thought I’ve had which might be a good idea for building out 2.0 is to have some regression databases for SQLRender 2.0. We would need to regress not only functionality, but I think it would be valuable to measure query performance as well. I think it would be valuable to know if we are making changes that create translations that dramatically decrease performance.

Thanks Jason! Could you elaborate a bit more on the command line tool, and the way it is currently being used?

(I agree on the regression databases. I do see some issues like the dependency on the exact SQL statement that may differ from the one used in an application, and the fact that test databases tend to live in the cloud making performance inherently variable between runs. But that shouldn’t stop us from trying).

The command line tool we have now is fairly simple. It takes the SQL server version and calls translateSQL to take it to a different dialect. Basically just a wrapper around that function so we can turn it to Postgres on the command line.

I like the part about dplyr. Supporting other paradigms than SQL is a good new paradigm.

Thanks for including me. I am a huge believer in OMOP CDM and now in OHDSI and at least while either @tomwhite are at Cloudera I suspect we will want to be an up-to-date full-fledged option for platforming CDM, and if we’re gone our customers will make Cloudera. So the chance to be included in the dialogue on SQLRender is appreciated and an honor. I will ask Tom if he can respond soon.

Friends:

I am out of my league here, but I can’t believe we need to solve that problem in OHDSI. There are tons of solution if you google the problem. Are we avoiding them to save money? There is even a proper parser in Perl (again, dating myself) for doing that for free. What’s the deal?

Last time I checked there were no valid other solutions for SQL translation. Some (like SqlFairy and SwissSQL) seemed to promise what we need, but upon investigation they broke down more often than they worked. If you have an alternative that you can recommend I would be happy to take a look at it.

For a SQL parser: yes: I fully expect we’ll re-use some existing BNF-capable parser library in Java for that. But we’d still need to compile the dialect from the parsed representation.

It’s amazing to me that SQLRender has faired as well as it has. Clearly a general-purpose SQL translator (especially if it could cover some noSQL cases as well – Hadoop, GBQ?) would be a huge boon to the database world. The only reason a great solution doesn’t exist, I would assume, is because the problem is too hard. Yet SQLRender has managed to hold this community together despite our commitments to disparate DBMSs.

But one has to ask, especially in light of a planned rewrite, at what cost? And are there alternatives? I believe Patrick, Frank, and Martin, and probably others who have thought much longer than I have about this challenge, have concluded that we can’t escape our dependence on SqlRender whatever its flaws.

But I, for one, and I imagine I’m not alone, am disturbed at the thought that the tools I develop are unlikely to gain community adoption if I don’t write my SQL in MSSQL or OHDSI-SQL – 1) because I have no way of running those languages right now, and 2) because the SQL code I write tends to be very hairy and I tend to rely on every bell and whistle available in whatever flavor of SQL I happen to be using (right now PostgreSQL) and automated translation of it would likely be impossible.

I don’t have a great solution right now, but I’m curious to know how much others share my concerns as we contemplate new investment in SqlRender.

Thanks Sigfried for contributing to the discussion!

Just some thoughts on your concerns:

  1. The nice thing of creating our own language and having a full parser is that we can actually enforce the limits of what people can and can’t do at develop time. Stated more friendly: we could (in theory) guarantee that code that you run in your environment (e.g. Postgres) will run everywhere, because the OhdsiSql language will be a closed language which will be parsed by the client, not the server, and so problems that right now we can only catch at run time because the developer didn’t have access to the runtime environment can be caught at develop time.

  2. Any solution for running the same code over multiple platforms will have the problem that you can’t optimize it for one platform. Just so you know, when I originally developed SqlRender people were highly skeptical and this is the reason why you always have the option to override automatic translations with platform-specific SQL simply by placing it in the appropriate subfolder of inst/sql in your package. Although I think we should keep this feature, it is interesting to note that to date no one has used it.

But I agree, updating SqlRender will be a significant investment, and not our core business. I’m very open to alternative ideas.

t