This is a back-and-forth that happened over email. Martijn S. asked ‘why was this not on the forum?’, so I am adding it here. The emails are pasted below in time order. Where the thread forks, the second fork is added after the end of the first one.
From:
Shawn Dolley [mailto:sdolley@cloudera.com]
Sent: Wednesday, December 21, 2016 10:42
To: mark@outins.com; Kirill.Eitvid@odysseusinc.com; evans@ohdsi.org; Vojtech.huser@nih.gov; derek@cloudera.com; Gregory.Klebanov@odysseusinc.com; cknoll1@its.jnj.com; alpivonka@gmail.com; tom@cloudera.com; rstarr7@gatech.edu;
Frank Defalco; Patrick Ryan; Christian Reich
Subject: request for info, topic for presentation (joke included)
So two new guys walk into the OHDSI Bar.
Bartender, to the one carrying a box: “Whatdya got there?”
Guy 1: “Hadoop code.”
Bartender: “Well, put it up on the bar.”
Guy 1 puts it on the bar.
Guy 1: “Well, I wouldn’t serve it yet, it’s not tested.”
Bartender: “That’s right, it’s not.”
Guy 1: “Well, it’s got to be tested.”
Bartender and everyone in the bar (except for the two guys) breaks out laughing.
OK, so seriously though. I am wondering if someone in the community who is on this email thread and has been around a while could present to the Hadoop WG on the topic of ‘How to test code in OHDSI’ (I curated the email list above for people who are tech-y and seem to have been here a while). We now have new Impala schema code committed, and Tom has the new SqlRender that can produce SQL that works on Impala (not sure what you call that). Tom tells me it needs to be tested, and I don’t know what a community organizer (like Obama?) would do to get this set up; I’m not sure if it is hardware or what it is….
Questions I am hoping can be answered:
- How does OHDSI know if enough testing has occurred to feel good about something, or that it is ‘in’?
- How many sites does code need to run on to be ‘tested enough’?
- How many data source CDMs does code need to run on to be ‘tested enough’?
- Are we testing for performance? That a query comes back with the right info?
- Do I need to convince people to stand up Hadoop/Impala and run some SynPUF queries? Do people do this just for fun? How many?
- Are there any documents on this?
- Are there any test routines that have been written for other platforms?
- What did Postgres have to do to become certified? (I think years ago it was only Oracle and SQL Server, so Postgres was added at some point.)
- Specific to the code that Tom has out there, is the answer different for that, i.e. are there specific and obvious next steps in this particular situation that make all previous questions moot?
Right now, I have a sense that Tom is, rightly, in a bit of a waiting mode, or that there’s less to be done until we get this tested (even if it is Cloudera who has to test it outside of Tom, as we don’t have a non-Tom OMOP CDM cluster set up), so I want to get it tested…
Anyway, I’m not sure if the right thing is education for the Hadoop WG, but I think if we shared what needs to be done with the folks in the group who have stood this up, maybe they’d be more willing to test it.
Thanks
Shawn
Shawn Dolley
Industry Leader, Health & Life Science
Cloudera
202-460-4660
THEN FROM CHRISTIAN:
Shawn:
That is one geeky and corny joke indeed!!! This crowd needs those. :) :)
Bottom line: we don’t have a good set of standards. We also have no standardized environments where these tests could happen. This is because all code is delivered as a gift to the Public Domain, so it is up to each contributor to provide quality code. If the quality is not there, the only real consequence is lost reputation in the community.
We had such discussions earlier with the team, but the combination of an exciting scientific agenda and the absence of funds to establish a proper software development methodology conspired against us doing anything centrally. The tools we put online are often debugged by Lee Evans.
However, you may want to talk to the commercial vendors in the community who might want to do something about it. The usual suspects are Odysseus, Outcomes Insights, Ephir, Evidera. Even QuintilesIMS is interested in the technology, as you know, though we wouldn’t be able to test non-IMS datasets.
Let me know if that helps.
Best,
C
FROM VOJTECH
As far as I know, Martijn (added to CC) is the author of SqlRender, so he may have some ideas.
There is some convention on how to best author the master SQL code (to be translated to other dialects). E.g., don’t use ‘having’, name all subqueries, etc. (the INSERT INTO plus WITH X pattern). These are partially documented (not sure where exactly).
A new guideline for explicit casting in all cases would have to be added (to accommodate the Impala dialect); a sketch follows below.
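To make the casting guideline concrete, here is a minimal sketch. The ‘impala’ target dialect is assumed to exist once Tom’s work is merged; the query itself is illustrative:

library(SqlRender)
# Master SQL is written in the SQL Server dialect; the explicit CAST avoids
# relying on implicit type coercion, which not all dialects perform.
sql <- "SELECT CAST(observation_period_id AS BIGINT) AS id FROM observation_period;"
translateSql(sql, sourceDialect = "sql server", targetDialect = "impala")$sql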
Possible tests are:
- Running Achilles; the existing Travis CI process for Achilles does that (and I use it a lot): https://travis-ci.org/OHDSI/Achilles/ (a rough sketch of such a run follows below)
- Running one past study (e.g., the pathways study; the pathway study was the most raw, I guess, and does not use too many libraries)
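For reference, a rough sketch of what such an Achilles smoke-test run looks like. The connection values here are illustrative; achilles() and createConnectionDetails() are the actual exported functions, though argument details may vary by package version:

library(DatabaseConnector)
library(Achilles)
# Illustrative connection to a local PostgreSQL database holding a SynPUF CDM.
connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/synpuf",
                                             user = "cdm_user",
                                             password = Sys.getenv("CDM_PASSWORD"))
# 'No errors' on a full run like this is essentially what the Travis CI
# process checks for.
achilles(connectionDetails,
         cdmDatabaseSchema = "cdm",
         resultsDatabaseSchema = "results",
         cdmVersion = "5")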
Essentially, the existing queries built into the R packages would have to be tested to see whether they need casting (i.e., whether they work on Impala). If they don’t need casting, then they will work.
You mentioned Postgres…
There is also a precedent for a failed attempt to add a dialect (Netezza) to SqlRender, so opposite examples exist. :(
Postgres was kind of there from the start (I think; or very close to RedShift; longer story there).
Vojtech
THEN FROM CHRIS KNOLL
Hi, Shawn,
Christian is correct that we don’t have a good set of standards, but to be fair, we have a very diverse set of environments that we’d want to write test cases for, which makes it difficult to standardize on a single platform. For example: we have Java libraries and R packages that use testthat to define test cases, and this works well for the types of tests where you call a function and check for an expected result (such as simple unit tests). For SqlRender, I think Martijn has set a great example of how to do this, and he has even gone as far as setting up the Travis CI environments to execute these tests and report them in the GitHub repo. For work on SqlRender, I think this is the model you should follow: since SqlRender doesn’t need a database to connect to, you can just create a set of test cases to validate that your translation rules are working. Here’s an excellent example of what a SqlRender test looks like:
test_that("translateSQL sql server -> Oracle DATEDIFF", {
  sql <- translateSql("SELECT DATEDIFF(dd,drug_era_start_date,drug_era_end_date) FROM drug_era;",
                      sourceDialect = "sql server",
                      targetDialect = "oracle")$sql
  expect_equal(sql, "SELECT (drug_era_end_date - drug_era_start_date) FROM drug_era ;")
})
So this is a mode of testing where you have a known-input, known-output test.
For DatabaseConnector, Martijn has also set up a nice test suite that uses testthat. Here’s an example of a test:
test_that("Open and close connection", {
  # PostgreSQL
  details <- createConnectionDetails(dbms = "postgresql",
                                     user = Sys.getenv("CDM5_POSTGRESQL_USER"),
                                     password = URLdecode(Sys.getenv("CDM5_POSTGRESQL_PASSWORD")),
                                     server = Sys.getenv("CDM5_POSTGRESQL_SERVER"),
                                     schema = Sys.getenv("CDM5_POSTGRESQL_CDM_SCHEMA"))
  connection <- connect(details)
  expect_true(inherits(connection, "JDBCConnection"))
  expect_true(DBI::dbDisconnect(connection))
  # SQL Server
  details <- createConnectionDetails(dbms = "sql server",
                                     user = Sys.getenv("CDM5_SQL_SERVER_USER"),
                                     password = URLdecode(Sys.getenv("CDM5_SQL_SERVER_PASSWORD")),
                                     server = Sys.getenv("CDM5_SQL_SERVER_SERVER"),
                                     schema = Sys.getenv("CDM5_SQL_SERVER_CDM_SCHEMA"))
  connection <- connect(details)
  expect_true(inherits(connection, "JDBCConnection"))
  expect_true(DBI::dbDisconnect(connection))
  # Oracle
  details <- createConnectionDetails(dbms = "oracle",
                                     user = Sys.getenv("CDM5_ORACLE_USER"),
                                     password = URLdecode(Sys.getenv("CDM5_ORACLE_PASSWORD")),
                                     server = Sys.getenv("CDM5_ORACLE_SERVER"),
                                     schema = Sys.getenv("CDM5_ORACLE_CDM_SCHEMA"))
  connection <- connect(details)
  expect_true(inherits(connection, "JDBCConnection"))
  expect_true(DBI::dbDisconnect(connection))
})
This is an example of a larger test, and it’s different because it’s not a known-input, known-output kind of test. Instead, a database environment needs to be set up (with settings passed in via environment variables), and rather than sending something in and checking the output, we simply take some action that we hope does not result in an error. If there are no errors, and the results of the calls are of a particular type (checked using the inherits() function), then we consider the test passed.
We don’t currently have an official standard for creating test cases for WebAPI (our REST service layer) or our HTML/JavaScript apps. This is something we’d like to incorporate, but it hasn’t gotten a lot of traction.
So, all that being said, I’ll try to answer your questions:
How does OHDSI know if enough testing has occurred to feel good about something or that it is ‘in’?
- If you can produce tests that demonstrate the code works (such as the test cases in SqlRender and DatabaseConnector), then that would work. In the cases where we don’t have an official test strategy, it usually involves someone pulling down your branch, trying it out locally, doing their own test, and blessing it if it looks good. Admittedly, this is very unscientific, but we’ve been able to incorporate many community contributions in this mode. The difficulty for you specifically with Impala is that it’s a hurdle to set up all the infrastructure needed to test that an Impala environment is working and robust enough to cover all the necessary tests (for example, to incorporate a PR for Achilles on Impala, someone would have to be able to verify that the code works properly).
How many sites does code need to run on to be ‘tested enough’?
- I’d say at least 2.
How many data source CDMs does code need to run on to be ‘tested enough’?
- In our internal environment we have several different flavors of CDMs from different sources (EHRs, surveys, etc.), so by running Achilles (for example) across all those types of CDMs, we feel confident that it’ll work across any CDM in the network. Of course, this depends on SqlRender correctly translating queries into the different dialects. The attitude I have is that if we can show SqlRender working on a given platform, then the source queries used in Achilles will work properly across the different dialects. There have been some rare cases where this has not held, but those were things that should have been identified at the SqlRender level and not the Achilles level. I’ll also add that our current tests on Achilles are more of a ‘no errors = working’ check versus ensuring that the generated statistics are correct. This is certainly something that could be improved, but it would involve quite a bit of effort.
Are we testing for performance? That a query comes back with the right info?
- We have timeouts set on our CI server: if a set of tests doesn’t finish in the required amount of time, the tests fail. This spurs us to either check query performance or verify appropriate indexes. But as I said above, we don’t have result-verification tests for queries, only tests that they are well-formed.
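To make that gap concrete, here is a rough sketch of what a result-verification test could look like, which is something we don’t do today. The expected count is a made-up value for a hypothetical fixed test dataset; querySql() is the real DatabaseConnector function:

test_that("person count matches the known test dataset", {
  # Assumes 'connection' was opened as in the DatabaseConnector example above,
  # against a fixed test CDM where we know exactly 1000 persons were loaded.
  result <- querySql(connection, "SELECT COUNT(*) AS person_count FROM cdm.person;")
  # Column-name casing can vary by platform, so compare by position.
  expect_equal(result[1, 1], 1000)
})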
Do I need to convince people to stand up Hadoop/Impala and run some SynPUF queries? Do people do this just for fun? How many?
- You’ll probably need to convince people to stand up Hadoop, but you have a working group which should have plenty of people with the capability to do it. I’d recommend that when you want to submit something to the OHDSI repo, you get two people there to independently verify the working order of the code; if it’s good enough for the working group, I’d say it would be good enough to be accepted into the OHDSI repo.
Are there any documents on this?
- I don’t think so, but again, this is because I don’t think there’s a single approach to testing.
Are there any test routines that have been written for other platforms?
- Yes, you can check out the tests folders under the roots of the DatabaseConnector, SqlRender, and Achilles repos for examples of how they are set up. You can read up on testthat on Google to find instructions on how it works, but it is like your typical NUnit or JUnit testing framework.
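As a pointer, a minimal sketch of running one of these suites locally, assuming you have cloned the repo and installed its dependencies (the path follows the standard R package layout):

install.packages("testthat")
library(testthat)
# Run the whole suite of a cloned repo, e.g. SqlRender:
test_dir("SqlRender/tests/testthat")
# Or, from inside the package directory, using devtools:
# devtools::test()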
What did Postgres have to do to become certified (I think years ago it was only Oracle and SQL Server, so Postgres was added at some point?)
- For SqlRender, Martijn added the PostgreSQL translation tests. DatabaseConnector was also set up to connect to a PostgreSQL database for the testthat tests. For Achilles, we found people in the community who had their data on PostgreSQL and asked them to run it, but later we got SynPUF installed on an internal PostgreSQL db and verified it that way. Just to restate: in the tests that check the queries, we’re just looking for errors, not correct results. To get that level of testing, we’d have to produce a dataset with all the required information and then get a validated result set produced to compare the application SQL output with the expected results. This is all possible, but very time-consuming.
Specific to the code that Tom has out there, is the answer different for that, i.e. are there specific and obvious next steps in this specific situation that make all previous questions moot?
- Martijn has a comment open about the stuff put into DatabaseConnector: it looks like more than just the JDBC drivers were put into that PR, and so he’s inclined to reject it if there’s extra stuff in the PR that isn’t actually needed to make the DB connectivity work (specifically, I think there were some logging framework libraries involved, and the question is: does the JDBC driver need a logging framework to connect to a db?). For the Impala SqlRender work, I’d copy the tests that we’ve created for PostgreSQL and recreate them using the Impala dialect. If you have a SqlRender that passes all the Impala tests, then it’ll certainly get accepted into the repo. Part of that validation is doing a fairly large-scale test on a real data source to check for errors, but my expectation is that if Impala passes all the SqlRender test cases, then it should also execute properly on something like Achilles.
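As a sketch of what ‘copy the PostgreSQL tests and recreate them for Impala’ could look like in practice; the expected Impala output below is illustrative, not an agreed-upon translation rule:

test_that("translateSQL sql server -> Impala DATEADD", {
  sql <- translateSql("SELECT DATEADD(dd,30,drug_era_end_date) FROM drug_era;",
                      sourceDialect = "sql server",
                      targetDialect = "impala")$sql
  # Illustrative only: assumes DATEADD would map to Impala's DATE_ADD function.
  expect_equal(sql, "SELECT DATE_ADD(drug_era_end_date, 30) FROM drug_era;")
})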
I’d like to point out one thing that happened with Achilles on Impala: the added workaround that removes ‘delete from table’ statements from the Achilles execution based on a flag. While I understand why it’s necessary for Impala, the problem I have is that this limitation of Impala bleeds over into the other platforms (because the flag can also disable deletes on the other platforms, where that would certainly cause some things to fail). So I’m a little nervous that we had to change some core functionality in Achilles to handle a limitation in one platform. In the past (for, say, MySQL) we ran into certain limitations that disqualified it from being a supported platform. I’m not suggesting that Impala should be disqualified; I am saying that what’s being done for Impala in Achilles makes me a little nervous. I’d much rather ask Tom to find a solution for the ‘delete from table’ limitation than write all of our OHDSI tools to allow users to ‘turn off deletes.’
Hope this helps.
-Chris
ALSO FROM CHRIS KNOLL
As a follow-up re: the DELETE FROM table issue on Impala:
I’d much rather see a solution where DELETE statements are translated out of execution, versus having to alter the application logic to provide a flag that turns deletes off. A test for this would look like:
test_that("translateSQL sql server -> Impala DELETE", {
  sql <- translateSql("DELETE FROM some_table;",
                      sourceDialect = "sql server",
                      targetDialect = "impala")$sql
  expect_equal(sql, "/* DELETE FROM some_table; */")
})
So the handling of DELETE statements in SqlRender would be to surround them in a comment. Would that work? I don’t know, but this way it’s handled in the translation layer and not added as an application setting.
-Chris
FROM MARTIJN
Two things:
1. What is the problem with deletes on Impala? I agree with Chris that modifying the tools (of which Achilles is just one) should be the absolute last option to consider. Maybe relevant: in the early days RedShift didn’t support ‘DROP TABLE IF EXISTS’. Rather than using fancy workarounds, I hacked some code into DatabaseConnector’s executeSql command that, for RedShift, checked the SQL for occurrences of this statement and programmatically implemented the equivalent of it in R instead of executing it on the server. Now RedShift does support it, so I removed the hack. We might do something similar here, although I don’t yet understand the problem. It is not pretty, but I prefer it over the alternative.
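For flavor, a hypothetical sketch of that kind of hack; the real code lived in DatabaseConnector’s executeSql, and the function name and use of DBI::dbExecute here are illustrative:

# Hypothetical: intercept a statement the platform cannot run and emulate it
# from R; otherwise pass the SQL through to the server unchanged.
executeStatement <- function(connection, sql) {
  m <- regmatches(sql,
                  regexec("DROP TABLE IF EXISTS\\s+([^;[:space:]]+)", sql,
                          ignore.case = TRUE))[[1]]
  if (length(m) > 0) {
    # Emulate DROP TABLE IF EXISTS: only drop when the table actually exists.
    if (DBI::dbExistsTable(connection, m[2])) {
      DBI::dbExecute(connection, paste("DROP TABLE", m[2]))
    }
  } else {
    DBI::dbExecute(connection, sql)
  }
}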
2. When starting with OHDSI we selected SQL Server, Oracle, and PostgreSQL as the platforms to support (PDW and RedShift were added later, more or less organically). For the first versions of Achilles I ran extensive tests on all three platforms, and even loaded the same dataset on all three (I had installed Oracle and PostgreSQL locally) and checked whether the output was the same. The code for that checking still exists, although setting up the same data on three platforms was a big pain. The code is here, but should really be in the extras folder: https://github.com/OHDSI/Achilles/blob/master/R/TestAchillesCode.R
Cheers,
Martijn
ALSO FROM MARTIJN
Netezza failed because whoever was pushing it stopped pushing it. I don’t think there were any issues preventing it, but if nobody works on it, nothing happens.
There are guidelines on how to write translatable SQL, and they are in the SqlRender vignette: https://raw.githubusercontent.com/OHDSI/SqlRender/master/inst/doc/UsingSqlRender.pdf
Explicit casts are already part of the recommendation (see section 3.4).
Cheers,
Martijn
No further pieces of the thread. If you have comments, please add them
directly into the forum.