
Problems getting JPA webapi branch to work

OK, more info: the default database that Flyway (via the JDBC URL) connected to was master. That’s where the tables were created, and that’s why it worked. Scary, because if I had restarted the process with clean, it would have wiped the master database (this would be a Bad Thing™).

I’m going to put the server name into the JDBC URL. However, this raises questions about what information should be in the CDM_SCHEMA setting we’ve been using thus far (we’ve included database.dbo as the ‘schema’). I think we need to rethink what we call a ‘schema’.

-Chris

The exception in your first post indicates that the hibernate.dialect is not set. I probably need to add explicit config in the Hibernate JPA adapter, because for some reason auto-detection is not working for SQL Server.
How did you get the SQL Server jar into the WAR without having it in the POM? Did you activate your profile (i.e., does the embedded application.properties file look correct)?
Not having drivers in the POM would require manual steps to run/develop (editing the POM, or configuring/deploying to Tomcat), and I think that is unnecessary. I don’t want to have to deploy to an external Tomcat on every run. We are not maintaining the libraries or hosting them. I see no harm in having our supported DBs within profiles where we specify an optional dependency.
In a meeting most of the day… but I will take a closer look at the log and add support for the Hibernate dialect…

The issue was resolved once I added the Class.forName() call (to load the class into the JVM); the jdbc:sqlserver JDBC URL then resolved to the correct driver. From there, Hibernate was able to determine the dialect from the loaded driver, and things continued. So the question is: why wasn’t the class loaded automatically, without me having to call Class.forName() in the code?
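The workaround can be sketched as follows. This is illustrative only, not the actual WebAPI code: the helper class and its error handling are my own, and in the thread’s setup the class name would be the SQL Server driver rather than the stdlib class used here.

```java
// A minimal sketch of the workaround described above: explicitly loading a
// class by name, which for a JDBC driver runs its static initializer and
// thereby registers it with java.sql.DriverManager.
// The helper and its error handling are illustrative, not WebAPI's code.
public class DriverLoader {

    /** Loads (and, for a JDBC driver, thereby registers) a class by name. */
    public static Class<?> loadDriver(String driverClassName) {
        try {
            // Class initialization runs the driver's static block, which
            // calls DriverManager.registerDriver(...) for JDBC drivers.
            return Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "JDBC driver class not found on classpath: " + driverClassName, e);
        }
    }

    public static void main(String[] args) {
        // In this thread's setup the argument would be
        // "com.microsoft.sqlserver.jdbc.SQLServerDriver".
        System.out.println(loadDriver("java.sql.DriverManager").getName());
    }
}
```

JDBC 4 drivers are supposed to self-register via ServiceLoader, which is why the question of needing this call at all comes up again later in the thread.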

To answer your question about how it worked without the POM: NetBeans (and I think Eclipse can work this way too) just automatically deploys the generated WAR file to a Tomcat server configured in the IDE. That Tomcat server already has the JDBC drivers it needs to connect to a database. The IDE knows how to tell Tomcat to reload the WAR during compiles, and initially it was very quick. Since Spring Boot, though, the debugger sometimes gets confused and I have to kill the Tomcat service forcefully (as in: go to Task Manager and kill java.exe), which in general has had a definite negative impact on my productivity. But the gains seem to outweigh the pains, so I’m living with it.

I do not know why we have to have the JDBC drivers bundled in the WAR. I feel like it has something to do with running the app as a WAR versus as a JAR, but I honestly don’t know why this is a requirement, and it does make matters more complicated. I also do not know what happens when two different WARs bundle two different JDBC drivers that handle the same JDBC URL. Do you get errors about duplicate classes found on the classpath? Or does the first one found get loaded (possibly causing strange behavior in the second WAR, which depends on the JDBC driver it was bundled with, while the first WAR has loaded its own driver for it)?

But this is getting off topic: at this point, I have the application running, it found the driver (after I loaded it explicitly), and it is creating the tables for me. There is a BUT: the configuration settings that specify the OHDSI and CDM schemas are making Flyway think that it needs to create a schema called [SERVER.dbo] to place the migration tables into. I’m pretty sure this is confusion between what SQL Server means by a schema, what we (the devs) want to think of as the schema (i.e., a prefix used to reference a table), and how differences between database platforms regarding schemas could impact our SqlRender template SQL.

On this last point, I’ll make a separate forum post.

Made post here:

Re: unable to find driver.

I think I’ve nailed down the root cause of this: it wasn’t that the class couldn’t be loaded or found on the classpath (I removed the dependency on the JDBC driver in my local pom.xml and verified that it was not being copied into the WAR). It found the driver in the tomcat/lib folder, but only if I put the Class.forName() call in DataAccessConfig.java immediately after getting the driver name from the getRequiredProperty() call.

I should point out: in both cases (whether the JDBC driver was in tomcat/lib, or a pom.xml dependency was defined and thus the JDBC jar was copied into the WAR), the driver was not found unless I added a Class.forName() to load it into the JVM. So I don’t want to argue that this is somehow related to bundled vs. non-bundled JDBC drivers. That was a false alarm on my part and should be disregarded.

So, the question is: was something introduced in this branch such that the driver used to get loaded without a Class.forName(), but now it doesn’t? Or maybe there was a Class.forName() somewhere in the code, but JPA is loading earlier than that, and we just needed to call Class.forName() earlier in the code execution…

-Chris

Yes, the change to not pool connections introduced a basic DriverManagerDataSource that did not perform the loading of the class. I have pushed the minor adjustment (which is basically just what you mentioned: explicitly loading the class).
Interesting that I was not seeing this, due to the way I was running the app (as a Java application vs. deploying to Tomcat). From a quick search, it may be Eclipse (JDBC 4) automatically loading the driver classes found on the classpath… I would also note that running as a Java application (the WebApi class) takes about half the time to start up (~7 sec vs. ~15 sec in Eclipse; it could be even longer with NetBeans if it is truly packaging the WAR each time).
Sorry I missed this… another (one-time) drawback of not being able to use standard pooling…

So, what did you have for datasource.ohdsi.schema vs. datasource.cdm.schema? I noticed a message on this line of the stack trace that I did not see on Oracle. Did it write tables to the schema in datasource.ohdsi.schema? Was the issue that the URL for the “ohdsi/results” schema was different from the CDM one, and the default was not changed?
Flyway “clean” is not going to occur without some changes… so I think we are fairly “safe”, though I agree that “clean” in general is a Bad Thing for production systems. You may note that I added “rollback” scripts so that you could manually revert changes made by the migrations… Something for us to consider… I’ll try to take a closer look at your thread about schemas…

Thanks for looking into it, Alex. Funny thing: Microsoft just released a JDBC 4.1 version of the driver, which calls out that you don’t need Class.forName() because the JDBC 4 version handles loading the classes, but no, it still doesn’t work. I don’t think there’s much overhead in loading the class explicitly, and this allows some backwards compatibility with non-JDBC-4-compliant drivers.

I’ve added some entries to the pom.xml in my local commits; would you mind if I checked them into this branch to be included with the pull request?

Re: setting the schema.

So in my config settings, I have my OHDSI_schema set to {servername}.dbo. What Flyway does is attempt to create a schema called {servername.dbo}, which is not what we want here, since servername represents a different database instance.

This is just a fundamental issue with how we were using the term ‘schema’ in our custom SQL. The main discussion on this is in the other thread I created.

Go ahead and commit/push whatever, Chris. Yes, Flyway does try to create the schema if it doesn’t find it (note that on Oracle the schema name in flyway.schemas is case sensitive). So, did you get Flyway to work as expected after putting in the respective SQL Server ohdsi/results schema and the correct flyway.datasource.url/user/pass?

Yes, once I changed the pom.xml properties to set the CDM and OHDSI schemas to ‘dbo’, it created the tables in the schema ‘dbo’ as expected. This is all fine.

The main problem that remains is that if the schema we’re putting into our config files is a ‘true schema’, then the SQL that I’ve written so far won’t work, because the tokens in the config file specify the server.schema.table format, and I’m not sure that’s what other environments are planning to do (I don’t think anyone indicated it here or in the other thread I started about DB schemas).

So, to move this branch forward, what I can do is modify my cohort expression logic to use different parameters for ‘cdmTablePrefix’ and ‘resultTablePrefix’, each constructed in the config file as either the actual schema or a combination of {server name.schema}. I’ll push it up, and if you pull it and it continues to work for you, that’ll be great.

A point I’d like to make: while we can associate different database connections with different MSSQL server database instances, our cohort generation queries will be working across database instances within the same connection, by referring to the external database by server name in the query (I’m not sure if this is the case in the HERMES SQL scripts). The original plan was to have a single connection to the MSSQL instance (defaulting to master) and have all of our template SQL queries fully qualify the table to query by server + schema. Now, since we’ll have schema properties that point directly to schemas, I’ll change my JDBC URL to point to the specific DB instance that the main connection connects to, and any queries I write will have to take a parameter indicating the CDM table prefix.

In environments where both the CDM schema and CDM results are in the same database instance but separated by schema, the only thing that should need to be different is how the CDM table prefix is defined. This will have an impact on the AbstractDAOService, where we’ve defined fields to store the different schemas. I just need to add a property for cdmTablePrefix, or some such property name, that is used to fully qualify the CDM tables.
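To illustrate the idea, a token substitution along these lines is what I have in mind. The @cdmTablePrefix name follows the proposal above, but this one-line replace is only a stand-in; the real SqlRender library has a richer API.

```java
// Tiny stand-in for SqlRender-style token substitution, showing how a
// single @cdmTablePrefix token covers both the cross-database case and
// the same-database case. Not the actual SqlRender implementation.
public class SqlTemplate {

    public static String render(String templateSql, String cdmTablePrefix) {
        return templateSql.replace("@cdmTablePrefix", cdmTablePrefix);
    }

    public static void main(String[] args) {
        String sql = "SELECT person_id FROM @cdmTablePrefix.person";
        // Database-qualified prefix (MSSQL cross-database case):
        System.out.println(render(sql, "cdm_optum.dbo"));
        // Bare schema (same-database case):
        System.out.println(render(sql, "dbo"));
    }
}
```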

I’ll work on this next and push up the changes.

-Chris

Just to clarify, I’m under the impression that what we currently have is a qualified name of ‘database.schema.table’, not ‘server.schema.table’. Originally we went along with the variable name @CDM_schema and just put our database.schema (e.g., cdm_optum.dbo) in it, because it worked and we didn’t really need to differentiate between database and schema. Recently I found that the JDBC getTables() method wouldn’t work the way I expected unless I had the actual database and schema names available as separate variables, so I backed off using it as a check before running. It sounds like not having the database and schema clearly defined is causing issues in other places as well.

Should we take the step now to include a database variable and a schema variable separately in the configuration?

Yes, that’s what we need: a CDM_database and a CDM_schema property, and possibly OHDSI_database and OHDSI_schema properties as well, though the OHDSI_database property may be unnecessary because the initial JDBC connection string can specify the database to connect to.

When using these fields in SqlRender, we should probably define tokens in our template SQL such as @cdmTablePrefix and @ohdsiTablePrefix, because in some environments (where you don’t specify a database, or the query should assume the ‘local’ connection’s database) you’d just put CDM_schema as the replacement, while in other cases (such as our environment, where there are separate databases) you’d want to concatenate CDM_database with CDM_schema, separated by a ‘.’. If we always concatenated cdm_database with cdm_schema, you’d wind up with table prefixes that look like [.dbo], and I think that would fail. We could insist that you always specify CDM_database in your config and always make the table prefix [cdm_database.cdm_schema], but I don’t think that would work on non-MSSQL environments.
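The concatenation rule being described could look like this minimal sketch. The class and method names are hypothetical; the point is only that the ‘.’ is added when, and only when, a database is configured.

```java
// Sketch of the prefix rule discussed above: concatenate database and
// schema with '.' only when a database is actually configured, so a
// missing database never produces a broken prefix like ".dbo".
// Class and method names are hypothetical.
public class TablePrefix {

    public static String build(String database, String schema) {
        if (database == null || database.isEmpty()) {
            return schema;                  // e.g. "dbo", or an Oracle schema
        }
        return database + "." + schema;     // e.g. "cdm_optum.dbo"
    }
}
```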

-Chris

Forgive my ignorance, but aren’t you talking about the same convention that @schuemie established in an earlier thread a few weeks ago: Getting rid of .dbo. in our parameterized SQL?

Sort of. @schuemie is declaring a combined variable called cdm_database_schema, and to find the database you split it on the ‘.’. In our case, we’re suggesting two distinct variables for database and schema, and when replacing a token in the SQL statement, you decide whether you need to concatenate database + ‘.’ + schema or just use the schema alone (in the event that a database is not specified). I don’t know what a @cdm_database_schema variable looks like on Oracle, because I don’t know if Oracle has the same {database}.{schema}.{table} dot notation.

-Chris

Pardon me if I caused any confusion. For purposes of what needs to be done in our template SQL, we could use @schuemie’s approach and define a @cdm_database_schema token that is set to the platform-specific table prefix for accessing CDM tables. On Oracle that might just be CDM_TRUVEN_CCAE; in MSSQL it would be CDM_TRUVEN_CCAE.dbo. For purposes of determining the CDM database, you could split either of these strings and get the ‘database’ for use in a @cdm_database parameter in the template SQL.
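The split just described could be sketched like this; the helper name and the convention of returning a null schema when there is no ‘.’ are my own assumptions, not anything in the codebase.

```java
// Illustrative split of a combined cdm_database_schema value into database
// and schema parts, per the discussion above. A value without a '.' (like
// Oracle's CDM_TRUVEN_CCAE) is treated as having no separate schema.
public class DatabaseSchema {

    /** Returns {database, schema}; schema is null when there is no '.'. */
    public static String[] split(String databaseSchema) {
        int dot = databaseSchema.indexOf('.');
        if (dot < 0) {
            return new String[] { databaseSchema, null };
        }
        return new String[] { databaseSchema.substring(0, dot),
                              databaseSchema.substring(dot + 1) };
    }
}
```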

From a WebAPI perspective, I think we would need to manage separate WebAPI properties that look similar to the template SQL params but serve a different purpose. Flyway has an option for schema, but on MSSQL I don’t think you’d need to specify it at all (since it would default to the user’s default schema, which in most cases is ‘dbo’). So the question remains on the WebAPI side: what is the set of properties we need to maintain to configure WebAPI against a particular database (JDBC URL, user, pass, schema? database?), and how might we need to store these values separately from the values needed by the template SQL?
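For illustration only, the WebAPI-side property set under discussion might look something like the fragment below. Only datasource.cdm.schema, datasource.ohdsi.schema, and the flyway.datasource.url/user/pass keys are names mentioned in this thread; everything else (the host, database names, and the proposed datasource.cdm.database key) is hypothetical.

```properties
# Hypothetical sketch of the WebAPI configuration being discussed;
# key names and values are illustrative, not the branch's actual config.
flyway.datasource.url=jdbc:sqlserver://myserver;databaseName=OHDSI
flyway.datasource.username=webapi
flyway.datasource.password=secret
datasource.cdm.schema=dbo
datasource.ohdsi.schema=dbo
# Possible addition per this discussion, for building the CDM table prefix:
datasource.cdm.database=cdm_optum
```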

Just to put it out there: I just got back to looking at WebAPI after a few days away. Flyway didn’t work for me because the checksums didn’t match. Apparently, it takes line endings into account (see here and here). Since I’m on OS X and Alex is on Windows, that was causing the problem. I tried to follow the *nix instructions here by switching core.autocrlf to ‘input’, but that didn’t work either. I had to switch to ‘true’, which seems to be the Windows setting. Now it’s finally working.

Just a heads up for the community, since I assume people may be using all different OSes.

We may want to introduce a .gitattributes file that enforces this setting, if we cannot find any problems with it across platforms.
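A minimal sketch of what such a file could contain, assuming we only need to pin the Flyway migration scripts (the exact pattern would need agreement):

```gitattributes
# Normalize Flyway migration scripts to LF in both the repository and the
# working directory, so Flyway checksums match across Windows and OS X.
*.sql text eol=lf
```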

I read some of the discussion on this: doing things through Git means the file has to pass through Git to be processed. If you are developing migrations on your local workstation and you migrate them before committing them, you’re already screwed (because your native CRLF has now polluted the checksum).

They say they were working on something to address this for the 4.0 release, sometime in the spring.

-Chris

See the following pull request comment for details on what to do after pulling down the .gitattributes file. This should fix the checksum problem, ensuring that *.sql files have LF line endings in both the repository and the working directory:
