OHDSI Home | Forums | Wiki | Github

WebAPI service table dependencies

With the release of the CDMResults service in the WebAPI a new issue has emerged.

You will see issues with a loading indicator in the public version of Hermes. The underlying cause of this issue is that Hermes is trying to look up Achilles results for a concept when you view it. The database that the public WebAPI is configured to connect to does not have an achilles_results table. A SqlException is thrown, not caught, and the WebAPI returns a 404, which in turn leads to the ajax call from the client never hitting a ‘success’ block and the client fails to update properly.

The broader discussion here is: how do we ensure that tables that are required by the WebAPI are available before we attempt to use them?

My thoughts on solutions:

  1. If a table does not exist it is created by the WebAPI. This requires that the credentials used by the WebAPI have permissions to create tables in the databases and that the DDL for all tables is included as resources to the WebAPI.

  2. Simply catch the exception that occurs and return an empty object, MonthlyPrevalence in this case, allowing the client to continue on as if no data were found without knowledge that the table does not exist.

  3. A StatusService is programmed to check on the availability of required tables and return a status object describing the current state of the tables required by WebAPI. Clients can use this StatusService to ensure the required tables are available before attempt to call features that might not be supported.

  4. AbstractDaoService is modified to include a static array of strings which identify tables it depends on for proper function. Each implementation of an AbstractDaoService overrides the array to contain the proper set of tables for its purpose. A StatusService uses reflection to identify all classes that extend AbstractDaoService and check for the existence of those tables. Clients can use this StatusService to ensure the required tables are available before attempt to call features that might not be supported.

I’m interested in hearing everyone’s opinion on these solutions but also other recommended solutions.

Here is where I attempt to pull @alfranke , @cahilton , and @Chris_Knoll into this conversation to solicit your opinions before I go and add something to the AbstractDaoService class that you all don’t like. Last chance. :smile:

@Frank - Can you not handle exceptions in the application and display an error message (e.g. 404 NOT FOUND)?

I am in progress of rolling out support for Flyway (at least a feature branch for review), as mentioned in Managing Database Migrations
With this, there will be .sql files in WebAPI that applications will need to add their respective DDL to. For example, I will add the spring batch tables to V1.0.0__spring-batch-ddl.sql. Another example would be V1.0.0__heracles-ddl.sql.
Then upon initial startup, Flyway will determine that the DB has not yet been versioned, and will create the schema_version table and then apply the V1.0.0__* scripts.
It will take some discipline to ensure we add all database change in this way, but that is what I thought the direction was (i.e. to defer this sort of DB checking to our migration management library). I think we should look at showing error messages in the client…

Yes, that would be option 2. Going back to one of the other conversations we had, I don’t want to return a 404 because that is what we get for malformed routes. I would either return an empty MonthlyPrevalence or a 205.

Flyway will definitely help with the issue but I’m not sure it would alleviate it completely. For example, I don’t think we would want Hermes to go out and create Heracles tables, but it is possible that Hermes would want to look at data generated by Heracles. If Heracles hasn’t initialized itself in that environment then the tables wouldn’t be available and we would need to ensure we catch any sql exceptions where the table we are trying to query doesn’t exist.

It feels awkward to wrap all of our queries in the WebAPI to catch exceptions to handle the case where the table doesn’t exist. For that reason I was suggesting that we put logic into the AbstractDaoService class that can be leveraged.

By adding a getRequiredTables() method that returns String[] requiredTables and a isRequiredAvailable() method we would be able to have blocks like

  if (isRequiredAvailable()) {
    // do stuff
  }
  else {
    // return 205, empty object, etc...
  }

We could use the getTables() method from DatabaseMetaData to identify what tables are in the environment and then check to look for the required tables.

This is certainly a lot more complex than just catching the exception. I’m happy to just handle the exception. Just wanted to put an alternative out there to get some feedback.

I kinda like what Frank is proposing. It does add some complexity, but it’s a more informed way of handling the exception.

FWIW, Heracles will have the same problem as soon as I merge my changes into master. Where I assume that most people don’t have all the Heracles tables set up in all their environments.

I committed an implementation of what I’m describing but confined just to the CDMResultsService if you want to check it out. Appears to work for me I was able to test across environments with and without the tables to get the proper behavior.

I think there’s going to be a combination of techniques here. After internal conversations today, I’m feeling like there’s different types of tables that need to be handled different ways.

One type of table is a table that is created externally to OHDSI apps (apps which I’m putting under the umberlla of WebAPI but someone will correct me if I am wrong), and doesn’t have a material impact on the functionality of the WebAPI (as in, these tables can be safely ignored if they do not exist). I’d say that tables like Achilles_[analysis/results/heel] fit in to this category. Except of course, if your application is actually Achilles, at which point if the tables aren’t there, you don’t have much of an application…

Another class of tables are tables which aren’t maintained by WebAPI but are critical: Vocabulary tables, for example. Can’t create Concept Sets without a vocabulary. Can’t find related concepts without a vocabulary. You’re kinda stuck without a vocabulary. Can’t run any cohort methods without the CDM tables…or do any analysis for that matter…

Yet another type of table is a table which supports core functions in WebAPI, I’ll call a ‘framework’ table. These tables might be the migration support tables, or the Spring Batch tables to facilitate our async processes.

And finally, application service tables. These are the tables that were created to support some functionality introduced in WebAPI. We were talking about saving Concept Sets somewhere for reuse. That’s gotta go somewhere. Cohort Definitions initially had a home in the CDM Schema, but I don’t know about you, but between database refreshes, we get a brand-spanking new CDM database, with new cohort tables. My poor cohort definitions! Of course, we spoke about saving those results into OHDSI_Results schema, but is the ohdsi_results schema paired with a CDM schema such that whent he CDM Schema goes, the OHDSI schema does too? Well, this is a question for another thread!!

Ok, so after laying out those table types, do they all get treated equally? Can some tables be ignored if not exists? Can some be incorporated into a migration, while others depend on an external process to be born? I don’t really have a good answer for this, but i’m sure, one way or another, we’re going to need to address each of them in turn.

-Chris

I think Chris does a good job of laying out different scenarios and they will all come up within the OHDSI framework of applications. There are going to be cases where people are using the cohort method from R and have no interest in the web applications. In this case it wouldn’t make sense to tell them to install WebAPI to create the necessary tables. I’m confident we’ll be able to come up with a thoughtful solution to handle the scenarios Chris has laid out. I’m interested in seeing how things go with Flyway. Hopefully it gives us some insight.

Chris,

I think you’ve laid this out perfectly.

App Tables - specific to OHDSI apps but not required by the thing as a whole. Proposal: App init should check for presence and create.

Framework Tables - tables needed to make the OHDSI WebAPI ecosystem work as a whole. Proposal: WebAPI init should create.

?Application Service tables – it seems we might be able to treat these the same as App Tables. They are potentially cross-app, like concept_sets, but it would seem that any app that needs them could check for their existence and create. But I can see your point of differece-- these Service tables should not be wiped out when you refresh your CDM, as opposed to an app table like ACHILLES_RESULTS.

@Frank, re: the R using, non-WepAPI using user, I’d like to think we’ll get to the point where the easiest way to run OHDSI is an integrated docker environment including R and our web ecosystem. But I’ve been accused of being overly optimistic :smile:

Jon

I don’t know enough about Docker but I don’t presume it to be a silver bullet.

I agree with the proposed breakout of the categories of tables that are needed. Do we need to also somehow distinguish tables that hold specific study results/stats/analysis data that need to be retained? Those tables may need to be managed by an administrator maybe using additional functionality added to the Olympus tool for data exports/imports/migration/backups?

Somewhat related is the question of support for apps deployed as single-user, multi-user and ultimately multi-tenant.

Apps may need to manage data separately for an individual user, or a group/team of users or shared by everyone (and various combinations). How are we going to handle that? e.g. One schema per user/group/team? Shared schema tables with a user/group/team id column? Separate copies of databases? Separate instances of the apps running in VMs/containers or installed on individual user machines?

A good example use case may be cohort definitions. OHDSI could define some universally agreed definitions in the CDM cohort table for the whole OHDSI network. An organization could additionally define their own standard cohort definitions shared across their organization. A department or study team could define cohorts for a particular study and each individual researcher could define his/her own personal set of preferred cohorts of interest.

I don’t see why ‘App tables’, ‘Framework Tables’, or ‘Application service tables’ need to be treated differently. The way I think about it is, if an application X has a dependency on WebAPI (e.g. needs a database table to support the app), then there would be a specific version of the WebAPI that supports app X. In that WebAPI version, there would be an app x DDL/DML file to instantiate the required db objects.
It seems straight forward if we say that CDM schema is a dependency (prerequisite) of WebAPI and that there should only be read access to that schema. WebAPI then “manages” the ohdsi/results schema. The WebAPI (via Flyway for example) will apply the db migration scripts upon startup, to the ohdsi/results schema, eliminating the need to do “table checks” (though not to say we can’t have both).
I guess the missing piece for me is bounds around what R can do. Can R apps bypass WebAPI and write to the database? If so, then not sure how we can manage…

t