OHDSI Home | Forums | Wiki | Github

Failing WebAPI/Atlas implementation

I’ve been trying to set up the WebAPI and Atlas on Azure following the steps defined in this repo. https://github.com/microsoft/OHDSIonAzure

With setting this up I’ve come across a lot of issues most of them I was already able to fix.

For example what I’ve done:

  1. Translate PostGres SQL scripts to SQL server
  2. Update Dockerfile since it was using the latest version.

The problem is that both the webapi are able to start up but I’m still facing 2 issues.

One is the error Column 'webapi.tag.created_by_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause (full error log below).

The second occurs once I’m doing some actions in the Atlas GUI. This gives me a BAD SQL grammar error. (full error below). This error runs on Achilles cache tables which i do not have. Is this achilles required for running Atlas + WebAPI?

Is there anyone who can help me validate my setup?

Output /WebAPI/info

{
  "version": "2.10.0",
  "buildInfo": {
    "artifactVersion": "WebAPI 2.10.0",
    "build": "NA",
    "timestamp": "Thu Mar 23 10:40:18 UTC 2023",
    "branch": "73f36c3e914e25186e39b6c170cf330525d64ff7",
    "commitId": "73f36c3",
    "atlasRepositoryInfo": {
      "milestoneId": 39,
      "releaseTag": "*"
    },
    "webapiRepositoryInfo": {
      "milestoneId": 39,
      "releaseTag": "*"
    }
  },
  "configuration": {
    "security": {
      "samlActivated": false,
      "enabled": false,
      "samlEnabled": false
    },
    "vocabulary": {
      "solrEnabled": false
    },
    "plugins": {
      "atlasgisEnabled": false
    },
    "person": {
      "viewDatesPermitted": false
    },
    "heracles": {
      "smallCellCount": "5"
    }
  }
}

Error 1

2023-03-27T14:36:34.365252976Z deploy-script stdout | 2023-03-27 14:35:22.825 WARN taskScheduler-8 org.hibernate.engine.jdbc.spi.SqlExceptionHelper -  - SQL Error: 8120, SQLState: S0001
2023-03-27T14:36:34.365257076Z 2023-03-27 14:35:22.825 ERROR taskScheduler-8 org.hibernate.engine.jdbc.spi.SqlExceptionHelper -  - Column 'webapi.tag.created_by_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Error 2

2023-03-27T15:25:12.205628122Z deploy-script stdout | 2023-03-27 15:19:29.637 ERROR taskScheduler-6 org.ohdsi.webapi.tag.TagService -  - Cannot refresh tags statistics
2023-03-27T15:25:12.205632022Z 2023-03-27 15:19:29.637 INFO taskScheduler-6 org.ohdsi.webapi.tag.TagService -  - Finishing tags statistics refreshing
2023-03-27T15:25:12.205635822Z deploy-script stdout | 2023-03-27 15:19:29.956 INFO taskScheduler-8 org.ohdsi.webapi.executionengine.service.ScriptExecutionServiceImpl -  - Invalidating execution engine based analyses
2023-03-27T15:25:12.205639823Z deploy-script stdout | 2023-03-27 15:25:12.194 ERROR http-nio-8080-exec-7 org.ohdsi.webapi.util.GenericExceptionMapper -  - org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [(SELECT
2023-03-27T15:25:12.205650523Z    aa1.analysis_name AS attribute_name,
2023-03-27T15:25:12.205654423Z    ar1.stratum_1     AS attribute_value
2023-03-27T15:25:12.205658123Z  FROM webapi.ACHILLES_analysis aa1
2023-03-27T15:25:12.205661723Z INNER JOIN
2023-03-27T15:25:12.205665423Z webapi.achilles_results ar1
2023-03-27T15:25:12.205669124Z ON aa1.analysis_id = ar1.analysis_id
2023-03-27T15:25:12.205672724Z                    WHERE aa1.analysis_id = 0
2023-03-27T15:25:12.205676424Z 
2023-03-27T15:25:12.205680024Z  UNION
2023-03-27T15:25:12.205683624Z 
2023-03-27T15:25:12.205687124Z SELECT
2023-03-27T15:25:12.205690824Z   aa1.analysis_name                AS attribute_name,
2023-03-27T15:25:12.205695024Z   cast(ar1.count_value AS VARCHAR) AS attribute_value
2023-03-27T15:25:12.205698925Z FROM webapi.ACHILLES_analysis aa1
2023-03-27T15:25:12.205702525Z INNER JOIN
2023-03-27T15:25:12.205706125Z webapi.achilles_results ar1
2023-03-27T15:25:12.205709725Z ON aa1.analysis_id = ar1.analysis_id
2023-03-27T15:25:12.205713425Z WHERE aa1.analysis_id = 1
2023-03-27T15:25:12.205717125Z )
2023-03-27T15:25:12.205721125Z ORDER BY attribute_name DESC]; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'webapi.ACHILLES_analysis'.

@Ajit_Londhe you seem to have a lot of experience reading through the other forums. Can you spot the issue?

This table is not created by WebAPI. To get it, you can see this forum post.

The other thing that is a little confusing to me is that you have a ‘webapi’ schema, and you’re looking for the achilles results in the webapi schema. but the webapi schema should be where your WebAPI tables live, while the achilles_* tables are in your ‘results’ schema. There’s several discussions describign how the WebAPI db is separate from your CDM databases, so I’d search the forums for that, but I wanted to call that out because it seems you’re combining your webapi schema and your cdm results schema.

The first error I think has been called out in this issue. I’d need a developer like @Ajit_Londhe or someone from Odysseus to look into the cause. This may be something fixed in a later release.

Thank you for your quick reply!

Can I just instantiate it as an empty table?

Well, I’m following the OHDSIOnAzure repository and there they’ve configured the webapi schema for Achilles as well. I assume I can configure which schema to select with one of these environment variables:

The GitHub issue you’ve linked is also entered by me :slight_smile:

Here’s my tables overview (adding in case this would be helpful)

Yes, this schema is mixing up webapi schema with CDM/Results schema.

WebAPI tables: batch_job_execution, cc_analysis, etc
CDM tables: concept, concept_ancestor, drug_exposure
Results tables: Everything found in this folder.

I’m not sure how this db was set up or if it’s the default db provided by the OHDSIonAzure, but I think it’s a bit confusing.

I would imagine you can call the schema anything, but normally I think it would be ohdsi_results. Once the R code is run, you can expect to see something like this:

Our omop = your dbo schema (which can be anything).

Given the composition of the achilles_analysis table, I’m not sure how far a table shell will get you.

Running the R code against SQL Server can have issues that require manual patching of at least 3 of the analyses (which fail with errors) unless they have finally been addressed. I attached the file listing so at least you can figure out where to find them (on a Mac) when an error is encountered. This is discussed in some other threads.

The datasource.ohdsi.schema and flyway.placeholders.ohdsiSchema are correct (webapi). You can see an example of a datasource.cdm.schema being cdm, which would indicate that they were thinking that there’d be a separate schema on the same DB. But that’s not the correct practice. The WebAPI db hosts 1 schema: webapi, which contains things like cohort definitions, concept set expressions, and cohort characterization/incidence rate/plp/ple designs. It also stores CDM sources which tells the applications which CDMs can be queried or perform analysis executions. It is inside the webapi.SOURCE and webapi.SOURCE_DAIMON that contains the jdbc connection and schema names of the CDM data. See the cdm configuration guide for details.

I’m not sure if OHDSIOnAzure is supposed to be pre-configured or not. If you can, you might query the webapi.source and webapi.source_daimon tables to see what is in there.

Yes, these schema names can be anything, but for purposes of ‘formal names’ for documentation purposes:

webapi: the schema hosted on a PostgreSQL instance that contains the webapi meta (definitions, batch job executions, source configuration)
cdm: the schema hosted on a wide range of DBMS platforms that hosts patient-level data
results: the schema hosted adjacent to the cdm schema which will contain analysis results based on the patient-level data contained in the cdm schema.

By ‘adjacent’ that means that we commonly perform functions as:

INSERT INTO results.some_result_table
SELECT stat1, stat2, stat3
FROM (
 ... some subquery of CDM tables found in the cdm schema
) d

I don’t think calling it webapi is a great idea–it’s confusing and agree it’s certainly better to follow convention when possible. But the missing table is a big problem.

The CDM source table is configured with one source entry. Which is the same database as the webapi is using.

I should have mentioned that I’m able to query some data in Atlas.

So Achilles tables should be in the results schema? (Is there any environment variable where I can change this behavior?)

It is best practice to have a dedicated database for the web api and a dedicated database for cdm+results?

But will this solve my issue is the question?

  • Like previously mentioned I assume it can be called anything (despite the fact it is not best practice).
  • Do I need to run Achilles to get this to work? I though this was an ‘optional’

Yes, the Achilles should be in the results schema, but that’s an assumption on the application’s part, not something configurable via environment variables.

Yes, I’d say so, because you have big-data type platforms like azure, redshift, google big query, netezza that have functions to support large-scale queries and aggregation (column stores, partitioning, distribution keys) but don’t have any of the OLTP functions that your conventional RDBMS have (sequences, foreign keys). It’s not completely invalid to host 1 database that will have a webapi schema and a cdm+results schema (maybe for demo purposes you’d do somethign like this), but those schemas should definitely be distinct…I thought i saw webapi tables in the same schema as your cdm. You should at least have separate schemas.

In the past, we didn’t require Achilles in order to function, however, since then we have things like record counts that we try to load and this comes from achilles results (but is a dependency for functions like vocabulary searching). I’ve seen cases where we have errors starting the application but they are only logged errors, they don’t prevent the startup of the application. In your environment, I think you’re showing that the app does come up, you just have errors in the logs…is that the problem you have (errors in the logs) or is there a specific feature that doesn’t work, or the entire app fails to load?

Well there’s not much to be seen. What are some basic tests I can do to verify deployment?

t