
Atlas characterisation failing to run. Atlas version 2.10.1 with WebAPI 2.10.1

I am getting an error when trying to execute a new Characterisation as part of the EHDEN pharmacovigilance study-a-thon. The same characterisations run OK on other databases, so I wonder if it is related to our Atlas configuration. This is my first post, so apologies if I haven’t provided all the required information.

This is the error message generated by Atlas:

org.ohdsi.webapi.exception.AtlasException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [INSERT INTO TEMP.rf9lhgvr_cov_ref (
covariate_id,
covariate_name,
analysis_id,
concept_id
)
SELECT covariate_id,
CAST(CONCAT('gender = ', CASE WHEN concept_name IS NULL THEN 'Unknown concept' ELSE concept_name END) AS VARCHAR(512)) AS covariate_name,
1 AS analysis_id,
CAST((covariate_id - 1) / 1000 AS INT) AS concept_id
FROM (
SELECT DISTINCT covariate_id
FROM TEMP.rf9lhgvr_cov_2
) t1
LEFT JOIN CDM.concept
ON concept_id = CAST((covariate_id - 1) / 1000 AS INT); INSERT INTO TEMP.rf9lhgvr_analysis_ref (
analysis_id,
analysis_name,
domain_id,

        start_day,
        end_day,

        is_binary,
        missing_means_zero
        )

SELECT 1 AS analysis_id,
CAST('DemographicsGender' AS VARCHAR(512)) AS analysis_name,
CAST('Demographics' AS VARCHAR(20)) AS domain_id,

        CAST(NULL AS INT) AS start_day,
        CAST(NULL AS INT) AS end_day,

        CAST('Y' AS VARCHAR(1)) AS is_binary,
        CAST(NULL AS VARCHAR(1)) AS missing_means_zero; SELECT CAST(FLOOR((YEAR(cohort_start_date) - year_of_birth) / 5) * 1000 + 3 AS BIGINT) AS covariate_id,


        cohort_definition_id,
        COUNT(*) AS sum_value

INTO TEMP.rf9lhgvr_cov_3
FROM TEMP.temp_cohort_rf9lhgvr cohort
INNER JOIN CDM.person
ON cohort.subject_id = person.person_id

        WHERE cohort.cohort_definition_id IN (35)

GROUP BY cohort_definition_id,
FLOOR((YEAR(cohort_start_date) - year_of_birth) / 5)

; INSERT INTO TEMP.rf9lhgvr_cov_ref (
covariate_id,
covariate_name,
analysis_id,
concept_id
)
SELECT covariate_id,
CAST(CONCAT (
'age group: ',
RIGHT(CONCAT(' ', CAST(CAST(5 * (covariate_id - 3) / 1000 AS INTEGER) AS VARCHAR)), 3),
' - ',
RIGHT(CONCAT(' ', CAST((CAST(5 * (covariate_id - 3) / 1000 AS INTEGER)) + 4 AS VARCHAR)), 3)
) AS VARCHAR(512)) AS covariate_name,
3 AS analysis_id,
0 AS concept_id
FROM (
SELECT DISTINCT covariate_id
FROM TEMP.rf9lhgvr_cov_3
) t1; INSERT INTO TEMP.rf9lhgvr_analysis_ref (
analysis_id,
analysis_name,
domain_id,

        start_day,
        end_day,

        is_binary,
        missing_means_zero
        )

SELECT 3 AS analysis_id,
CAST('DemographicsAgeGroup' AS VARCHAR(512)) AS analysis_name,
CAST('Demographics' AS VARCHAR(20)) AS domain_id,

        CAST(NULL AS INT) AS start_day,
        CAST(NULL AS INT) AS end_day,

        CAST('Y' AS VARCHAR(1)) AS is_binary,
        CAST(NULL AS VARCHAR(1)) AS missing_means_zero; IF OBJECT_ID('TEMP.rf9lhgvr_dem_age_data', 'U'

I’d suggest you look at your WebAPI logs; there you will find the complete SQL statement (your post only includes a partial query). Take that SQL statement and attempt to execute it on your CDM, and it should give you insight into what the problem is. It’s also possible you have the wrong dialect specified in the SOURCE table, so the SQL is being translated into the wrong dialect.
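
If it helps, the configured dialect can be checked directly in the WebAPI database. A minimal sketch, assuming WebAPI’s own tables live in a schema named webapi (adjust to your installation):

-- List each configured source and the SQL dialect WebAPI will translate queries into.
-- The schema name 'webapi' is an assumption; use whatever schema holds the WebAPI tables.
SELECT source_id, source_key, source_name, source_dialect
FROM webapi.source
ORDER BY source_id;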

Thanks Chris

We are running SQL Server, and the TEMP table syntax here is not what I would expect from T-SQL. We also don’t have a TEMP schema in our database.

It seems to be these tables like TEMP.rf9lhgvr_cov_ref that are causing the issue. But the dialect is apparently set to “sql server” in the SOURCE table. Also, I am able to run Incidence analyses fine, and I thought those would also be likely to fail if we were using the wrong dialect. Are there any other settings we need to check?

Hi, @dandedman,

Thanks for the clarification on your DBMS platform. Note that even though you’re using SQL Server, characterization will create permanent tables because of a technical limitation with SQL Server and prepared statements. Let me give you a quick synopsis of why:

In our analysis queries, we want to avoid SQL injection, so we take a very conservative approach and convert all the SQL into prepared statements. The issue is that when you execute prepared statements on SQL Server, each one runs in its own temporary execution session, so any temp table created in one prepared statement is not available to any subsequent query/prepared statement. If step one of the analysis executes a prepared statement to create a temp table, subsequent steps will not see that temp table, because as soon as the prepared statement finishes, its resources (including temp tables) are cleaned up. So we had to work around this by replacing #temp tables with permanent tables (with a randomized prefix), which is where your table name TEMP.rf9lhgvr_cov_ref comes from.
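
To illustrate the rewrite (a simplified sketch only; the column list is taken from the error message above, and rf9lhgvr/TEMP are just the randomized prefix and temp schema from your particular run):

-- Temp-table form the analysis SQL is conceptually written in (illustrative):
CREATE TABLE #cov_ref (
    covariate_id BIGINT,
    covariate_name VARCHAR(512),
    analysis_id INT,
    concept_id INT
);

-- What actually gets executed after the workaround: a permanent table in the
-- configured temp schema, with a randomized prefix so concurrent runs don't collide.
CREATE TABLE TEMP.rf9lhgvr_cov_ref (
    covariate_id BIGINT,
    covariate_name VARCHAR(512),
    analysis_id INT,
    concept_id INT
);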

Back to your issue:
Even though the Java exception says ‘BadSqlGrammarException’, this can come from a number of underlying errors: syntax errors, permission errors, table-does-not-exist errors. Could you dig into your log file around the point where you see the BadSqlGrammarException? There you should find a more detailed message from SQL Server about the specific issue that raised the exception. In particular, could you try to find out whether it is ‘table does not exist’ or ‘insert permission denied’?

Thanks again, Chris, for the suggestions.
I did find this statement:
Caused by: java.sql.BatchUpdateException: Invalid object name 'CDM.concept'.

Our vocabulary tables are in a different schema (Vocab), so this obviously causes an issue. Our challenge now is to find where this gets picked up, because we thought we had set the right schema in the data source configuration (I’m working with the technical team on it - but any suggestions?).
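
In case it is useful, this is roughly how we have been checking the schemas configured for each source (a sketch, assuming the WebAPI tables are in a schema named webapi; the daimon_type values typically map to 0 = CDM, 1 = Vocabulary, 2 = Results, 5 = Temp, but check your version’s documentation):

-- Show the schema (table_qualifier) configured for each daimon of each source.
SELECT s.source_key, sd.daimon_type, sd.table_qualifier, sd.priority
FROM webapi.source_daimon sd
INNER JOIN webapi.source s ON s.source_id = sd.source_id
ORDER BY s.source_key, sd.daimon_type;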

Ok, glad you found it.

The source code that generates this query is here. It assumes that your CDM schema contains the vocabulary tables. My recommendation is to create a view that selects the columns from the concept table in the vocabulary schema, name the view ‘concept’, and put it in your CDM schema. This lets the queries that assume the CDM schema work, without your having to make copies of tables.
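
In T-SQL that would look something like the following (a minimal sketch, assuming your vocabulary schema is named Vocab and your CDM schema is named CDM, as in the error message; repeat for any other vocabulary tables the analysis queries touch, such as concept_ancestor and concept_relationship):

-- Expose the vocabulary concept table inside the CDM schema as a view,
-- so queries that reference CDM.concept resolve without copying data.
CREATE VIEW CDM.concept AS
SELECT concept_id,
       concept_name,
       domain_id,
       vocabulary_id,
       concept_class_id,
       standard_concept,
       concept_code,
       valid_start_date,
       valid_end_date,
       invalid_reason
FROM Vocab.concept;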

Thanks Chris - that worked well.

@Chris_Knoll

Hi Chris, I have a related question. When installing WebAPI, we got the impression from its “CDM Configuration for use with WebAPI” documentation that the vocabulary tables should be in a separate schema, “vocabulary”. So we created two schemas: “vocabulary” for the vocabulary tables and “cdm” for all other OMOP tables. Is there a benefit to doing this? It looks like some OHDSI tools assume that the cdm schema contains the vocabulary tables. As you suggested above, we would need to create views in the cdm schema for all the tables in the vocabulary schema. I am wondering whether a separate vocabulary schema is necessary if there is no benefit to having one.

We are still in the development phase of building up the OMOP database, so it would be easier to change the schema setup now if a separate vocabulary schema could cause issues down the line.

Thanks
Jack

Technically, there isn’t a ‘vocabulary’ schema in the OHDSI ecosystem. The CDM schema contains tables for patient-level data, providers, place of service, and the tables used to manage concept identifiers and their relationships (i.e. the vocabulary tables: concept, concept_ancestor, concept_relationship, etc.). Typically these are treated as a single unit: when you build a CDM, you need a vocabulary to guide your ETL, so it makes sense to bundle the vocabulary used for the ETL together with the patient-level data, and that’s how the tools were originally designed.

When Atlas/WebAPI was built in 2014, one of the functions of Atlas was vocabulary searching, and some sites wanted an install simply to browse the vocabulary without using it for any CDM analysis. So there’s a way to configure a WebAPI SOURCE with just a vocabulary daimon, which means only the vocabulary tables exist on that data source and other CDM analysis queries should not run on it.

This gave rise to the idea that the vocabulary tables are somehow distinct from the CDM tables. More recently, sites have come online wanting to use a single vocabulary to ETL multiple CDMs, sharing one vocabulary across multiple sources. The OHDSI tools have started to distinguish the vocab schema from the CDM schema, but not everywhere (feature extraction being one of those places).

I think the best solution for having one set of physical vocabulary tables shared across CDM schemas is to create views in each CDM schema that reference back to the vocabulary tables. If you don’t need to share a vocabulary across multiple CDM ETLs, then I’d suggest you bundle the vocabulary version you used for the ETL in the same schema as the CDM data you ETL’d.

@Chris_Knoll Thank you for the background information. I don’t have a specific use case to support multiple CDM ETLs at our institution at present, but we have been discussing spinning off a de-identified OMOP database, which could end up in a separate CDM schema. Both schemas may share the same set of vocabulary tables.

I will leave the vocabulary schema as it is and create views in the cdm schema. This gives us flexibility for future database expansion.

Thank you
Jack
