OHDSI Home | Forums | Wiki | Github

Atlas Characterizations error -org.ohdsi.webapi.exception

Hi team,
I got an error message when run Characterizations in Atlas (see below). Need help to fix it. Thanks
org.ohdsi.webapi.exception.AtlasException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.springframework.jdbc.Uncategorized SQLException: StatementCallback; uncategorized

SQLException for SQL [<START_WITH>t1 AS (
SELECT cohort_definition_id,
COUNT() AS cnt
FROM tmp_v0224.temp_cohort_b4890s3e
WHERE cohort_definition_id IN (422)
GROUP BY cohort_definition_id
),
t2 AS (
SELECT cohort_definition_id,
COUNT(
) AS cnt,
MIN(concept_count) AS min_concept_count,
MAX(concept_count) AS max_concept_count,
SUM(CAST(concept_count AS BIGINT)) AS sum_concept_count,
SUM(CAST(concept_count AS BIGINT) * CAST(concept_count AS BIGINT)) AS squared_concept_count
FROM tmp_v0224.b4890s3econcept_count_data
GROUP BY cohort_definition_id
)
<END_WITH>
CREATE TABLE tmp_v0224.b4890s3econcept_count_stats
USING DELTA
AS
(SELECT
t1.cohort_definition_id,
CASE WHEN t2.cnt = t1.cnt THEN t2.min_concept_count ELSE 0 END AS min_value,
t2.max_concept_count AS max_value,
CAST(t2.sum_concept_count / (1.0 * t1.cnt) AS FLOAT) AS average_value,
CAST(CASE
WHEN t2.cnt = 1 THEN 0
ELSE SQRT((1.0 * t2.cntt2.squared_concept_count - 1.0 * t2.sum_concept_countt2.sum_concept_count) / (1.0 * t2.cnt*(1.0 * t2.cnt - 1)))
END AS FLOAT) AS standard_deviation,
t2.cnt AS count_value,
t1.cnt - t2.cnt AS count_no_value,
t1.cnt AS population_size
FROM
t1
INNER JOIN t2
ON t1.cohort_definition_id = t2.cohort_definition_id)]; SQL state [HY000]; error code [500051]; [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near ‘<’.(line 1, pos 0)

== SQL ==
<START_WITH>t1 AS (
^^^
SELECT cohort_definition_id,
COUNT() AS cnt
FROM tmp_v0224.temp_cohort_b4890s3e
WHERE cohort_definition_id IN (422)
GROUP BY cohort_definition_id
),
t2 AS (
SELECT cohort_definition_id,
COUNT(
) AS cnt,
MIN(concept_count) AS min_concept_count,
MAX(concept_count) AS max_concept_count,
SUM(CAST(concept_count AS BIGINT)) AS sum_concept_count,
SUM(CAST(concept_count AS BIGINT) * CAST(concept_count AS BIGINT)) AS squared_concept_count
FROM tmp_v0224.b4890s3econcept_count_data
GROUP BY cohort_definition_id
)
<END_WITH>
CREATE TABLE tmp_v0224.b4890s3econcept_count_stats
USING DELTA
AS
(SELECT
t1.cohort_definition_id,
CASE WHEN t2.cnt = t1.cnt THEN t2.min_co

Is <START_WITN> valid syntax for Spark? I don’t know anything about the dialect of spark but it looks like this is a sqlRender issue.

<START_WITH> is not valid syntax in Spark. I see it referenced in sqlRender commits 5 and 7 months ago, but not how sqlRender uses that to generate Spark-valid code.

One of the commits was by @schuemie, so perhaps he has insights into this.

@Chris_Knoll and @schuemie, we have a characterization that we ran in October using Spark (Databricks) that ran fine. But today, the same characterization fails with the <START_WITH> message. Some features work and others don’t, but I haven’t been able to track down which. Here are the list of features we used when this worked in October '22:

ID Name
1 Measurement Range Group Short Term
2 Condition Group Era Start Long Term
3 Drug Group Era Start Medium Term
4 Condition Era Short Term
7 Drug Group Era Short Term
10 Condition Era Any Time Prior
18 Condition Era Start Short Term
25 Drug Group Era Overlapping
31 Drug Exposure Short Term
34 Observation Short Term
36 Dcsi
37 Drug Era Start Short Term
43 Drug Era Short Term
44 Drug Era Overlapping
49 Charlson Index
50 Measurement Short Term
56 Observation Medium Term
58 Measurement Value Short Term
70 Demographics Ethnicity
71 Demographics Age Group
72 Demographics Race
73 Demographics Prior Observation Time
74 Demographics Gender
78 Chads 2
82 Chads 2 Vasc
88 Drug Era Medium Term
91 Distinct Measurement Count Short Term
100 Visit Count Medium Term
101 Visit Count Short Term

Ok, sounds like it is a problem with SqlRender, but want to wait for @schuemie to chime in.

Thank all!

So this was done by: Ali Bellamine (github: alibell) (is this you, @alib ?) If so, could you provide some information as to the use of <START_WITH>? It seems to be some sort of tokenization to find the start and ends of CTEs.

@schuemie , any insights on this? It impacts Characterizations (which used to work), Cohort Pathways, and Heracles reports for cohorts.

In Databricks, the following syntax is not allowed:

WITH cte AS (SELECT ...) CREATE TABLE tmp.table AS (SELECT * FROM cte);

Instead, the syntax needs to be:

CREATE TABLE tmp.table AS WITH cte AS (SELECT ...) SELECT * FROM cte;

I think that is what the <START_WITH> and <END_WITH> syntax is trying to process; but the code isn’t doing the final substitutions. You can see the <START_WITH> clauses in replacementsPatterns.csv, but I don’t see how and where the <START_WITH> and <END_WITH> are replaced.

t