OHDSI Home | Forums | Wiki | Github

Error seen running script to create Achilles tables on Databricks

I’m setting up my OHDSI environment using Databricks…

When I run the script to create the Achilles database objects (created using the url shown below) I get the error shown below.

http://localhost:8080/WebAPI/ddl/results?dialect=spark&schema=<DB_NAME>_ach_res&vocabSchema=<DB_NAME>&tempSchema=<DB_NAME>_ach_tmp&initConceptHierarchy=true

An older version of the script is working. I’m not sure of the critical difference between the two files. I’m thinking it might be a CDM version that was used when the script was created, but I don’t see anything in the url that indicates what version of the CDM should be used to generate the script (is the version of the CDM to use to generate this script configured in WebAPI/Atlas somewhere?).

The working script, failing script, and full output are attached.
create-achilles-tables_WORKING.sql.txt (117.2 KB)
create-achilles-tables_DOES_NOT_WORK-ddl.sql.txt (115.6 KB)
output.txt (239.4 KB)

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 92321.0 failed 4 times, most recent failure: Lost task 0.3 in stage 92321.0 (TID 140794) (10.22.225.217 executor 262): org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value 'Monthly' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 5, position 1) ==
INSERT OVERWRITE TABLE demo_cdm_ach_res.heracles_periods  SELECT * FROM insertion_temp
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:256)
	at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)

— EDIT ----------------------------------
Running this (as described in the error message) before executing the script did not fix the problem:
set spark.sql.ansi.enabled=false

I looks like the non-working script has “select *” in several places where the working version specifies columns. Changing the commented out line below fixed this (@ about line 2720). I might be working from an older version of WebAPI/Atlas. I’ll check my versions and update if needed.

-- INSERT OVERWRITE TABLE <DB_NAME>_ach_res.heracles_periods  SELECT * FROM insertion_temp;
   INSERT OVERWRITE TABLE <DB_NAME>_ach_res.heracles_periods  (period_id, period_name, period_order, period_type, period_start_date, period_end_date) SELECT * FROM insertion_temp;

It looks like this is fixed in the latest versions of WebAPI/Atlas. I did a fresh checkout from head for each. The script generated works.

The error I was getting was in v2.12.0 of WebAPI and Atlas.

t