SOLVED: PLE Package Error - Warning: Parameter 'vocabulary_database_schema' not found in SQL

krfeeney · December 19, 2019, 12:36pm

Friends,

Came across a new error that’s got me a little befuddled. Anyone seen this…

Creating exposure and outcome cohorts
Connecting using Redshift driver
Creating cohort: …
Warning: Parameter ‘vocabulary_database_schema’ not found in SQL
|==== | 5%Error: Error executing SQL:
java.sql.SQLException: Amazon Invalid operation: syntax error at or near “)”
Position: 699;

Here’s the background. I received a ZIP study package from another investigator. We had two iterations of a study design we’re testing for network dissemination. The problematic package was built using public ATLAS (artifact can be found here: ATLAS). I could successfully run her other package also built with the same public ATLAS without issue (artifact can be found here: ATLAS). I’m not really clear how the same ATLAS can generate two PLEs (which are extremely similar in design) and one exhibits some kind of SQLRender issue and the other does not.

The ErrorReport.txt tells me this:

DBMS:
redshift

Error:
java.sql.SQLException: Amazon Invalid operation: syntax error at or near “)”
Position: 699;

SQL:
CREATE TABLE #qualified_events

DISTKEY(person_id)
AS
WITH
primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id)
AS
(
– Begin Primary Events
select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
FROM
(
select E.person_id, E.start_date, E.end_date,
ROW_NUMBER() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC ) ordinal,
OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
FROM
(

) E
JOIN full_201907_omop_v5.observation_period OP on E.person_id = OP.person_id and E.start_date >= OP.observation_period_start_date and E.start_date <= op.observation_period_end_date
WHERE DATEADD(day,CAST(0 as int),OP.OBSERVATION_PERIOD_START_DATE) <= E.START_DATE AND DATEADD(day,CAST(0 as int),E.START_DATE) <= OP.OBSERVATION_PERIOD_END_DATE
) P
WHERE P.ordinal = 1
– End Primary Events

)

SELECT
event_id, person_id , start_date, end_date, op_start_date, op_end_date, visit_occurrence_id

FROM
(
select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, ROW_NUMBER() OVER (partition by pe.person_id ORDER BY pe.start_date ASC ) as ordinal, cast(pe.visit_occurrence_id as bigint) as visit_occurrence_id
FROM primary_events pe

) QE

R version:
R version 3.5.1 (2018-07-02)

Platform:
x86_64-pc-linux-gnu

Attached base packages:

stats

graphics

grDevices

utils

datasets

methods

base

Other attached packages:

stringr (1.4.0)

plyr (1.8.4)

survival (2.44-1.1)

ParallelLogger (1.1.0)

OhdsiSharing (0.1.3)

MethodEvaluation (1.0.1)

ffbase (0.12.7)

ff (2.2-14)

bit (1.1-14)

ggplot2 (3.2.1)

CohortMethod (3.0.2)

FeatureExtraction (2.2.3)

Cyclops (2.0.2)

EmpiricalCalibration (2.0.0)

SqlRender (1.6.3)

devtools (2.2.1)

usethis (1.5.1)

BridgetTofa1108 (0.0.1)

DatabaseConnector (2.4.1)

@schuemie @SCYou Any thoughts?

Chungsoo_Kim · December 19, 2019, 2:14pm

Hi Kristin,
Not sure, but I think it’s a cohort problem.
The query that extracts the cohort entry event seems to be empty.
Do you have cohorts in your ATLAS?
It might be a good idea to bring the cohort back and try it again.
I hope my comment is helpful.

Thanks
Chungsoo Kim

krfeeney · December 19, 2019, 2:44pm

Thanks @Chungsoo_Kim. I already did a feasibility check in ATLAS and know their are counts for these cohorts. But the PLE cannot retrieve those cohorts because it needs to be encapsulated so it’s rebuilding them in the package.

I agree. It’s something about the syntax of the cohort query that’s being rendered inappropriately for my data layer. But it’s weird because I’ve never had this problem with a PLE package and I’ve run a bunch of PLEs using this database this week.

Trying to debug Position 699 is a hunt.

krfeeney · December 19, 2019, 2:48pm

I should add: it doesn’t matter which database I try to run this study on. I tried another database and had the same vocabulary_database_schema error. So I’m convinced it’s a SQL syntax for Redshift issue related to how it’s pulling the vocab tables. But I don’t exactly know where Position 699 is.

wivern · December 19, 2019, 2:59pm

Hi @krfeeney,

The message about parameter not found is only warning not an error, but I guess the real issue is the improper cohort definition.

The problematic portion of SQL is the following:

select E.person_id, E.start_date, E.end_date,
ROW_NUMBER() OVER (PARTITION BY E.person_id ORDER BY E.sort_date ASC ) ordinal,
OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date, cast(E.visit_occurrence_id as bigint) as visit_occurrence_id
FROM
(

) E

It seems like the cohort definition contains empty inclusion rule or something like that…
Unfortunately I couldn’t open the cohort definitions on the public Atlas.

Chungsoo_Kim · December 19, 2019, 3:04pm

Well, I’ve seen that warning “Parameter ‘vocabulary_database_schema’ not found in SQL” before.
When I entered the cohort query directly, the warning was printed out if I did not put the vocabulary_database_schema parameter.
However, since cdmDatabaseSchema was given, it was automatically used as vocabulary_database_schema so it extracted cohorts without any problems.

krfeeney · December 19, 2019, 4:25pm

Thanks @wivern and @Chungsoo_Kim for giving this some thought! I’ll have to inspect the study design a bit more. Weird that a cohort definition can run in ATLAS but it gets tripped up in the package. Will keep you posted on what comes out.

krfeeney · December 19, 2019, 5:45pm

Well… this is embarrassing.

Mystery solved: package cohort JSON looks like someone deleted a whole section from it compared to the individual cohort JSONs sent for feasibility before it was packaged into the PLE template. Guess it goes to show to trust but verify. Case closed.

SCYou · December 19, 2019, 10:25pm

@krfeeney
Happy to see that case closed.

My two cents: ignore
Warning: Parameter ‘vocabulary_database_schema’ not found in SQL

The code inside would be look like below
SqlRender::render(sql, vocabulary_database_schema = vocabularyDatabsaeSchema, ...)

But SQL query itself doesn’t have @vocabulary_database_schema . And this is why this warning is generated. You can ignore this.

schuemie · December 20, 2019, 8:49am

Although most cohort definition SQL files have the vocabulary_database_schema parameter, some cohort definitions (that do not make use of the vocabulary) do not. As @SCYou said, in those cases you can ignore the warning.