
Transaction vs Session Scope for Global Temp Tables Statements

We can get around to trying this soon.
I could use this opportunity to help establish a maven pom file, as mentioned here: SqlRender source/javadoc artifacts to repository?

While @alfranke is trying this out (I just finished testing it on Achilles and CohortMethod), I thought of another extension to get rid of some of the ugliness for the Oracle workarounds:

What if translateSql() accepted yet another parameter called oracle_temp_schema (or something similar), that would automatically be prefixed to temp table names when translating to Oracle? So the SQL Server SQL

CREATE TABLE #my_temp_table (x int);

when translated using

String sessionId = SqlTranslate.generateSessionId();
SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId, "my_temp_schema");

would translate to

CREATE GLOBAL TEMPORARY TABLE my_temp_schema.abcd1234my_temp_table (x int) ON COMMIT PRESERVE ROWS;

As a consequence, people writing the SQL would no longer be forced to ‘USE’ the schema where temp tables are to be written in Oracle.

In our R packages (Achilles, CohortMethod, etc) the oracle_temp_schema could default to the resultsschema.
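The proposed rewrite rule can be sketched as follows. This is purely illustrative (the class and method names are hypothetical, and SqlRender's actual translation logic is more involved); it only shows how a `#name` reference would be qualified with the temp schema and session ID:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the proposed temp-table name rewrite for Oracle.
// The real logic lives in SqlRender's SqlTranslate; this only shows the idea.
public class TempTableRewriteSketch {

    // Replace every '#name' reference with '<schema>.<sessionId>name'.
    static String qualifyTempTables(String sql, String sessionId, String tempSchema) {
        Matcher m = Pattern.compile("#(\\w+)").matcher(sql);
        return m.replaceAll(tempSchema + "." + sessionId + "$1");
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE #my_temp_table (x int);";
        System.out.println(qualifyTempTables(sql, "abcd1234", "my_temp_schema"));
        // -> CREATE TABLE my_temp_schema.abcd1234my_temp_table (x int);
    }
}
```

Note the sketch only handles the name qualification; rewriting `CREATE TABLE` into `CREATE GLOBAL TEMPORARY TABLE ... ON COMMIT PRESERVE ROWS` is a separate translation step.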

I think this is an excellent suggestion, Martijn. Even the previous one was good. I like the idea of simplifying any workarounds, even if it costs additional parameters upstream of the parameterized SQL file.

I am about to re-write IRIS using CDM v5 (or write a portion that will run on both v4 and v5 with an extra part for v5.)

I wish I could be in the CDM schema (where data is) and just refer to temp tables.

my ideal snippet would be

use @cdmSchema

{@useInterimTables ==1} ? {
create temp table @studyName_A
(
    measure varchar(20) not null,
        result bigint,
        explanation varchar(255)
);
}

{@useInterimTables ==1} ? {INSERT INTO @studyName_A (measure, result, explanation)}
select  '02G2',a.cnt, 'count of patients'
FROM
(
    select count(*) cnt from person
) a
;

So my CDM queries for data look “clean” (no prefixes).
Accessing my temp tables - I am not sure what prefix to use. What strategy is best given the ultimate version of sqlrender?

I went ahead and implemented this. @alfranke: be aware: this probably breaks your code because the Java functions now require an additional parameter.

Here’s an example (not showing parameterized SQL, just to avoid confusion: you only translate SQL after rendering it using parameter values):

USE my_cdm_schema;

CREATE TABLE #my_table (person_id BIGINT);

INSERT INTO #my_table (person_id)
SELECT
  person_id
FROM
  person
WHERE 
  year_of_birth > 2000
;

TRUNCATE TABLE #my_table;
DROP TABLE #my_table;

In R you would use:

translateSql(sql, "sql server", "oracle", oracleTempSchema = "temp_schema")$sql

which will produce this SQL:

CREATE GLOBAL TEMPORARY TABLE temp_schema.w8r18pe5my_table (person_id NUMBER(19)) ON COMMIT PRESERVE ROWS;

INSERT INTO temp_schema.w8r18pe5my_table (person_id)
SELECT  
  person_id
FROM 
  person
WHERE  
  year_of_birth > 2000
;

TRUNCATE TABLE temp_schema.w8r18pe5my_table;
DROP TABLE temp_schema.w8r18pe5my_table;

Note that even though it almost works as though you have real temp tables in Oracle, you still have to drop them at the end of your session (on all other platforms, temp tables are dropped automatically at the end of the connection).

The Java equivalent would be:

String sessionId = SqlTranslate.generateSessionId();
SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId, "temp_schema");

This is currently still in the unique_oracle_temp_tables branch of SqlRender. Will push to master once it’s been tested some more.

A couple of questions, from the perspective of WebAPI, which also doesn’t make assumptions about the underlying db vendor.

  • Should the schema always be the ohdsi/results schema (e.g. getOhdsiSchema())? I hesitate to make yet another external property to drive this (e.g. temptable.schema).
  • Does it make sense to call translateSql() with non-null sessionId and temp schema arguments regardless of the DB vendor? I believe your code will just not filter/replace these placeholders, but I’m wondering if there might eventually be some undesirable outcomes depending on the underlying db vendor and the SqlTranslate signature. @Chris_Knoll - thoughts?

Looks like your changes work fine. It was easier with the prefix/suffix to find the temp tables in the ohdsi schema (they were all grouped together). Now we have something like 4kddk2HERACLES_COHORT instead of TMP4kddk2_HERACLES_COHORT. I understand making the object names shorter, but I wonder if we should keep the prefix/suffix. Maybe we could shorten the sessionId to 7 characters (like github commits)? Just a thought…

One other thing: I received the following error but could not reproduce it (1 job failed out of 8).

2015-03-09 14:11:55.805 ERROR taskExecutor-3 org.springframework.batch.core.step.AbstractStep -  - Encountered an error executing step cohortAnalysisStep in job cohortAnalysisJob
org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [ALTER SESSION SET current_schema =  OHDSI; delete from OHDSI.HERACLES_results where cohort_definition_id IN (1) and analysis_id IN (1,2,1800,1801,1802); delete from OHDSI.HERACLES_results_dist where cohort_definition_id IN (1) and analysis_id IN (1,2,1800,1801,1802); BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE  HERACLES_cohort';
  EXECUTE IMMEDIATE 'DROP TABLE  HERACLES_cohort';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -942 THEN
      RAISE;
    END IF;
END;; BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE  OHDSI.5KNNlsYmQoHERACLES_cohort';
  EXECUTE IMMEDIATE 'DROP TABLE  OHDSI.5KNNlsYmQoHERACLES_cohort';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -942 THEN
      RAISE;
    END IF;
END;; CREATE GLOBAL TEMPORARY TABLE OHDSI.5KNNlsYmQoHERACLES_cohort
  ON COMMIT PRESERVE ROWS
AS
SELECT
   subject_id, cohort_definition_id, cohort_start_date, cohort_end_date 
	
FROM
  OHDSI.cohort
  WHERE  cohort_definition_id in (1)
; insert into HERACLES_results (cohort_definition_id, analysis_id, count_value)
select c1.cohort_definition_id, 1 as analysis_id,  COUNT(distinct person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id; insert into HERACLES_results (cohort_definition_id, analysis_id, stratum_1, count_value)
select c1.cohort_definition_id, 2 as analysis_id,  gender_concept_id as stratum_1, COUNT(distinct person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id, GENDER_CONCEPT_ID
; insert into HERACLES_results (cohort_definition_id, analysis_id, stratum_1, count_value)
select c1.cohort_definition_id, 
	1800 as analysis_id, 
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as stratum_1, 
	COUNT(p1.person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id, EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH
; insert into HERACLES_results_dist (cohort_definition_id, analysis_id, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)
select cohort_definition_id, 
	1801 as analysis_id,
	COUNT(count_value) as count_value,
	min(count_value) as min_value,
	max(count_value) as max_value,
	avg(1.0*count_value) as avg_value,
	STDDEV(count_value) as stdev_value,
	max(case when p1<=0.50 then count_value else -9999 end) as median_value,
	max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
	max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
	max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
	max(case when p1<=0.90 then count_value else -9999 end) as p90_value
from
(
select c1.cohort_definition_id,
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as count_value,
	1.0*(row_number() over (partition by c1.cohort_definition_id order by EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH))/(COUNT(EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH) over (partition by c1.cohort_definition_id)+1) as p1
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
) t1
group by cohort_definition_id
; insert into HERACLES_results_dist (cohort_definition_id, analysis_id, stratum_1, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)
select cohort_definition_id, 
	1802 as analysis_id,
	gender_concept_id as stratum_1,
	COUNT(count_value) as count_value,
	min(count_value) as min_value,
	max(count_value) as max_value,
	avg(1.0*count_value) as avg_value,
	STDDEV(count_value) as stdev_value,
	max(case when p1<=0.50 then count_value else -9999 end) as median_value,
	max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
	max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
	max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
	max(case when p1<=0.90 then count_value else -9999 end) as p90_value
from
(
select c1.cohort_definition_id,
	p1.gender_concept_id,
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as count_value,
	1.0*(row_number() over (partition by p1.gender_concept_id, c1.cohort_definition_id order by EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH))/(COUNT(EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH) over (partition by p1.gender_concept_id, c1.cohort_definition_id)+1) as p1
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
) t1
group by cohort_definition_id, gender_concept_id
; TRUNCATE TABLE OHDSI.5KNNlsYmQoHERACLES_cohort; DROP TABLE OHDSI.5KNNlsYmQoHERACLES_cohort; delete from HERACLES_results where count_value <= 0; delete from HERACLES_results_dist where count_value <= 0]; SQL state [null]; error code [17081]; error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7
; nested exception is java.sql.BatchUpdateException: error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7

	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:416)
	at org.springframework.jdbc.core.JdbcTemplate.batchUpdate(JdbcTemplate.java:611)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet$1.doInTransaction(CohortAnalysisTasklet.java:40)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet$1.doInTransaction(CohortAnalysisTasklet.java:1)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet.execute(CohortAnalysisTasklet.java:36)
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406)
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
	at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271)
	at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
	at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257)
	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:198)
	at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
	at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:386)
	at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:135)
	at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:304)
	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.BatchUpdateException: error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7

	at oracle.jdbc.driver.OracleStatement.executeBatch(OracleStatement.java:4586)
	at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:230)
	at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:572)
	at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:559)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:405)
	... 23 more
2015-03-09 14:11:56.035 INFO taskExecutor-3 org.springframework.batch.core.launch.support.SimpleJobLauncher -  - Job: [SimpleJob: [name=cohortAnalysisJob]] completed with the following parameters: [{cohortDefinitionId:1=1, time=1425924713932}] and the following status: [FAILED]

@alfranke, no thoughts come to mind. I’m writing the code to fit the least common denominator of all our known platforms (for example, SQL Server doesn’t require us to split the sql into individual statements, but oracle does, so I’m always splitting).

Re: the error you got, the sql string that it prints out from the provider looks like it wasn’t split. I thought you had to split up the input sql into individual statements?

I would prefer that the handling of the temp tables in Oracle were transparent to users of other platforms. Clearly, it would be awkward to pass in a variable for the oracle_temp_table_schema if you are running postgresql. I’m not close enough to this issue to know if it can be transparent. Something else we can try to mull over during the architecture call.

It is split, and then executed as a batch.

@alfranke: the reason for this error appears to be that the session ID starts with a number. It is also 10 characters long. Seems somehow an ‘old’ sessionId got mixed up with the new code. The new code produces 8 character session IDs where the first one is always a letter.
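For reference, a generator with those properties (8 characters, first one always a letter, so the result is a valid start of an Oracle identifier) could look like this sketch. This is an illustration only, not SqlRender's actual generateSessionId() implementation:

```java
import java.security.SecureRandom;

// Illustrative sketch of a session ID generator with the properties
// described above: 8 characters, first one always a letter.
public class SessionIdSketch {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    private static final String ALPHANUM = LETTERS + "0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    static String generateSessionId() {
        StringBuilder sb = new StringBuilder(8);
        // First character is a letter so the prefixed table name
        // never starts with a digit (invalid in Oracle identifiers).
        sb.append(LETTERS.charAt(RANDOM.nextInt(LETTERS.length())));
        for (int i = 1; i < 8; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateSessionId());
    }
}
```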

@alfranke, @Frank: I personally like the idea of the Oracle temp schema. It makes the emulation of temp tables on Oracle complete, so people writing parameterized SQL don’t have to think about it. It also is cleaner to separate results from temp tables, and in a lot of applications we no longer require a results schema. Talk about awkward: people using CohortMethod had to specify a results schema, even though no results were written there.

I would think the oracleTempSchema is a parameter that only needs to be set for people running Oracle. And yes, I would argue it is a separate property in the WebApi config. If an admin wants, he can point it to a schema specifically for temp tables, and once in a while just clean up the schema to remove any left-over tables.

We are passing in a random string for the sessionId, with no check that the initial character is alphabetic. That explains the error, but it confirms my inclination to revert the removal of at least the ‘TMP’ prefix. Would you support that?

Why are you not using the generateSessionId() function?

I’m a bit hesitant about reintroducing the TMP prefix. If we make the prefix any longer, there will be a lot of applications (CohortMethod and Achilles, for starters) where we will need to change the temp table names. Taking an additional 3 characters away from the random string means a higher probability of collisions.
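The collision argument can be made concrete with a birthday-bound estimate. This sketch assumes a 36-character alphabet for the random part (the exact alphabet and length SqlRender uses may differ); spending 3 of the 8 characters on a fixed prefix shrinks the random space from 36^8 to 36^5:

```java
// Birthday-bound estimate: the probability of at least one collision among
// n IDs drawn uniformly from a space of size s is roughly
// 1 - exp(-n*(n-1) / (2*s)).
public class CollisionSketch {
    static double collisionProbability(long n, double space) {
        return 1.0 - Math.exp(-((double) n * (n - 1)) / (2.0 * space));
    }

    public static void main(String[] args) {
        double full = Math.pow(36, 8);    // 8 random characters
        double reduced = Math.pow(36, 5); // 3 characters spent on a fixed prefix
        System.out.printf("1000 sessions, 8 random chars: %.2e%n",
                collisionProbability(1000, full));
        System.out.printf("1000 sessions, 5 random chars: %.2e%n",
                collisionProbability(1000, reduced));
    }
}
```

Under these assumptions, 1000 concurrent sessions go from a negligible collision risk with 8 random characters to a risk on the order of a percent with only 5.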

We want to be able to run jobs (creating temp tables) in parallel. When I passed in null as the sessionId arg, I saw that the sessionId was scoped at the jvm level, preventing parallel jobs. However, I didn’t realize I could use your algorithm; we just grabbed Apache Commons Lang. I’ll try the generateSessionId in SqlTranslate. Looks like it should do the job.
I would say that the prefix would have prevented clients from running into this error. Since there is no mandate on what a client passes in, you might consider adding validation to the sessionId passed in to translate, so you could throw a meaningful exception (not sure how long that would have taken me to troubleshoot otherwise). Just a suggestion, no strong preference. Thanks.

Ah, good point. I added session ID validation (see this commit), and am now mentioning the generateSessionId() function in the JavaDoc.
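A validation check along those lines might look like the following sketch. The exact rule SqlRender enforces may differ; this assumes 8 alphanumeric characters starting with a letter, matching the properties described earlier in the thread:

```java
// Illustrative sketch of session ID validation; not SqlRender's actual code.
public class SessionIdValidation {
    // Reject IDs that would produce invalid Oracle identifiers:
    // exactly 8 alphanumeric characters, starting with a letter.
    static void validateSessionId(String sessionId) {
        if (sessionId == null || !sessionId.matches("[a-zA-Z][a-zA-Z0-9]{7}")) {
            throw new IllegalArgumentException(
                "Session ID must be 8 alphanumeric characters starting with a letter: "
                    + sessionId);
        }
    }

    public static void main(String[] args) {
        validateSessionId("w8r18pe5");       // fine: 8 chars, starts with a letter
        try {
            validateSessionId("5KNNlsYmQo"); // the failing ID from the log: 10 chars, starts with a digit
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing early with a message like this turns a cryptic ORA-03291 deep in a batch into an immediate, self-explanatory failure at translation time.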
