
Transaction vs Session Scope for Global Temp Tables Statements

@alfranke, another thought on this: could the deadlocks you are seeing be caused by the script attempting to drop/recreate the global temp table, where it is that drop/recreate that has to block because an open transaction is reading the table in some context? If you were to alter the script (in a test, not in GitHub) so that it doesn’t drop/create the global temp table, and then run 3 threads, I’m wondering whether they would still block each other. From everything I have read, they shouldn’t, because each thread will have its own context (unless connection pooling actually shares the same connection instance across callers, in which case we’d have issues).

Now, on the matter of ensuring each call uses its own context: have you tried doing a beginTransaction at the start of the batch and then a .commit after the split SQL is executed? This way we could set ‘on commit delete rows’, and the rows would live through each iteration of the split SQL (because the transaction is still active). I believe that in a connection pool context, starting a transaction makes the connection exclusive to the caller, so another caller grabbing a connection from the pool will not get one that is enrolled in a transaction. If that’s the case, then you can go back to enabling connection pools.
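A minimal JDBC sketch of what I mean (dataSource and the already-split statements are placeholders, not the actual WebAPI code):

import java.sql.Connection;
import java.sql.Statement;
import javax.sql.DataSource;

// Sketch: hold one transaction open across all split statements so the
// 'on commit delete rows' temp table rows survive each step; the commit
// at the end is what clears them.
public static void executeInOneTransaction(DataSource dataSource, String[] splitSql) throws Exception {
    try (Connection connection = dataSource.getConnection()) {
        connection.setAutoCommit(false); // beginTransaction: pins this pooled connection to us
        try (Statement statement = connection.createStatement()) {
            for (String sql : splitSql)
                statement.execute(sql);
            connection.commit(); // 'on commit delete rows' empties the temp table here
        } catch (Exception e) {
            connection.rollback();
            throw e;
        }
    }
}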

-Chris

Yes, we are using transactions and were using pooling, which is why I suggested changing to use ‘on commit delete rows’. @schuemie indicated he thought that wasn’t an option. I’m still trying to get clarification on why (i.e. which use cases/scripts manage their own transactions and, for example, explicitly commit).
Yes, it is likely that the locking was due to the create/drop attempts combined with the previous session/connection pooling and ‘on commit preserve rows’ behavior.

I think it’s because in the R context there’s no beginTransaction/commit available in the API to call, so from R’s perspective it always commits after each batch statement.

This is only a guess.

@alfranke: Most often we use the temp tables like this:

  1. Create the temp table
  2. Do some inserts and deletes
  3. (Create other temp tables by joining with the first temp table)
  4. Pull the data to the client using a SELECT query
  5. Drop the temp table

Note that in the current SqlRender implementation, you can’t do this in a single call, because the data pull (4) needs to be in a call of its own. Also, to my knowledge, those 5 steps can’t be combined into a single transaction (at the least, the DELETE would cause problems without a commit). Furthermore, it seems to me the real problem is when several people try to create the same temp table (in Oracle), and I don’t think we want to create all possible temp tables beforehand (which is what Oracle seems to want), because we have so many different flavors of them.
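In plain JDBC the shape is roughly this (a sketch; the table and SQL are invented, step 3 is omitted, and for Oracle each statement would of course first go through translateSql()):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

// Sketch of the five steps above: the data pull (step 4) has to be its
// own query call; the other steps are plain update statements.
public static void tempTableWorkflow(DataSource dataSource) throws Exception {
    try (Connection connection = dataSource.getConnection();
         Statement statement = connection.createStatement()) {
        statement.execute("CREATE TABLE #covariates (x INT)");       // step 1
        statement.execute("INSERT INTO #covariates (x) VALUES (1)"); // step 2
        statement.execute("DELETE FROM #covariates WHERE x < 0");    // step 2
        try (ResultSet resultSet = statement.executeQuery("SELECT x FROM #covariates")) { // step 4
            while (resultSet.next())
                System.out.println(resultSet.getInt("x"));
        }
        statement.execute("DROP TABLE #covariates");                 // step 5
    }
}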

I agree we want to keep the temp tables, as @Chris_Knoll and @Vojtech_Huser have argued. To make them workable in Oracle we’ll need to make their names unique. We can have translateSql() do this for us using a randomly generated ‘session ID’, but we need to know the correct scope in which to use a single session ID (assuming we want to use connection pooling, which I think we really do).

I like the suggestion of @Chris_Knoll to require users to instantiate the SqlTranslate class. Another option would be to make the whole approach explicit: before calling translateSql(), you create a session ID (we could add a function for that to SqlRender) and pass it to translateSql() to be used in Oracle. If no session ID is provided, the default behavior is to generate one ID per JVM (i.e. populate a static variable).

I’ve created a SqlRender branch called unique_oracle_temp_tables, implementing the explicit approach. When using SqlRender from Java, the following code:

String sessionId = SqlTranslate.generateSessionId();
String sql = "CREATE TABLE #covariates (x int);";
sql = SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId);

will produce SQL like this:

CREATE GLOBAL TEMPORARY TABLE tmpqibdgfqc0f_covariates  (x int) ON COMMIT PRESERVE ROWS;

The session ID is just a randomly generated string. It is up to you to decide the correct scope to use it in, and when to generate a new session ID.

When the sessionId parameter of translateSql() is set to null, a global session ID will be generated once and used throughout the lifetime of the JVM. This is the current behaviour when using the SqlRender package in R.
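For example, an application that wants one scope per analysis run would do something like this (a sketch; the SQL is invented):

// Sketch: one session ID per analysis run, reused across every
// translateSql() call so all temp table names line up.
String sessionId = SqlTranslate.generateSessionId();
String createSql = SqlTranslate.translateSql(
    "CREATE TABLE #covariates (x int);", "sql server", "oracle", sessionId);
String cleanupSql = SqlTranslate.translateSql(
    "TRUNCATE TABLE #covariates; DROP TABLE #covariates;", "sql server", "oracle", sessionId);
// Both translations now refer to the same session-prefixed Oracle table.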

Anyone care to give this a try?

We can get around to trying this soon.
I could use this opportunity to help establish a Maven POM file, as mentioned here: SqlRender source/javadoc artifacts to repository?

While @alfranke is trying this out (I just finished testing it on Achilles and CohortMethod), I thought of another extension to get rid of some of the ugliness for the Oracle workarounds:

What if translateSql() accepted yet another parameter called oracle_temp_schema (or something similar), that would automatically be prefixed to temp table names when translating to Oracle? So the SQL Server SQL

CREATE TABLE #my_temp_table (x int);

when translated using

String sessionId = SqlTranslate.generateSessionId();
SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId, "my_temp_schema");

would translate to

CREATE GLOBAL TEMPORARY TABLE my_temp_schema.abcd1234my_temp_table (x int) ON COMMIT PRESERVE ROWS;

As a consequence, people writing the SQL would no longer be forced to ‘USE’ the schema where temp tables are to be written in Oracle.

In our R packages (Achilles, CohortMethod, etc) the oracle_temp_schema could default to the resultsschema.

I think this is an excellent suggestion, Martijn. Even the previous one was good. I like the idea of simplifying any workarounds, even if it costs additional parameters upstream of the parameterized .sql file.

I am about to re-write IRIS using CDM v5 (or write a portion that will run on both v4 and v5, with an extra part for v5).

I wish I could be in the CDM schema (where data is) and just refer to temp tables.

My ideal part would be:

use @cdmSchema

{@useInterimTables ==1} ? {
create temp table @studyName_A
(
    measure varchar(20) not null,
    result bigint,
    explanation varchar(255)
);
}

{@useInterimTables ==1} ? {INSERT INTO @studyName_A (measure, result, explanation)}
select  '02G2',a.cnt, 'count of patients'
FROM
(
    select count(*) cnt from person
) a
;

So my CDM queries for data look “clean” (no prefixes).
For accessing my temp tables, I am not sure what prefix to use. What strategy is best, given the ultimate version of SqlRender?

I went ahead and implemented this. @alfranke: be aware: this probably breaks your code because the Java functions now require an additional parameter.

Here’s an example (not showing parameterized SQL, just to avoid confusion; you only translate SQL after rendering it using parameter values):

USE my_cdm_schema;

CREATE TABLE #my_table (person_id BIGINT);

INSERT INTO #my_table (person_id)
SELECT
  person_id
FROM
  person
WHERE 
  year_of_birth > 2000
;

TRUNCATE TABLE #my_table;
DROP TABLE #my_table;

In R you would use:

translateSql(sql, "sql server", "oracle", oracleTempSchema = "temp_schema")$sql

which will produce this SQL:

CREATE GLOBAL TEMPORARY TABLE temp_schema.w8r18pe5my_table (person_id NUMBER(19)) ON COMMIT PRESERVE ROWS;

INSERT INTO temp_schema.w8r18pe5my_table (person_id)
SELECT  
  person_id
FROM 
  person
WHERE  
  year_of_birth > 2000
;

TRUNCATE TABLE temp_schema.w8r18pe5my_table;
DROP TABLE temp_schema.w8r18pe5my_table;

Note that even though it almost works as though you have real temp tables in Oracle, you still have to drop them at the end of your session (on all other platforms temp tables are dropped automatically at the end of the connection).

The Java equivalent would be:

String sessionId = SqlTranslate.generateSessionId();
SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId, "temp_schema");
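To be safe about the drop requirement mentioned above, application code can guard the cleanup in a finally block (a sketch; execute() stands for whatever splits and runs the statements on your connection):

// Sketch: make sure the Oracle temp table is dropped even when the
// analysis SQL fails halfway through. execute() is a placeholder helper.
String sessionId = SqlTranslate.generateSessionId();
String analysisSql = SqlTranslate.translateSql(sql, "sql server", "oracle", sessionId, "temp_schema");
String cleanupSql = SqlTranslate.translateSql(
    "TRUNCATE TABLE #my_table; DROP TABLE #my_table;", "sql server", "oracle", sessionId, "temp_schema");
try {
    execute(connection, analysisSql);
} finally {
    execute(connection, cleanupSql);
}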

This is currently still in the unique_oracle_temp_tables branch of SqlRender. Will push to master once it’s been tested some more.

A couple of questions, from the perspective of WebAPI, which also doesn’t make assumptions about the underlying DB vendor:

  • Should the schema always be the ohdsi/results schema (e.g. getOhdsiSchema())? I hesitate to make yet another external property (e.g. temptable.schema) to drive this…
  • Does it make sense to call translateSql() with non-null sessionId and temp schema arguments regardless of the DB vendor (see the sketch below)? I believe your code will just not filter/replace these placeholders, but I’m wondering if there might eventually be some undesirable outcomes depending on the underlying DB vendor and the SqlTranslate signature. @Chris_Knoll - thoughts?
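For example, a vendor-agnostic call might always pass both (a sketch; getDialect() and getTempSchema() are hypothetical WebAPI helpers):

// Sketch: always supply sessionId and temp schema, relying on the
// translator to ignore them for dialects that don't use them.
String sessionId = SqlTranslate.generateSessionId();
String translatedSql = SqlTranslate.translateSql(
    renderedSql, "sql server", getDialect(), sessionId, getTempSchema());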

Looks like your changes work fine. It was easier with the prefix/suffix to find the temp tables in the OHDSI schema (they were all grouped together). Now we have something like 4kddk2HERACLES_COHORT instead of TMP4kddk2_HERACLES_COHORT. I understand making the object names shorter, but I wonder if we should keep the prefix/suffix. Maybe we could shorten the sessionId to 7 characters (like abbreviated GitHub commit hashes)? Just a thought…

One other thing: I received the following error once, but could not reproduce it (1 job failed out of 8).

2015-03-09 14:11:55.805 ERROR taskExecutor-3 org.springframework.batch.core.step.AbstractStep -  - Encountered an error executing step cohortAnalysisStep in job cohortAnalysisJob
org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [ALTER SESSION SET current_schema =  OHDSI; delete from OHDSI.HERACLES_results where cohort_definition_id IN (1) and analysis_id IN (1,2,1800,1801,1802); delete from OHDSI.HERACLES_results_dist where cohort_definition_id IN (1) and analysis_id IN (1,2,1800,1801,1802); BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE  HERACLES_cohort';
  EXECUTE IMMEDIATE 'DROP TABLE  HERACLES_cohort';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -942 THEN
      RAISE;
    END IF;
END;; BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE  OHDSI.5KNNlsYmQoHERACLES_cohort';
  EXECUTE IMMEDIATE 'DROP TABLE  OHDSI.5KNNlsYmQoHERACLES_cohort';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -942 THEN
      RAISE;
    END IF;
END;; CREATE GLOBAL TEMPORARY TABLE OHDSI.5KNNlsYmQoHERACLES_cohort
  ON COMMIT PRESERVE ROWS
AS
SELECT
   subject_id, cohort_definition_id, cohort_start_date, cohort_end_date 
	
FROM
  OHDSI.cohort
  WHERE  cohort_definition_id in (1)
; insert into HERACLES_results (cohort_definition_id, analysis_id, count_value)
select c1.cohort_definition_id, 1 as analysis_id,  COUNT(distinct person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id; insert into HERACLES_results (cohort_definition_id, analysis_id, stratum_1, count_value)
select c1.cohort_definition_id, 2 as analysis_id,  gender_concept_id as stratum_1, COUNT(distinct person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id, GENDER_CONCEPT_ID
; insert into HERACLES_results (cohort_definition_id, analysis_id, stratum_1, count_value)
select c1.cohort_definition_id, 
	1800 as analysis_id, 
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as stratum_1, 
	COUNT(p1.person_id) as count_value
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
group by c1.cohort_definition_id, EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH
; insert into HERACLES_results_dist (cohort_definition_id, analysis_id, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)
select cohort_definition_id, 
	1801 as analysis_id,
	COUNT(count_value) as count_value,
	min(count_value) as min_value,
	max(count_value) as max_value,
	avg(1.0*count_value) as avg_value,
	STDDEV(count_value) as stdev_value,
	max(case when p1<=0.50 then count_value else -9999 end) as median_value,
	max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
	max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
	max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
	max(case when p1<=0.90 then count_value else -9999 end) as p90_value
from
(
select c1.cohort_definition_id,
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as count_value,
	1.0*(row_number() over (partition by c1.cohort_definition_id order by EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH))/(COUNT(EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH) over (partition by c1.cohort_definition_id)+1) as p1
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
) t1
group by cohort_definition_id
; insert into HERACLES_results_dist (cohort_definition_id, analysis_id, stratum_1, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)
select cohort_definition_id, 
	1802 as analysis_id,
	gender_concept_id as stratum_1,
	COUNT(count_value) as count_value,
	min(count_value) as min_value,
	max(count_value) as max_value,
	avg(1.0*count_value) as avg_value,
	STDDEV(count_value) as stdev_value,
	max(case when p1<=0.50 then count_value else -9999 end) as median_value,
	max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
	max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
	max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
	max(case when p1<=0.90 then count_value else -9999 end) as p90_value
from
(
select c1.cohort_definition_id,
	p1.gender_concept_id,
	EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH as count_value,
	1.0*(row_number() over (partition by p1.gender_concept_id, c1.cohort_definition_id order by EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH))/(COUNT(EXTRACT(YEAR FROM c1.cohort_start_date) - p1.YEAR_OF_BIRTH) over (partition by p1.gender_concept_id, c1.cohort_definition_id)+1) as p1
from omopv5_de.PERSON p1
inner join (select subject_id, cohort_definition_id, cohort_start_date from OHDSI.5KNNlsYmQoHERACLES_cohort) c1
on p1.person_id = c1.subject_id
) t1
group by cohort_definition_id, gender_concept_id
; TRUNCATE TABLE OHDSI.5KNNlsYmQoHERACLES_cohort; DROP TABLE OHDSI.5KNNlsYmQoHERACLES_cohort; delete from HERACLES_results where count_value <= 0; delete from HERACLES_results_dist where count_value <= 0]; SQL state [null]; error code [17081]; error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7
; nested exception is java.sql.BatchUpdateException: error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7

	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:416)
	at org.springframework.jdbc.core.JdbcTemplate.batchUpdate(JdbcTemplate.java:611)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet$1.doInTransaction(CohortAnalysisTasklet.java:40)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet$1.doInTransaction(CohortAnalysisTasklet.java:1)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
	at org.ohdsi.webapi.cohortanalysis.CohortAnalysisTasklet.execute(CohortAnalysisTasklet.java:36)
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406)
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
	at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271)
	at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
	at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257)
	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:198)
	at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
	at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:386)
	at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:135)
	at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:304)
	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.sql.BatchUpdateException: error occurred during batching: ORA-03291: Invalid truncate option - missing STORAGE keyword
ORA-06512: at line 7

	at oracle.jdbc.driver.OracleStatement.executeBatch(OracleStatement.java:4586)
	at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:230)
	at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:572)
	at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:559)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:405)
	... 23 more
2015-03-09 14:11:56.035 INFO taskExecutor-3 org.springframework.batch.core.launch.support.SimpleJobLauncher -  - Job: [SimpleJob: [name=cohortAnalysisJob]] completed with the following parameters: [{cohortDefinitionId:1=1, time=1425924713932}] and the following status: [FAILED]

@alfranke, no thoughts come to mind. I’m writing the code to fit the least common denominator for all our known platforms (for example, MSSQL doesn’t require us to split the SQL into individual statements, but Oracle does, so I’m always splitting).

Re: the error you got, the SQL string that was printed out from the provider looks like it wasn’t split. I thought you had to split the input SQL into individual statements?

I would prefer that the handling of the temp tables in Oracle were transparent to users of other platforms. Clearly, it would be awkward to pass in a variable for the oracle_temp_table_schema if you are running PostgreSQL. I’m not close enough to this issue to know if it can be transparent. Something else we can try to mull over during the architecture call.

It is split, and then executed as a batch.
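Roughly like this (a sketch; I’m assuming SqlRender’s SqlSplit.splitSql() and Spring’s JdbcTemplate, which is what the stack trace shows):

import org.ohdsi.sql.SqlSplit;
import org.springframework.jdbc.core.JdbcTemplate;

// Sketch: split the translated SQL into individual statements,
// then execute them as a single JDBC batch.
public static void runAsBatch(JdbcTemplate jdbcTemplate, String translatedSql) {
    String[] statements = SqlSplit.splitSql(translatedSql);
    jdbcTemplate.batchUpdate(statements);
}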

@alfranke: the reason for this error appears to be that the session ID starts with a number. It is also 10 characters long. It seems an ‘old’ sessionId somehow got mixed up with the new code. The new code produces 8-character session IDs where the first character is always a letter.
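Something of this shape satisfies both constraints (a hypothetical sketch, not the actual implementation):

import java.util.Random;

// Hypothetical sketch: an 8-character session ID whose first
// character is always a letter, as the new code requires.
public static String generateSessionId() {
    String letters = "abcdefghijklmnopqrstuvwxyz";
    String alphanumeric = "abcdefghijklmnopqrstuvwxyz0123456789";
    Random random = new Random();
    StringBuilder id = new StringBuilder();
    id.append(letters.charAt(random.nextInt(letters.length())));
    for (int i = 1; i < 8; i++)
        id.append(alphanumeric.charAt(random.nextInt(alphanumeric.length())));
    return id.toString();
}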

@alfranke, @Frank: I personally like the idea of the Oracle temp schema. It makes the emulation of temp tables on Oracle complete, so people writing parameterized SQL don’t have to think about it. It is also cleaner to separate results from temp tables, and in a lot of applications we no longer require a results schema. Talk about awkward: people using CohortMethod had to specify a results schema, even though no results were written there.

I would think the oracleTempSchema is a parameter that only needs to be set by people running Oracle. And yes, I would argue it should be a separate property in the WebAPI config. If an admin wants, he can point it to a schema specifically for temp tables, and once in a while just clean up the schema to remove any left-over tables.

We are passing in a random string for the sessionId, with no check for an alphabetic initial character. That explains the error, but it confirms my thought that we should revert the removal of at least the ‘TMP’ prefix. Would you support that?

Why are you not using the generateSessionId() function?

I’m a bit hesitant about reintroducing the TMP prefix. If we make the prefix any longer, there will be a lot of applications (CohortMethod and Achilles, for starters) where we will need to change the temp table names. Taking an additional 3 characters away from the random string means a higher probability of collisions (8 alphanumeric characters allow roughly 36^8 ≈ 2.8 trillion IDs, while 5 allow only about 60 million).

We want to be able to run jobs (creating temp tables) in parallel. When I passed in null as the sessionId arg, I saw that the sessionId was scoped at the JVM level, preventing parallel jobs. However, I didn’t realize I could use your algorithm; we had just grabbed Apache Commons Lang. I’ll try the generateSessionId() in SqlTranslate. Looks like it should do the job.
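Concretely, the plan is something like this (a sketch; names and wiring are simplified):

// Sketch: generate the session ID inside each job, so parallel jobs
// get distinct Oracle temp table names. All names are placeholders.
public void runCohortAnalysisJob(String renderedSql, JdbcTemplate jdbcTemplate) {
    String sessionId = SqlTranslate.generateSessionId(); // per job, not per JVM
    String translatedSql = SqlTranslate.translateSql(
        renderedSql, "sql server", "oracle", sessionId, "OHDSI");
    jdbcTemplate.batchUpdate(SqlSplit.splitSql(translatedSql));
}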
I would say that the prefix would prevent clients from running into this error. Since there is no mandate on what a client passes in, you might consider adding validation of the sessionId passed in to translateSql(), so you could throw a meaningful exception (I’m not sure how long that would have taken me to troubleshoot otherwise). Just a suggestion, no strong preference. Thanks.

Ah, good point. I added session ID validation (see this commit), and am now mentioning the generateSessionId() function in the JavaDoc.
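The check is essentially this shape (a sketch of the rule, not the exact commit):

// Sketch of the validation rule: 8 alphanumeric characters,
// starting with a letter; reject anything else with a clear message.
public static void validateSessionId(String sessionId) {
    if (sessionId == null || !sessionId.matches("[a-zA-Z][a-zA-Z0-9]{7}"))
        throw new IllegalArgumentException(
            "Session ID must be 8 alphanumeric characters starting with a letter: " + sessionId);
}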
