
Cohort (eMERGE T2DM algorithm) implementation failed on local Atlas

@ericaVoss - what DB did you try it against?

@cukarthik - It definitely is specific to SQL Server, but still not sure the underlying cause

I can say that ours is a very fresh install of the OHDSI stack on SQL Server, and it is isolated, so no other queries were running at the time. I’m still curious why the generated SQL ran for me but not for @sunnyshang, yet we both see the error when running through the OHDSI stack. When running from the app, do we have to factor in the SQL Server JDBC driver as a possible source of the error? I’m not convinced yet that that’s a valid lead.

@lrasmussen - Through ATLAS to a MS APS.

I have a copy of SYNPUF on MSSQL, running it there now, but it is taking a while. I’ll let you know when it finishes.

Yea, I’m thinking that is the case. I haven’t tested the OHDSI stack against Postgres since we have not set that up. We were getting the same error against SQL Server both when hitting the DB directly and when going through the OHDSI stack, which makes sense. We tried the query on two other SQL Server instances and experienced the same thing, and one of the instances is new, so there is no other usage on it right now.

@cukarthik, it took 2 hours but it finished on our MSSQL box on SYNPUF CDM.

@ericaVoss How big is the SYNPUF cohort?

I got 518,981 rows in my #FINAL_COHORT table

Hi, everyone. Sorry for the late reply. Usually when I get an error, I copy the SQL and run it as you did. If it runs OK locally but not through Atlas, it’s possible that there is a configuration error in your WebAPI’s SOURCE and SOURCE_DAIMON tables such that the wrong table qualifiers are set up.
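For example, you can sanity-check the configured qualifiers directly against those tables (a sketch assuming the WebAPI tables live in a webapi schema; column names as in a typical WebAPI install):

select s.source_name, sd.daimon_type, sd.table_qualifier
from webapi.source s
join webapi.source_daimon sd on sd.source_id = s.source_id
order by s.source_name, sd.daimon_type;

If memory serves, daimon_type 0, 1, and 2 are the CDM, vocabulary, and results daimons, so each table_qualifier should point at the schema you expect.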

We also have a field in the WebAPI’s cohort_generation_info table called ‘fail_message’ which can tell you what the error was in generation:

select fail_message from {webapiSchema}.cohort_generation_info where id={cohortDefinitionId}

Run that on your WebAPI database and if it failed, then you should see an error message.
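If you don’t know the cohort definition id off-hand, a variant that looks it up by name should also work (again assuming the default WebAPI table names):

select cd.id, cd.name, gi.source_id, gi.status, gi.fail_message
from webapi.cohort_definition cd
join webapi.cohort_generation_info gi on gi.id = cd.id
where cd.name like '%T2DM%';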

-Chris

To add to @Chris_Knoll’s post: it would be helpful to know which version(s) of SQL Server are in use amongst the folks on this thread @ericaVoss, @cukarthik, @lrasmussen, @sunnyshang

We are running SQL Server 2012 Enterprise. The strange thing is that our Postgres instance isn’t as well resourced as our SQL Server machines, yet the query completed there.

SQL Server 2016 Standard for me

SELECT @@VERSION AS 'SQL Server Version';

Microsoft SQL Server 2016 (SP1-CU2) (KB4013106) - 13.0.4422.0 (X64) Mar 6 2017 14:18:16 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)

  • The qualifiers are fine.
  • The cohort generation SQL does not execute successfully for us against SQL Server, whether via Atlas or through a SQL client.
  • The error message shows up in the stack trace: it is error code 8623.
  • The accompanying message is
    SQL state [S0001]; error code [8623]; The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information
  • As @sunnyshang suggested, a possible issue is the size of the target CDM and the load on or configuration of our SQL Server cluster.

As @cukarthik said we are running

Microsoft SQL Server 2012 (SP3-GDR) (KB4057115) - 11.0.6260.1 (X64) 
Jan  9 2018 21:31:12 
Copyright (c) Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

If I have this right:

@cukarthik, @t_abdul_basser, @sunnyshang - SQL Server 2012 SP3 Enterprise - Failed in both Atlas & Management Studio
@lrasmussen - SQL Server 2016 Standard - Failed via Atlas, Worked via Management Studio. Could you provide the fail_message per @Chris_Knoll’s suggestion?
@ericaVoss - SQL Server 2016 Enterprise - Passed
@anthonysena - Microsoft SQL Server 2012 - 11.0.5058.0 (X64) - Passed (on an empty CDM)

@t_abdul_basser: maybe we should work this over email since it could be highly dependent on your environment? Then we can post the findings back here.

Thanks @anthonysena. We were about to experiment with modifying the query to include some useful OPTION clauses (e.g. …OPTION (QUERYTRACEON 4102)) as per this article, but our DBAs beat us to it and are experimenting with configuring the DB session to support the relevant flag on the SQL Server side. We are testing that now and things look good so far. (The query has not failed yet!)
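For reference, the two variants look roughly like this (a sketch, not the exact generated cohort SQL; the select is just illustrative):

-- query-level: append the hint to the statement that fails to compile a plan
select person_id, start_date, end_date
from #final_cohort
option (querytraceon 4102);

-- server-side: enable the trace flag globally instead
dbcc traceon (4102, -1);

Worth noting that the QUERYTRACEON hint typically needs elevated (sysadmin-level) permissions, so the server-side route can be easier in a locked-down environment.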

I will update the thread when we have something to report.


@anthonysena - the fail message from the WebAPI tables starts as follows (the rest of the SQL in the message is truncated for brevity here):

org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [with primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id) as  (  -- Begin Primary Events  select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id

But from the logs in the Broadsea container I also see:

Caused by: java.sql.BatchUpdateException: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeBatch(SQLServerStatement.java:1915)
at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:561)
at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:548)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:408)
... 25 more

I need to apologize for making some bad assumptions and not being as thorough in my original testing. It was bothering me that it seemed to work from SSMS, but not from Atlas. After tackling this more methodically, it seems that that’s not entirely the case.

First, we had requested SQL Server 2016, but it looks like 2017 was installed:

Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
Aug 22 2017 17:04:49   Copyright (C) 2017 Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor) 

Next, we have two environments - staging and production. Both databases were manually set up, but done at the same time from the same installer. I’ve confirmed versions are the same, but given that it isn’t a direct clone from one to the other, I can’t guarantee they are identical.

The OHDSI stack in staging and production are each running off of the Broadsea Docker containers. They were set up on different days, and the production environment has one additional commit (Fixed the R site-library volume path, 4/3/18) not on staging. These were also set up manually, but from the same set of instructions. Again, I can’t rule out some difference since these aren’t direct clones.

What I’m seeing:
Staging

  • Query in Atlas: PASSES
  • Query in SSMS: PASSES

Production

  • Query in Atlas: FAILS
  • Query in SSMS: FAILS

My mistake was that I tested SSMS in staging, but used the production ATLAS app. Both databases have identical patient data loaded.

I brought in a colleague, Eric Whitley, who knows a lot more about SQL Server than I do, and he walked me through a possible explanation for the issues we were seeing in our SQL Server 2017 setup.

The one difference we noticed was that tempdb was sized differently on the two platforms: staging was larger than production (1.2 GB vs. 900 MB). He noted that the query execution would use tempdb, and if tempdb filled up, that could cause the query to stop running.
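For anyone who wants to compare on their own instance, tempdb file sizes can be read from the standard catalog view (size is reported in 8 KB pages):

select name, type_desc, size * 8 / 1024 as size_mb
from tempdb.sys.database_files;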

Instead of just resizing tempdb to be the same on both platforms, he walked through the execution plan on staging. One thing he noticed is that indexes did not appear to be applied. Sure enough, the indexes had not been created on staging or production. We created the indexes on both platforms; production now works, and staging continues to work.
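For anyone checking their own environment, something like this will show which CDM tables have indexes (a sketch assuming the CDM sits in the dbo schema), and the two create index statements are representative of the kind of indexes the OMOP CDM DDL defines, not necessarily exactly what we ran:

-- list existing indexes on a few of the larger CDM tables
select t.name as table_name, i.name as index_name, i.type_desc
from sys.tables t
left join sys.indexes i on i.object_id = t.object_id and i.index_id > 0
where t.name in ('person', 'observation_period', 'condition_occurrence', 'visit_occurrence')
order by t.name, i.index_id;

-- representative index DDL
create clustered index idx_person_id on dbo.person (person_id);
create index idx_condition_person_id on dbo.condition_occurrence (person_id);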

The theory is that without indexes, the query execution filled up tempdb more quickly. We may have been right at a size boundary where it would work on staging (given the extra ~300 MB) and fail on production.

Happy to take correction on any of this from those who have a better understanding of the inner workings of SQL Server; my interpretation and theories may be off base. The takeaway is that both environments are now working on SQL Server 2017 after applying the indexes.

Glad to hear it! Indexes are very tricky business. Depending on the hardware configuration (fast SSDs, lots of RAM, RAID configuration), some index configurations will work great for some setups but not for others. I highly recommend you spend time getting a feel for what your hardware is capable of handling and tune accordingly.

Glad to hear you got it worked out!

Thanks @lrasmussen. This is interesting.

t