
Cohort (eMERGE T2DM algorithm) implementation failed on local Atlas

SELECT @@VERSION AS 'SQL Server Version';

Microsoft SQL Server 2016 (SP1-CU2) (KB4013106) - 13.0.4422.0 (X64) Mar 6 2017 14:18:16 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)

  • The qualifiers are fine.
  • The cohort generation SQL does not execute successfully for us against SQL Server, either via Atlas or through a SQL client.
  • The error shows up in the stack trace: it is error code 8623.
  • The accompanying message is
    SQL state [S0001]; error code [8623]; The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
  • As @sunnyshang suggested, a possible issue is the size of the target CDM and the load on or configuration of our SQL Server cluster.

As @cukarthik said, we are running

Microsoft SQL Server 2012 (SP3-GDR) (KB4057115) - 11.0.6260.1 (X64) 
Jan  9 2018 21:31:12 
Copyright (c) Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

If I have this right:

@cukarthik, @t_abdul_basser, @sunnyshang - SQL Server 2012 SP3 Enterprise - Failed in both Atlas & Management Studio
@lrasmussen - SQL Server 2016 Standard - Failed via Atlas, Worked via Management Studio. Could you provide the fail_message per @Chris_Knoll's suggestion?
@ericaVoss - SQL Server 2016 Enterprise - Passed
@anthonysena - Microsoft SQL Server 2012 - 11.0.5058.0 (X64) - Passed (on an empty CDM)

@t_abdul_basser: maybe we should work this out over email since it could be highly dependent on your environment? Then we can post the findings back here.

Thanks @anthonysena. We were about to experiment with modifying the query to include some useful OPTION clauses, e.g. …OPTION (QUERYTRACEON 4102), as per this article, but our DBAs beat us to it and are experimenting with configuring the DB session to support the relevant flag SQL Server-side. We are testing that now and things look good so far. (The query has not failed yet!)
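
In case it helps anyone else, here is a rough sketch of the two approaches (both assume trace flag 4102 is the relevant flag for your situation; check with your DBAs before enabling anything):

-- Per-query: append a hint to the end of the generated cohort SQL
-- ...OPTION (QUERYTRACEON 4102);

-- Server-wide: enable the trace flag globally (requires sysadmin)
DBCC TRACEON (4102, -1);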

I will update the thread when we have something to report.


@anthonysena - the fail message from the WebAPI tables starts as follows (the rest of the SQL in the message is truncated for brevity here):

org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [with primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id) as  (  -- Begin Primary Events  select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id

But from the logs in the Broadsea container I also see:

Caused by: java.sql.BatchUpdateException: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeBatch(SQLServerStatement.java:1915)
at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:561)
at org.springframework.jdbc.core.JdbcTemplate$1BatchUpdateStatementCallback.doInStatement(JdbcTemplate.java:548)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:408)
... 25 more
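
For anyone else who wants to pull this message themselves, a query along these lines should work against the WebAPI database (a sketch assuming the default WebAPI schema name and the cohort_generation_info table; adjust the schema name and cohort definition id to your install):

-- Read the stored failure message for a cohort definition.
-- 'webapi' and the id below are placeholders for your environment.
SELECT id, source_id, status, fail_message
FROM webapi.cohort_generation_info
WHERE id = 1234;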

I need to apologize for making some bad assumptions and not being as thorough as I should have been in my original testing. It was bothering me that it seemed to work from SSMS, but not from Atlas. After tackling this more methodically, it seems that that's not entirely the case.

First, we had requested SQL Server 2016, but it looks like 2017 was installed:

Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
Aug 22 2017 17:04:49   Copyright (C) 2017 Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor) 

Next, we have two environments - staging and production. Both databases were manually set up, but done at the same time from the same installer. I've confirmed versions are the same, but given that it isn't a direct clone from one to the other, I can't guarantee they are identical.

The OHDSI stacks in staging and production are each running off of the Broadsea Docker containers. They were set up on different days, and the production environment has one additional commit (Fixed the R site-library volume path, 4/3/18) not on staging. These were also set up manually, but from the same set of instructions. Again, I can't rule out some difference since these aren't direct clones.

What I'm seeing:
Staging

  • Query in Atlas: PASSES
  • Query in SSMS: PASSES

Production

  • Query in Atlas: FAILS
  • Query in SSMS: FAILS

My mistake was that I tested SSMS in staging, but used the production Atlas app. Both databases have identical patient data loaded.

I brought in a colleague, Eric Whitley, who knows a lot more about SQL Server than I do, and he walked me through a possible explanation for the issues we were seeing in our SQL Server 2017 setup.

The one difference we noticed was that tempdb was a different size on each platform: staging was larger than production (1.2 GB vs. 900 MB). He noted that query execution uses tempdb, and if (in theory) tempdb was filling up, that could cause the query to stop running.
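
For reference, a quick way to compare tempdb sizing across servers is something like this (file sizes in sys.database_files are reported in 8 KB pages, hence the conversion):

-- Run on each server; lists the data/log files that make up tempdb.
SELECT name, size * 8 / 1024 AS size_mb
FROM tempdb.sys.database_files;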

Instead of just resizing tempdb to be the same on both platforms, he walked through the execution plan on staging. One thing he noticed is that indexes did not appear to be applied. Sure enough, the indexes had never been created on staging or production. We created the indexes on both platforms: production now works, and staging continues to work.
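
If you want to sanity-check index creation yourself, something like this works (the table name here is just an example; pick any core CDM table in your database):

-- An empty result means no indexes exist on the table.
SELECT t.name AS table_name, i.name AS index_name, i.type_desc
FROM sys.indexes i
JOIN sys.tables t ON t.object_id = i.object_id
WHERE t.name = 'condition_occurrence'
  AND i.type_desc <> 'HEAP';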

The theory is that, without indexes, query execution was filling up tempdb more quickly. We may have been right at a size boundary where the query would work on staging (given the extra ~300 MB) and fail on production.

Happy to take correction on any of this from those who know the inner workings of SQL Server better; my interpretation and theories may be off base. The takeaway is that both environments are now working on 2017 after applying the indexes.

Glad to hear it! Indexes are a very tricky business. Depending on the hardware configuration (fast SSDs, lots of RAM, RAID setup), some index configurations will work great for some environments but not for others. I highly recommend you spend time getting a feel for what your hardware is capable of handling and tune accordingly.

Glad to hear you got it worked out!

Thanks @lrasmussen. This is interesting.
