OHDSI Home | Forums | Wiki | Github

Cohort Generation in Atlas with Generation status Failed


(Swanshi Saxena) #1

Hi,

I am getting generation status as failed while generating cohort in Atlas. I am using Atlas 2.4.3 with webAPI 2.4.3 version. Cohort is created using sample data named as ‘Warfarin New Users 65 or Older at Index with Prior Atrial Fibrillation’.

cohort_generation_info table is loaded with below fail_message:

java.util.concurrent.ExecutionException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [CREATE TEMP TABLE Codesets (codeset_id int NOT NULL,
concept_id bigint NOT NULL
)
; INSERT INTO Codesets (codeset_id, concept_id)
SELECT 1 as codeset_id, c.concept_id FROM (select distinct I.concept_id FROM
(
select concept_id from public.CONCEPT where concept_id in (313217)and invalid_reason is null
UNION select c.concept_id
from public.CONCEPT c
join public.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (313217)
and c.invalid_reason is null
UNION
select distinct cr.concept_id_1 as concept_id
FROM
(
select concept_id from public.CONCEPT where concept_id in (313217)and invalid_reason is null
UNION select c.concept_id
from public.CONCEPT c
join public.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (313217)
and c.invalid_reason is null

) C
join public.concept_relationship cr on C.concept_id = cr.concept_id_2 and cr.relationship_id = ‘Maps to’ and cr.invalid_reason IS NULL

) I
) C; INSERT INTO Codesets (codeset_id, concept_id)
SELECT 2 as codeset_id, c.concept_id FROM (select distinct I.concept_id FROM
(
select concept_id from public.CONCEPT where concept_id in (1310149)and invalid_reason is null

) I
) C; INSERT INTO Codesets (codeset_id, concept_id)
SELECT 3 as codeset_id, c.concept_id FROM (select distinct I.concept_id FROM
(
select concept_id from public.CONCEPT where concept_id in (441139)and invalid_reason is null

) I
) C; CREATE TEMP TABLE qualified_events

AS
WITH primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id) AS (
– Begin Primary Events
select P.ordinal as event_id, P.person_id, P.start_date, P.end_date, op_start_date, op_end_date, cast(P.visit_occurrence_id as bigint) as visit_occurrence_id
FROM
(
select E.

It seems that there is some configuration issue at database level.

Please advice on how to proceed further on this.

Thanks
Swanshi


How can i edit code(sql)
(Rick Kiefer) #2

Hello,
I am running into this same error. Has a solution been found? I can create a cohort definition but it fails when try to generate. Using latest Broadsea docker application against a fresh CDM 5 repository on PostgreSQL.
Thank you
Rick


(Chris Knoll) #3

Your best bet is to go into the Atlas UI, go to the Export->Sql tab, and copy the postgreSQL statements and run them individually. The entire cohort SQL was stored in the error field, but the error field is not large enough to contain the entire SQL so it is truncated.

Let us know if you narrow down which part of the query is giving you the error. Remember, run them one statement at a time so you can get to the specific statement that gives you the error. If you run it as a batch, it may not give you the detail you need.


(Rick Kiefer) #4

The first statement causes an error

CREATE TABLE #Codesets (
codeset_id int NOT NULL,
concept_id bigint NOT NULL
)
;
ERROR: syntax error at or near “#”
SQL state: 42601
Character: 14


(Chris Knoll) #5

That’s the MSSQL rendering, under the SQL tab, there’s dialects, pick the PostgreSQL one, and use that.


(Rick Kiefer) #6

Sorry. Changed the dialect and it is the insert statement

INSERT INTO Codesets (codeset_id, concept_id)
SELECT 0 as codeset_id, c.concept_id FROM (select distinct I.concept_id FROM
(
select concept_id from @vocabulary_database_schema.CONCEPT where concept_id in (1503297)and invalid_reason is null

) I
) C;

From the Broadsea docker-compose.yml
datasource.cdm.schema=cdm
datasource.ohdsi.schema=ohdsi
flyway_schemas=ohdsi
flyway.placeholders.ohdsiSchema=ohdsi

Did I miss a location to define the vocabulary_database_schema?


(Chris Knoll) #7

The replacement tokens comes from the ohdsi.source_daimon table. The @vocabulary_database_schema will use the cdm’s schema in the event that the soruce_daimon table doesn’t declare a Vocabulary daimon table qualifier.

Long story short, you can ignore all the settings in docker-compose, just replace @cdm_database_schema with your cdm schema, and @vocabulary_database_schema also with your cdm schema (unless you have an environment where you put your vocabulary tables in a separate schema. I’m assuming you did not since this is coming from Broadsea.

Then run the script and try to determine where the failure is.

-Chris


How do I set up multiple data sources in Atlas
(Rick Kiefer) #8

Everything runs fine down to the delete statement. What should I use for the 3 target tokens?
@target_database_schema
@target_cohort_table
@target_cohort_id

Side note - While running the statements, I also verified that my source daimon table contains the correct schema values.


(Chris Knoll) #9

@target_database_schema = your results schema
@target_cohort_table = cohort
@target_cohort_id = your cohort definition_id (which you can see in the URL of atlas when you are editing the cohort definition.

If all this works out,then the issue could be in the ‘inclusion analysis statistics’ part of the query, which is not visible via the SQL export (since they do not contribute to constructing the cohort, but they are merely build-time statistics that we capture and save to the results schema.

Are you sure that you have the cohort_inclusion_* tables created in your results schema? You can find this schema by going to your WebAPI endpoint to: WebAPI/ddl/results?dialect=postgresql

Also it would help to know what version of Atlas and WebAPI your instance of Broadsea has. If this turns out to be missing tables, we may need to ask @lee_evans to update the broadsea instances to include the correct results schema with the bundled CDMs.


(Rick Kiefer) #10

The delete statement failed due to a missing final_cohort table.

DELETE FROM ohdsi.cohort where cohort_definition_id = 4;
INSERT INTO ohdsi.cohort (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
select 4 as cohort_definition_id, person_id, start_date, end_date
FROM final_cohort CO
;

I have 3 cohort_inclusion tables - cohort_inclusion, cohort_inclusion_result, cohort_inclusion_stats.

The Atlas home page states that I am using version 2.6.0 of Atlas and WebAPI.


(Lee Evans) #11

@Chris_Knoll FYI Broadsea doesn’t currently bundle a (postgres) database results schema or any CDM database tables. It deploys the latest released ATLAS/WebAPI (currently v2.6.0).


(Chris Knoll) #12

Ah, so, @rkiefer, what you’ll need to do is create a new schema in your CDM database, and use the WebAPI endpiont: WebAPI/ddl/results?dialect=postgresql to get the DDL. You use this DDL to create the results schema tables. Note: there are a few table initialization scripts that will run forever if you do not have the proper indexes on your tables, please see this discussion for details:


(Rick Kiefer) #13

We had run the results ddl previously and there are 82 tables in the schema. When I search the results ddl for postgresql, there is no create statement for the final_cohort table. What is the normal number of tables which should be created?


(Chris Knoll) #14

final_cohort table is a temp table in the generation script, which means it does not get created as a permanent table in the results schema.

The normal number of tables in the results schema are the tables that come out of the /ddl/results endpoint. It’s around 30 (not 81). Here is the list from my environment, note this may have a couple of extra tables compared to the version of the code that you are running (since the latest master has a few new results tables compared to version 2.6):

TABLE_SCHEMA TABLE_NAME
cknoll1 ACHILLES_analysis
cknoll1 ACHILLES_HEEL_results
cknoll1 ACHILLES_results
cknoll1 ACHILLES_results_derived
cknoll1 ACHILLES_results_dist
cknoll1 cc_results
cknoll1 cohort
cknoll1 cohort_censor_stats
cknoll1 cohort_features
cknoll1 cohort_features_analysis_ref
cknoll1 cohort_features_dist
cknoll1 cohort_features_ref
cknoll1 cohort_inclusion
cknoll1 cohort_inclusion_result
cknoll1 cohort_inclusion_stats
cknoll1 cohort_summary_stats
cknoll1 concept_hierarchy
cknoll1 feas_study_inclusion_stats
cknoll1 feas_study_index_stats
cknoll1 feas_study_result
cknoll1 heracles_analysis
cknoll1 HERACLES_HEEL_results
cknoll1 heracles_periods
cknoll1 heracles_results
cknoll1 heracles_results_dist
cknoll1 ir_analysis_dist
cknoll1 ir_analysis_result
cknoll1 ir_analysis_strata_stats
cknoll1 ir_strata
cknoll1 nc_results
cknoll1 pathway_analysis_events
cknoll1 pathway_analysis_stats

In my env, my schema is cknoll1. You should have separate schemas for your CDM tables and your results tables.


(Luke Rasmussen) #15

Thanks @Chris_Knoll for your help. I worked with @rkiefer some more on this. In the logs of the Broadsea Docker image there was a nested exception which said this part of the query failed:
delete from ohdsi.cohort_inclusion_result where cohort_definition_id = 4 and mode_id = 0

This does not appear in the exported postgres SQL from ATLAS, when we were testing by hand.

The problem is that mode_id didn’t exist in the table. We cross-referenced against the WebAPI endpoint you noted (WebAPI/ddl/results?dialect=postgresql) and saw that mode_id was missing from a few tables. We manually fixed this, and queries are running now. Unsure how this happened, but will backtrack through our setup for ideas.


(Chris Knoll) #16

Right, there is a portion of the cohort generation in Atlas that generates the inclusion rule stats for reporting the inclusion rule impact reports (which is not necessary to do cohort generation, but is informative when looking at the report in Atlas).

Are you saying that the ddl/results script from the end point did not have a mode_id in the table definition? If so, I should make sure that the tables are properly scripted out, otherwise others will have this same problem.


(Luke Rasmussen) #17

The DDL scripts at the WebAPI endpoint did have the mode_id in the table definition. Sorry for the confusion. That gave us the right DDL so we could recreate the tables. We’re trying to trace back how the tables were initially provisioned as to why they were missing the column.


Randomized data - Atlas Cohort Generation
(Selva Muthu Kumaran Sathappan) #18

@lrasmussen @Chris_Knoll - Can the cohort generation fail due to lack of records matching my cohort definition (Say if my cohort entry criteria results in 0 records, can it fail?
For example, I created a concept set with ‘Warfarin’ and that has yielded results.

So when I used select * from codesets, I am able to find records but when I try the same after running “select * from qualified_events”, the result set is empty.

Can the cohort generation fail due to this? Moreover, I have set the cohort entry criteria as shown below

t1

I tried with different settings for ‘iNITIAL EVENTS’ feature. But still am not able to generate the cohort successfully

We have data for Warfarin users. Can you people help us?

I tried running the queries seperately and they all are working fine. But certain queries yields empty table as results but I guess this is due to the criteria specified. Can cohort generation fail due to this?

Thanks
Selva


(Chris Knoll) #19

If you go to the SQL tab under the Export section (within cohort definitions), you can get the SQL that generates the cohort. You can execute it statement-by-statement to understand where you fail to return results.


t