Postgres connection for Olympus

jon_duke · June 9, 2015, 4:05am

@mdewilde, I have posted an updated version including the CIRCE changes (thanks to @Chris_Knoll for the changes and @alfranke for integrating into Olympus). It is available on the OHDSI website. The reason we do not post Olympus directly on GitHub at this time is that it requires a EULA due to the inclusion of proprietary drivers from Oracle and Microsoft.

@Chris_Knoll, to follow up on your question, we currently have a service that pulls directly from our NLP cohorts and writes them into our OHDSI cohort schema (which happens to be the same as our results schema). But the source cohorts are in a Lucene index. So we are not doing any transferring of data between OHDSI cohort tables.

So should we remove the Cohort schema field from the OLYMPUS configuration to avoid any confusion?

Chris_Knoll · June 9, 2015, 4:54am

I think that makes sense.

mdewilde · June 9, 2015, 9:13am

Hi all,

downloaded the new build and tried it out.

First circe was showing an empty screen again. But I remembered that this could be caused by the caching mechanism of the browser. So I put it in incognito mode. After this Circe shows up. YES!!!

So Circe is now showing up. I tried to create a new cohort as you requested. I was not able to figure out how to do something with a condition but I think I make a cohort only containing death patients. Generating the cohort works and in the jobs I saw that the generation was successful.

Next step was to see if the Test cohort I created shows up in Heracles: YES!!!

Next step was to build a report on this Test cohort. Unfortunately these steps a failing. I did a couple of tests.

See attachment for the logfile. It looks like syntax errors due to a # character.

2015-06-09 - Olympus (rename to txt).xls (844.3 KB)

So, Heracles connection to the cohort table now works. I checked the database and the Test cohort is located in the public schema (as expected).

So if we now migrating our cohort tables to the tables in the public schema Heracles will show the cohort? Will building the reports in Heracles give us the same build-failures? Is this due to MSSQL vs postgres?

Marcel

ericaVoss · June 9, 2015, 12:31pm

@mdewilde this is great news, it is really close now. I think we just need to:

Copy IPCI cohort from the CDM into these Heracles tables as well as its cohort definition
Get the summary data generated by Heracles from the “non-Internet” computer to your laptop for GENT.

Question, earlier did you say that your COHORT table in the CDM is different than PUBLIC? Is this still an issue that needs to be addressed?

mdewilde · June 9, 2015, 2:14pm

Hi Erica,

Heracles now finds the defined Test cohort created with Circe. So the connection between Circe, Heracles and the public schema seems to work now. But….I’m not able to create reports in Heracles. It raised failures when trying to build a report based on the cohort. I tried a couple of reports but all of them fail. Unfortunately I have not seen a graph yet. See log in previous forum post.

Before trying to insert our cohort in the other cohort tables I want to wait until Heracles is working. Otherwise I might corrupt the tables.

I have not had a look how to convert our cohort tables to the other table formats yet. I think it’s pretty straight forward. I’m thinking to modify my ETL to generate a 2 or three extra csv files according to these other cohort tables formats. So then our ETL will create two sets of the cohort tables. One set for the cdm schema and another set for the public schema. I assume having two different cohort approaches is a temporary issue and in future versions there will be just one approach.

For now creating a new database will be the following steps:

· First create the cdm schema in postgres.

· Using the CDMBuilder of Martijn to create the tables, load the vocab and our own tables (including the cohort tables defined in the v5 documentation if still needed).

· At this point the build is still the same as we did before.

· Configure Olympus and start it to make this create the tables in the public schema

· Fill the new created cohort tables with the extra cohort csv files from the ETL.

I understand now that this is more or less the way we plan to make it work. Is this correct?

Erica: do you know if it is enough to fill the cohort table and the cohort_definition table with similar values as the original cohort tables? Or are there specific things I need to take care of?

Thanks,
Marcel

ericaVoss · June 9, 2015, 2:27pm

@mdewilde the only thing I know of that we’ll have to be careful of is our cohort in IPCI has an ID, but that ID may be used already in this new table. For example, I bet that cohort you created CIRCE took ID=1 and I bet our IPCI cohort = 1.

I think you have the right steps in the build process.

@cahilton, @jon_duke, @alfranke, or @Chris_Knoll does Marcel’s log provide any clues? I think we are very close to having Olympus working for @mdewilde!

Chris_Knoll · June 9, 2015, 2:59pm

Just to be clear, the cohort_definition_id from your CDM schema (and your ETL process) will conflict with the cohort_definition_ids that are generated from the CIRCE/OHDSI cohort_definition_table. That means you DO NOT just simply execute the following query:

INSERT INTO public.cohort
SELECT * from cdm.cohort

You need to do the following steps:

1: Create a row in public.cohort_definition with something like the following query:

INSERT INTO public.cohort_definition (name, description, expression_type)
VALUES ('IPCI Time Cohort', 'Describe the IPIC Time Cohort here...', 'EXTERNAL_SOURCED');

(Note: ‘EXTERNAL_SOURCED’ is an important string value indicating that this is not a CIRCE based expression, rather it’s a cohort that has been created from some other source.)

2: Using the new cohort_definition_id autogenerated from above you can now copy over the cohort members from your CDM to PUBLIC cohort table:

INSERT INTO public.cohort (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
SELECT {cohort_id} as cohort_definition_id, subject_id, cohort_start_date, cohort_end_date
from cdm.cohort 
where cohort_definition_id = {your CDM.COHORT.COHORT_DEFINITION_ID value}

Once this is done, you have the proper Cohort Definition set up so that HERACLES can find it, and the data in the public.cohort table is what is used when doing analysis.

-Chris

jon_duke · June 9, 2015, 2:57pm

@mdewilde, can you go to your heracles_results table in your public schema and see if you have any rows in there?

jon_duke · June 9, 2015, 3:00pm

Also, @mdewilde, how many patients show up in your cohort table for this death cohort you created? (after you clicked Generate Cohort)

jon_duke · June 9, 2015, 3:06pm

Wait, I think the error is still in the USE command.

ERROR: syntax error at or near “use”

As in USE public

It looks like there was back and forth on this. I removed USE initially then I believe @schuemie said it should be handled okay by SQL Render.

Can we get a ruling on this? Can USE be used in our Template SQL queries, or just use fully specified tables for all?

Chris_Knoll · June 9, 2015, 3:10pm

I’ve removed the need for USE in all of the scripts I’ve written by fully qualifying the table name using the @cdmDatabaseSchema or @ohdsiDatabaseSchema in all of my scripts.

I’m also suspicious that the dialect is not being set.

@mdewilde: On the CIRCE UI, at the bottom left side of the screen, there’s a SHOW SQL button. If you click it, a dialog will appear with 4 tabs for each dialect we support. Can you switch over to the ‘PostgreSQL’ tab and tell me if you see a USE statement in that translation?

-Chris

alfranke · June 9, 2015, 3:14pm

Like @jon_duke indicated, there seem to be a couple distinct translation and/or sql errors in the log file. Not sure if they’re related.

ERROR: function isnull(character varying, unknown) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
  Position: 42

; TRUNCATE TABLE #Codesets; DROP TABLE #Codesets; TRUNCATE TABLE #PrimaryCriteriaEvents; DROP TABLE #PrimaryCriteriaEvents]; nested exception is org.postgresql.util.PSQLException: ERROR: syntax error at or near "#"
  Position: 37

ERROR: operator does not exist: character varying = integer
  Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.

Chris_Knoll · June 9, 2015, 4:10pm

I don’t think it’s translating properly, This is what a postgreSQL temp table looks like:

CREATE TEMP TABLE Codesets

AS
SELECT
 codeset_id, concept_id 

FROM

-Chris

Chris_Knoll · June 9, 2015, 7:11pm

Note, the above CREATE TEMP TABLE syntax came directly out of CIRCE, calling the service to translate sql, passing the value of ‘postgresql’.

Are you certain that the value of the dialect is set to ‘postgresql’?

-Chris

jon_duke · June 9, 2015, 8:35pm

We have modified HERACLES to remove the use of USE. We are making a few other Heracles updates and will get them committed to GitHub, then can rebuild OLYMPUS incorporating these changes.

This does not however address the other errors @alfranke found, which seem to relate to translation issues. @alfranke, @Chris_Knoll- the properties do look set to postgres from the log file:

2015-06-09 09:50:18.914 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.driverClassName,org.postgresql.Driver]
2015-06-09 09:50:18.914 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.url,jdbc:postgresql://localhost:5432/IPCI_Dopey_CDM]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.username,postgres]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.password,***]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.datasource.driverClassName,org.postgresql.Driver]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.datasource.url,jdbc:postgresql://localhost:5432/IPCI_Dopey_CDM]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.datasource.username,postgres]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.datasource.password,***]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.schemas,public]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [flyway.locations,classpath:db/migration/postgresql]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.dialect,postgres]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.cdm.schema,cdm]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.ohdsi.schema,public]
2015-06-09 09:50:18.915 DEBUG main org.ohdsi.olympus.model.WebApiService - - Property [datasource.cohort.schema,cdm]

Chris_Knoll · June 9, 2015, 8:45pm

This is wrong: it should be postgresql.

-Chris

jon_duke · June 9, 2015, 9:18pm

Good catch! Our ql got knocked off!

@alfranke, can you update the value for the dialect to postgresql? Then hopefully if we bring in the latest Heracles version, all will be well for folks running Postgres.

cahilton · June 9, 2015, 9:30pm

WE need to update Heracles and WebAPI

mdewilde · June 10, 2015, 6:19am

Hi all,

Lots of messages when arriving this morning at the office. You are all responding very quick and spending at lot of time on this. We really appreciate this. Thanks!!!

@jon_duke: Both heracles_results and cohort tables in the public schema are empty.

To be prepared for the new build I will do this:

Restore the database to start fresh again (cdm schema filled and empty public schema)
Wait for a new build of Olympus
Start Olympus to create the tables in public schema
Load the cohort_definition and cohort tables as advised by @Chris_Knoll (including the external_source tag)
Try Heracles…

Thanks,
Marcel

mdewilde · June 10, 2015, 6:35am

Hi Chris,

This is the Circe SQL for postgres (don’t see a “use”):

OHDSI
CIRCE
Cohort Inclusion and Restriction Criteria Expression
Cohort Definition List Help
Save Generate
Description:

ExpressionConcept SetsPrint FriendlyRaw JSON
People having any of the following:
Add Primary Event Filters…
a death occurrence from
occurrence start is:
Add Filter…
Delete Filter
with observation at least 0 days prior and 0 days after index
Limit primary events to: per person.
Add Additional Filters
Limit cohort expression results to: per person.
Show SQL Add Options
Generated Sql…
Close
MSSQL ServerMS APSOraclePostgresRed Shift
CREATE TEMP TABLE Codesets

AS
SELECT
codeset_id, concept_id

FROM

(
SELECT -1 as codeset_id, concept_id FROM @cdm_database_schema.CONCEPT where 0 = 1
) C
;

CREATE TEMP TABLE PrimaryCriteriaEvents

AS
SELECT
row_number() over (order by P.person_id) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date

FROM

(
select P.person_id, P.start_date, P.end_date, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date ASC) ordinal
FROM
(
select C.person_id, C.death_date as start_date, (C.death_date + 1) as end_date, C.cause_concept_id as TARGET_CONCEPT_ID
from
(
select d.*
FROM @cdm_database_schema.DEATH d

) C

WHERE C.death_date > ‘2014-1-1’

) P
) P
JOIN @cdm_database_schema.observation_period OP on P.person_id = OP.person_id and P.start_date between OP.observation_period_start_date and op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0) <= P.START_DATE AND (P.START_DATE + 0) <= OP.OBSERVATION_PERIOD_END_DATE
;

DELETE FROM @target_database_schema.@target_cohort_table where cohort_definition_id = @cohort_definition_id;
INSERT INTO @target_database_schema.@target_cohort_table (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
select @cohort_definition_id as cohort_definition_id, person_id as subject_id, start_date as cohort_start_date, end_date as cohort_end_date
FROM
(
select RawEvents.*, row_number() over (partition by RawEvents.person_id order by RawEvents.start_date ASC) as ordinal
FROM
(
select pe.person_id, pe.start_date, pe.end_date
FROM PrimaryCriteriaEvents pe

) RawEvents
) Results

;

TRUNCATE TABLE Codesets;
DROP TABLE Codesets;

TRUNCATE TABLE PrimaryCriteriaEvents;
DROP TABLE PrimaryCriteriaEvents;

Because the generated cohort table is still empty: I also try the query as expected for this cohort:

select * from cdm.death where death_date > ‘2014-1-1’

Result is 7446 rows.

Thanks,
Marcel