How do you pinpoint the SQL error in R error reports?

lychenus13 · January 13, 2022, 11:56am

I am a R newbie. Very often we are faced with error like this

Error: Error executing SQL: org.postgresql.util.PSQLException: ERROR: UNION types integer and character varying cannot be matched Position: 325

opened 11:07AM - 13 Jan 22 UTC

Similar to last issue #13 , which is 'solved' by changing our source id columns …of our database (which does not make sense, wasn't the source id and the source value columns are not recommended for analysis)? This time, we have an issue in the middle of the analysis. ![image](https://user-images.githubusercontent.com/53417646/149318837-59fc8bf2-af29-4458-923b-48de642acd19.png) Note ```` Error: Error executing SQL: org.postgresql.util.PSQLException: ERROR: UNION types integer and character varying cannot be matched Position: 325 An error report has been created at D:/OHDSI-LegendT2DM-THIN/LegendT2dm/errorReportSql.txt ```` Below are the errorReportR.txt ```` Thread: Main Message: Error: Error executing SQL: org.postgresql.util.PSQLException: ERROR: UNION types integer and character varying cannot be matched Position: 325 An error report has been created at D:/OHDSI-LegendT2DM-THIN/LegendT2dm/errorReportSql.txt [90mRun `rlang::last_error()` to see where the error occurred.[39m Level: FATAL Time: 2022-01-13 18:19:06 Stack trace: 14: stop(fallback) 13: signal_abort(cnd) 12: abort(paste("Error executing SQL:", message, paste("An error report has been created at ", f 11: .createErrorReport(connection@dbms, err$message, sqlStatement, errorReportFile) 10: value[[3]](cond) 9: tryCatchOne(expr, names, parentenv, handlers[[1]]) 8: tryCatchList(expr, classes, parentenv, handlers) 7: tryCatch({ startQuery <- Sys.time() lowLevelExecuteSql(connection, sqlStatement) 6: DatabaseConnector::executeSql(connection, sql) 5: createConceptCountsTable(connection = connection, cdmDatabaseSchema = cdmDatabaseSchema, tem 4: runConceptSetDiagnostics(connection = connection, tempEmulationSchema = tempEmulationSchema, 3: CohortDiagnostics.R#30: CohortDiagnostics::runCohortDiagnostics(packageName = "LegendT2dm", 2: Assessment.R#103: runExposureCohortDiagnostics(connectionDetails, cdmDatabaseSchema, vocabul 1: assessPhenotypes(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchem R version: R version 4.0.5 (2021-03-31) Platform: x86_64-w64-mingw32 Attached base packages: - stats - graphics - grDevices - datasets - utils - methods - base Other attached packages: - LegendT2dm (0.0.1) - DatabaseConnector (4.0.2) ```` errorReportSql.txt ```` DBMS: postgresql Error: org.postgresql.util.PSQLException: ERROR: syntax error at or near "#" Position: 15 SQL: SELECT * FROM #exposure_cohorts; R version: R version 4.0.5 (2021-03-31) Platform: x86_64-w64-mingw32 Attached base packages: - stats - graphics - grDevices - datasets - utils - methods - base Other attached packages: - LegendT2dm (0.0.1) - DatabaseConnector (4.0.2) ```` What are they comparing when a SQL does not match that deep into the script? Thanks

The Error log says

org.postgresql.util.PSQLException: ERROR: UNION types integer and character varying cannot be matched
  Position: 325

Honestly this is not quite helpful as there are hundreds of 1000-line SQL files in a sizeable projects.

And sometimes it is hard to navigate through the error log tree

Stack trace:
14: stop(fallback)
13: signal_abort(cnd)
12: abort(paste("Error executing SQL:", message, paste("An error report has been created at ", f
11: .createErrorReport(connection@dbms, err$message, sqlStatement, errorReportFile)
10: value[[3]](cond)
9: tryCatchOne(expr, names, parentenv, handlers[[1]])
8: tryCatchList(expr, classes, parentenv, handlers)
7: tryCatch({
    startQuery <- Sys.time()
    lowLevelExecuteSql(connection, sqlStatement)
   
6: DatabaseConnector::executeSql(connection, sql)
5: createConceptCountsTable(connection = connection, cdmDatabaseSchema = cdmDatabaseSchema, tem
4: runConceptSetDiagnostics(connection = connection, tempEmulationSchema = tempEmulationSchema,
3: CohortDiagnostics.R#30: CohortDiagnostics::runCohortDiagnostics(packageName = "LegendT2dm", 
2: Assessment.R#103: runExposureCohortDiagnostics(connectionDetails, cdmDatabaseSchema, vocabul
1: assessPhenotypes(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchem

Generally, how do you pinpoint down the error SQL in OHDSI setting?

And maybe can we have add to the wishlist that the error log would also split out the SQL file it has last parsed?

Thanks

schuemie · January 13, 2022, 1:22pm

Usually there’s also a SQL error report generated that contains the exact SQL statement that caused the error?

schuemie · January 13, 2022, 1:25pm

Also, make sure you use the CDM DDL for your SQL dialect, and don’t make any changes to that. If the data does not conform to the CDM specs, all bets are off.

lychenus13 · January 13, 2022, 1:39pm

Ah. We used to not know a lot about OHDSI. We used to a external vendor to convert our data. They put some string in the drug_source_concept_id field like the icd9 codes. We thought it was okay to change the source_id column from integer to VARCHAR since it is not encouraged to run analysis on that column.

Until we run the LegendT2DM scripts, it was quite obvious that that was not okay.

And lately we thought that not filling in cdm_source might also cause problems.

OHDSI is hard!

schuemie · January 13, 2022, 1:39pm

Yes, it is! But it can be very rewarding too!

Here are the DDL files for your platform.

lychenus13 · January 13, 2022, 1:47pm

yes thank you very much. another lesson we had back then was that one can select the cdm version in the github repository.

back then someone posted a postgresql 6.0 cdm ddl link for us on the forums, and we thought it was already ok. but it turns out most studies should be running on 5.3x and we have to rollback.it was not backward compatible.

lychenus13 · January 13, 2022, 1:57pm

yes, the SQL error report is something like this.

DBMS:
postgresql

Error:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "#"
  Position: 15

SQL:
SELECT * FROM #exposure_cohorts;

R version:
R version 4.0.5 (2021-03-31)

Platform:
x86_64-w64-mingw32

Attached base packages:
- stats
- graphics
- grDevices
- datasets
- utils
- methods
- base

Other attached packages:
- LegendT2dm (0.0.1)
- DatabaseConnector (4.0.2)

This SQL is a one-liner and somehow we do not know where to start.

And we are not quite sure if SELECT * FROM #exposure_cohorts; is a correct syntax for postgresql (should be)

Most importantly, this SQLerror report contradicts the error in R showing that org.postgresql.util.PSQLException: ERROR: UNION types integer and character varying cannot be matched Position: 325

Then were is the union in the above one-liner?

Maybe it was that I was highlighting all the lines in the R script and so it was some function in the later stage, and the previous SQL error report has been overwritten.