Error implementing Achilles at UW - pilot database

Hi Everyone

I am getting started with implementing Achilles for a small pilot db at UW and ran into some errors. Appreciate any help.

Thanks!
-Abhi

achillesResults <- achilles(connectionDetails = connectionDetails, 
                            cdmDatabaseSchema='omop', 
                            cdmVersion = "4", runHeel = T, validateSchema=F)

output

Connecting using PostgreSQL driver
Executing multiple queries. This could take a while
  |=========================================| 100%
Analysis took 1.44 mins
Done. Achilles results can now be found in omop
Executing Achilles Heel. This could take a while
  |====                                     |  10%Error executing SQL: Error in .local(conn, statement, ...): execute JDBC update query failed in dbSendUpdate (ERROR: INSERT has more expressions than target columns
  Position: 295)

An error report has been created at  /Users/apratap/Dropbox/dev/achilles_test/errorReport.txt
Error in value[[3L]](cond) : no loop for break/next, jumping to top level

Detailed Error log:

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: INSERT has more expressions than target columns
  Position: 295)

SQL:
INSERT INTO omop.ACHILLES_HEEL_results (
	analysis_id,
	ACHILLES_HEEL_warning
	)
SELECT DISTINCT or1.analysis_id,
	'ERROR: ' || cast(or1.analysis_id as VARCHAR) || '-' || oa1.analysis_name || '; count (n=' || cast(or1.count_value as VARCHAR) || ') should not be > 0' AS ACHILLES_HEEL_warning,
	1 as rule_id,
	or1.count_value as record_count
FROM omop.ACHILLES_results or1
INNER JOIN omop.ACHILLES_analysis oa1
	ON or1.analysis_id = oa1.analysis_id
WHERE or1.analysis_id IN (
		7,
		8,
		9,
		114,
		115,
		118,
		207,
		208,
		209,
		210,
		302,
		409,
		410,
		411,
		412,
		413,
		509,
		510,
		609,
		610,
		612,
		613,
		709,
		710,
		711,
		712,
		713,
		809,
		810,
		812,
		813,
		814,
		908,
		909,
		910,
		1008,
		1009,
		1010,
		1415,
		1500,
		1501,
		1600,
		1601,
		1701
		) --all explicit counts of data anamolies
	AND or1.count_value > 0

R version:
R version 3.2.1 (2015-06-18)

Platform:
x86_64-apple-darwin13.4.0

Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

Other attached packages:
- Achilles (1.0)
- rjson (0.2.15)
- DatabaseConnector (1.3.0)
- RJDBC (0.2-5)
- rJava (0.9-7)
- DBI (0.3.1)
- SqlRender (1.1.3)

It appears the insert statement columns (only specifying 2) doesn’t match the select. I’ll push a fix for this shortly.

This was committed, but I noticed that the cdmv4 Achilles script wasn’t completely updated to use the new ‘ruleID’ field.

@Vojtech_Huser, could you check on this file for the updates you made to support the rule_id column?
https://github.com/OHDSI/Achilles/blob/master/inst/sql/sql_server/AchillesHeel_v4.sql

Thanks @Chris_Knoll. I will wait to hear back from @Vojtech_Huser about the script update unless it is something I can fix on my end. Let me know.

-Abhi

Script is fixed, and shouldn’t generate errors for you, but rule_id will be null for some of the heel results. This shouldn’t impact you tho. You should try to export the results and enjoy the graphs!

Also, if you just want to re-run the heel again, you don’t need to run the entire achilles. The function is called AchillesHeel, and you pass it the same parameters as Achilles, and it will just run the heel report.

-Chris

I only now saw the email about it. Sorry for late reply. It seems like a Chris found the problem.

In my review of all ruleIDs (in v5.sql file) - I did touch the v4 file at all.

Yep, we’ll need your changes to the v4 file as well. Could you implement this?

@Chris_Knoll: sorry for the delay but I finally tried it now and got another error.

Here is what I did

library(devtools)
install_github("ohdsi/SqlRender")
install_github("ohdsi/DatabaseConnector")
install_github("ohdsi/Achilles")
library(Achilles)

achillesResults <- achilles(connectionDetails = connectionDetails, 
                             cdmDatabaseSchema='omop', 
                             cdmVersion = "4", runHeel = T, validateSchema=F)

Output

Connecting using PostgreSQL driver
Executing multiple queries. This could take a while
  |====================================| 100%
Analysis took 1.38 mins
Done. Achilles results can now be found in omop
Executing Achilles Heel. This could take a while
  |=====                               |  14%Error executing SQL: Error in .local(conn, statement, ...): execute JDBC update query failed in dbSendUpdate (ERROR: syntax error at or near "2"
  Position: 346)

An error report has been created at  /Users/apratap/Dropbox/dev/achilles_test/errorReport.txt
Error in value[[3L]](cond) : no loop for break/next, jumping to top level

Error log:

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: syntax error at or near "2"
  Position: 346)

SQL:
INSERT INTO omop.ACHILLES_HEEL_results (
	analysis_id,
	ACHILLES_HEEL_warning,
	rule_id,
	record_count
	)
SELECT DISTINCT ord1.analysis_id,
  'ERROR: ' || cast(ord1.analysis_id as VARCHAR) || ' - ' || oa1.analysis_name || ' (count = ' || cast(COUNT(ord1.min_value) as VARCHAR) || '); min value should not be negative' AS ACHILLES_HEEL_warning
  2 as rule_id,
  COUNT(ord1.min_value) as record_count
FROM omop.ACHILLES_results_dist ord1
INNER JOIN omop.ACHILLES_analysis oa1
	ON ord1.analysis_id = oa1.analysis_id
WHERE ord1.analysis_id IN (
		103,
		105,
		206,
		406,
		506,
		606,
		706,
		715,
		716,
		717,
		806,
		906,
		907,
		1006,
		1007,
		1502,
		1503,
		1504,
		1505,
		1506,
		1507,
		1508,
		1509,
		1510,
		1511,
		1602,
		1603,
		1604,
		1605,
		1606,
		1607,
		1608
		)
	AND ord1.min_value < 0
	GROUP BY ord1.analysis_id,  oa1.analysis_name

R version:
R version 3.2.1 (2015-06-18)

Platform:
x86_64-apple-darwin13.4.0

Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

Other attached packages:
- Achilles (1.0)
- rjson (0.2.15)
- DatabaseConnector (1.3.0)
- RJDBC (0.2-5)
- rJava (0.9-7)
- DBI (0.3.1)
- SqlRender (1.1.3)

Ah, it’s missing a comma, I’ve pushed the update. Also, you can just run the achilles heel report by calling

#justHeel
achillesHeelResults <- achillesHeel connectionDetails = connectionDetails, 
                         cdmDatabaseSchema='omop', 
                         cdmVersion = "4")

-Chris

@Chris_Knoll another. I think this time it might be connected to our cdm v4 dbase but not sure. Appreciate your help so far.

Thanks!
-Abhi

Connecting using PostgreSQL driver
Executing multiple queries. This could take a while
  |===============================| 100%
Analysis took 3.02 mins
Done. Achilles results can now be found in omop
Executing Achilles Heel. This could take a while
  |========                             |  21%Error executing SQL: Error in .local(conn, statement, ...): execute JDBC update query failed in dbSendUpdate (ERROR: relation "omop.concept" does not exist
  Position: 506)

An error report has been created at  /Users/apratap/Dropbox/dev/achilles_test/errorReport.txt
Error in value[[3L]](cond) : no loop for break/next, jumping to top level

DBMS:
postgresql

Error:
execute JDBC update query failed in dbSendUpdate (ERROR: relation "omop.concept" does not exist
  Position: 506)

SQL:
INSERT INTO omop.ACHILLES_HEEL_results (
	analysis_id,
	ACHILLES_HEEL_warning,
	rule_id,
	record_count
	)
SELECT or1.analysis_id,
	'ERROR: ' || cast(or1.analysis_id as VARCHAR) || '-' || oa1.analysis_name || '; ' || cast(COUNT(DISTINCT stratum_1) AS VARCHAR) || ' concepts in data are not in vocabulary' AS ACHILLES_HEEL_warning,
	4 as rule_id,
	COUNT(DISTINCT stratum_1) as record_count
FROM omop.ACHILLES_results or1
INNER JOIN omop.ACHILLES_analysis oa1
	ON or1.analysis_id = oa1.analysis_id
LEFT JOIN omop.concept c1
	ON or1.stratum_1 = CAST(c1.concept_id AS VARCHAR)
WHERE or1.analysis_id IN (
		2,
		4,
		5,
		200,
		301,
		400,
		500,
		505,
		600,
		700,
		800,
		900,
		1000,
		1609,
		1610
		)
	AND or1.stratum_1 IS NOT NULL
	AND c1.concept_id IS NULL
GROUP BY or1.analysis_id,
	oa1.analysis_name

R version:
R version 3.2.1 (2015-06-18)

Platform:
x86_64-apple-darwin13.4.0

Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

Other attached packages:
- Achilles (1.0)
- rjson (0.2.15)
- DatabaseConnector (1.3.0)
- RJDBC (0.2-5)
- rJava (0.9-7)
- DBI (0.3.1)
- SqlRender (1.1.3)
- devtools (1.9.1)

concept is a standard CDM vocabulary table. Very odd that the achilles completes no problem, but then the heel fails (I’m almost certain we reference vocabulary tables during achilles). I’ll look into the query and see if it’s using the wrong prefix for the CDM schema.

Can you tell me what shcema your CDM tables are in, and if they contain vocabulary tables (such as CONCEPT and CONCEPT_ANCESTOR). Can you execute select count(*) from omop.concept?

@Chris_Knoll: details below. Apologies for a very slow response. I am currently out of the country in a UTC+8 timezone that is not making it easy either.

Here is the list of all tables I have in my test OMOP dbase. If there is something missing which I belv is true for concept table and maybe others I will have to go back to IT people who generated dbase.

P.S.: Is there a way to sanity check a OMOP dbase before getting it into Achilles. It would help generate a great report for our IT folks and enable faster testing.

-Abhi

Hi, Abhi,
RE: tables: Looks like you are missing all the vocabulary tables. Please visit this site:
http://www.ohdsi.org/web/athena/

Fill out the form, download the vocabulary files, and import them. I do not have specific details about the instructions about how the vocabulary tables are created, or how you should import the data (from CSVs) into your specific schema, but the vocabulary tables must exist.

RE: validation of CDM tables: There is a flag you can set to validate the schema. Here is the R function declaration:

achilles <- function (connectionDetails, 
                      cdmDatabaseSchema, 
                      oracleTempSchema = cdmDatabaseSchema,
                      resultsDatabaseSchema = cdmDatabaseSchema, 
                      sourceName = "", 
                      analysisIds, 
                      createTable = TRUE, 
                      smallcellcount = 5, 
                      cdmVersion = "4", 
                      runHeel = TRUE,
                      validateSchema = FALSE){

Note, validateSchema defaults to FALSE, but you can call the function with the value of TRUE. Also, this does not validate the Vocabulary tables, but this can be added.

-Chris

@Chris_Knoll

For downloading OMOP vocab tables, I have few questions.

  1. Should I download all the tables or just the ones that are pre-selected
  2. The CDM version stated is 4.5 or 5. Will this work with our CDM v4 dbase.

-Abhi

You need to confirm that you have all the vocabularies necessary to match
those codes used in your source data. You should use vocabulary version
4.5 with a CDMv4 database because documentation or applications written
for CDMv4 will expect the vocabulary to be in the 4.5 format.

@DTorok

Appreciate your response. Given that I am doing this for the first time I am not sure what “vocab tables” are necessary for this to work. It would help if there is clarity on that.

-Abhi

You are going to have to provide more information. Are you doing a
conversion from health care or claims records into OMOP CDMv4, or are you
working from an existing OMOP CDMv4 database?

Either way, you should have some idea of how procedures, drugs and
conditions are coded. For example are are procedure codes CPT4 and HCPCS,
are drug coded in NDC or GPI codes?

@DTorok

The dbase I have is a OMOP CDM v4 dbase. I will, however, need to find out from IT on how the procedures, drugs and conditions were coded. Unfortunately, I don’t have that information.

It would be nice if there was a OHDSI dbase QC script that could be run against a user OMOP CDM dbase to get a sense of possible issues with the database. It will help expedite the onboarding process for first time OHDSI tool users like me. just my 2c.

-Abhi

My fault for not reviewing the entire thread before responding. You should be fine downloading just the checked V4.5 vocabularies. As for the OHDSI dbase QC script, that is Achilles.

@apratap:

Make it easy on yourself: Click the entire gammut and download it all. That way you’ll be fine. The few extra vocabularies will not make a dent in your database.