I’m running DbDiagnostics based on the instructions at OMOP CDM Database Diagnostics Utility • DbDiagnostics and the guide for creating the dataDiagnosticsSettingsList at Data Diagnostics Settings • DbDiagnostics. For the “Run the executeDbDiagnostics” script, I’m not sure what to use for resultsTableName. What table needs to be specified for this?
dbDiagnosticResults <- DbDiagnostics::executeDbDiagnostics(
connectionDetails = dbProfileConnectionDetails,
resultsDatabaseSchema = "demo_cdm_ach_res",
resultsTableName = "",
outputFolder = outputFolder,
dataDiagnosticsSettingsList = settingsList
)
(request for update to documentation raised as an issue here: https://github.com/OHDSI/DbDiagnostics/issues/12)
Hi @greshje.gmail, the resultsTableName in this function should be the name of the table in the results schema that holds the DbProfile results.
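If you’re unsure of the exact table name, one way to check is to list the tables in the results schema first. On a Spark/Databricks backend (as used later in this thread), that would be something like:

```sql
-- List all tables in the results schema to find the DbProfile results table
SHOW TABLES IN demo_cdm_ach_res;
```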
Thanks @clairblacketer ,
I tried using “db_profile_results” but I am now getting an error that says it cannot find a column named “CDM_SOURCE_NAME”.
I’m not sure if this is the correct table or if this is a different issue altogether.
Full output is below,
John
> # ---
> #
> # 05-execute-db-diagnostics.R
> #
> # ---
>
> #
> # Versions of R and Java
> #
>
> # --- R Version ---------------------
> R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$crt
[1] "ucrt"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "4"
$minor
[1] "3.1"
$year
[1] "2023"
$month
[1] "06"
$day
[1] "16"
$`svn rev`
[1] "84548"
$language
[1] "R"
$version.string
[1] "R version 4.3.1 (2023-06-16 ucrt)"
$nickname
[1] "Beagle Scouts"
> # --- Java Version ------------------
> system("java -version")
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
[1] 0
> # -----------------------------------
>
> #
> # functions to get databricks token (user will be prompted for keyring password) and url with token as password
> #
>
> getToken <- function () {
+ return (
+ keyring::backend_file$new()$get(
+ service = "databricks",
+ user = "token",
+ keyring = "databricks_keyring"
+ )
+ )
+ }
>
> getUrl <- function () {
+ url <- "jdbc:databricks://nachc-databricks.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3;UID=token;PWD="
+ return (
+ paste(url, getToken(), sep = "")
+ )
+ }
>
> dbProfileConnectionDetails <- DatabaseConnector::createConnectionDetails (
+ dbms = "spark",
+ connectionString = getUrl(),
+ pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\"
+ )
>
> dbDiagnosticResults <- DbDiagnostics::executeDbDiagnostics(
+ connectionDetails = dbProfileConnectionDetails,
+ resultsDatabaseSchema = "demo_cdm_ach_res",
+ resultsTableName = "db_profile_results",
+ outputFolder = outputFolder,
+ dataDiagnosticsSettingsList = settingsList
+ )
Connecting using Spark JDBC driver
Error in `.createErrorReport()`:
! Error executing SQL:
java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: [UNRESOLVED_COLUMN] org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `CDM_SOURCE_NAME` cannot be resolved. Did you mean one of the following? [`spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum1`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum2`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum3`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum4`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum5`]; line 1 pos 16
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:410)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:321)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:149)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:49)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:54)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:299)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:284)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:333)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `CDM_SOURCE_NAME` cannot be resolved. Did you mean one of the following? [`spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum1`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum2`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum3`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum4`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum5`]; line 1 pos 16
at org.apache.spark.sql.AnalysisException.copy(AnalysisException.scala:128)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:491)
... 19 more
, Query: SELECT DIS***.
An error report has been created at D:/_YES_2023-05-28/workspace/SosExamples/_COVID/02-data-diagnostics/errorReportSql.txt
Run `rlang::last_trace()` to see where the error occurred.
>
Here’s a list of all of the tables and columns in the schema. I’m not seeing a cdm_source_name column, but there is a column db_profile_results.cdmsourcename.
Tables.txt (50.3 KB)
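One workaround I’m considering (not sure if it’s the right fix) is to rename the column so it matches the snake_case name that DbDiagnostics queries. In Spark SQL that would be something like the following, assuming db_profile_results is a Delta table (RENAME COLUMN isn’t supported on plain Hive tables):

```sql
-- Rename the flattened column to the snake_case name the query expects
ALTER TABLE demo_cdm_ach_res.db_profile_results
  RENAME COLUMN cdmsourcename TO cdm_source_name;
```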
Also, I’m getting the error shown here when trying to re-run executeDbProfile:
https://forums.ohdsi.org/t/error-seen-in-dbdiagnostics/19942
—EDIT-------------------
The error that this link points to has been resolved.