I’m running DbDiagnostics based on the instructions at OMOP CDM Database Diagnostics Utility • DbDiagnostics and the guide for creating the dataDiagnosticsSettingsList at Data Diagnostics Settings • DbDiagnostics. For the “Run the executeDbDiagnostics” script, I’m not sure what to use for resultsTableName. What table needs to be specified for this?
dbDiagnosticResults <- DbDiagnostics::executeDbDiagnostics(
connectionDetails = dbProfileConnectionDetails,
resultsDatabaseSchema = "demo_cdm_ach_res",
resultsTableName = "",
outputFolder = outputFolder,
dataDiagnosticsSettingsList = settingsList
)
(request for update to documentation raised as an issue here: https://github.com/OHDSI/DbDiagnostics/issues/12)
Hi @greshje.gmail, the resultsTableName in this function should be the name of the table in the results schema that holds the DbProfile results.
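If you’re unsure of the exact table name, one way to check is to list the tables in the results schema first. On a Spark/Databricks backend (as used later in this thread), that would be something like:

```sql
-- List all tables in the results schema to find the DbProfile results table
SHOW TABLES IN demo_cdm_ach_res;
```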
Thanks @clairblacketer ,
I tried using “db_profile_results” but I am now getting an error that says it cannot find a column named “CDM_SOURCE_NAME”.
I’m not sure if this is the correct table or if this is a different issue altogether.
Full output is below,
John
> # ---
> #
> # 05-execute-db-diagnostics.R
> #
> # ---
>
> #
> # Versions of R and Java
> #
>
> # --- R Version ---------------------
> R.Version()
$platform
[1] "x86_64-w64-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$crt
[1] "ucrt"
$system
[1] "x86_64, mingw32"
$status
[1] ""
$major
[1] "4"
$minor
[1] "3.1"
$year
[1] "2023"
$month
[1] "06"
$day
[1] "16"
$`svn rev`
[1] "84548"
$language
[1] "R"
$version.string
[1] "R version 4.3.1 (2023-06-16 ucrt)"
$nickname
[1] "Beagle Scouts"
> # --- Java Version ------------------
> system("java -version")
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
[1] 0
> # -----------------------------------
>
> #
> # functions to get databricks token (user will be prompted for keyring password) and url with token as password
> #
>
> getToken <- function () {
+ return (
+ keyring::backend_file$new()$get(
+ service = "databricks",
+ user = "token",
+ keyring = "databricks_keyring"
+ )
+ )
+ }
>
> getUrl <- function () {
+ url <- "jdbc:databricks://nachc-databricks.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3956472157536757/0123-223459-leafy532;AuthMech=3;UID=token;PWD="
+ return (
+ paste(url, getToken(), sep = "")
+ )
+ }
>
> dbProfileConnectionDetails <- DatabaseConnector::createConnectionDetails (
+ dbms = "spark",
+ connectionString = getUrl(),
+ pathToDriver="D:\\_YES_2023-05-28\\workspace\\SosExamples\\_COVID\\02-data-diagnostics\\drivers\\databricks\\"
+ )
>
> dbDiagnosticResults <- DbDiagnostics::executeDbDiagnostics(
+ connectionDetails = dbProfileConnectionDetails,
+ resultsDatabaseSchema = "demo_cdm_ach_res",
+ resultsTableName = "db_profile_results",
+ outputFolder = outputFolder,
+ dataDiagnosticsSettingsList = settingsList
+ )
Connecting using Spark JDBC driver
Error in `.createErrorReport()`:
! Error executing SQL:
java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: [UNRESOLVED_COLUMN] org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `CDM_SOURCE_NAME` cannot be resolved. Did you mean one of the following? [`spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum1`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum2`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum3`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum4`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum5`]; line 1 pos 16
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:410)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:321)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:149)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:49)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:54)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:299)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:284)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:333)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `CDM_SOURCE_NAME` cannot be resolved. Did you mean one of the following? [`spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum1`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum2`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum3`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum4`, `spark_catalog`.`demo_cdm_ach_res`.`db_profile_results`.`stratum5`]; line 1 pos 16
at org.apache.spark.sql.AnalysisException.copy(AnalysisException.scala:128)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:491)
... 19 more
, Query: SELECT DIS***.
An error report has been created at D:/_YES_2023-05-28/workspace/SosExamples/_COVID/02-data-diagnostics/errorReportSql.txt
Run `rlang::last_trace()` to see where the error occurred.
>
Here’s a list of all of the tables and columns in the schema. I’m not seeing a cdm_source_name column, but there is a column db_profile_results.cdmsourcename.
Tables.txt (50.3 KB)
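One workaround I’m considering (not sure if it’s the right fix) is to rename the column so it matches the snake_case name that DbDiagnostics queries. In Spark SQL that would be something like the following, assuming db_profile_results is a Delta table (RENAME COLUMN isn’t supported on plain Hive tables):

```sql
-- Rename the flattened column to the snake_case name the query expects
ALTER TABLE demo_cdm_ach_res.db_profile_results
  RENAME COLUMN cdmsourcename TO cdm_source_name;
```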
Also, I’m getting the error shown here when trying to re-run executeDbProfile:
https://forums.ohdsi.org/t/error-seen-in-dbdiagnostics/19942
—EDIT-------------------
The error that this link points to has been resolved.