OHDSI Home | Forums | Wiki | Github

Problemas con DataQualityDashboard (R)

Hello, we are carrying out a project to transfer some data that we have to the OMOP CDM. We already have the tables created, but when running the DataQualityDashboard script an error appears and RStudio closes. Can anyone tell me what we can do?

Thank you very much, have a nice day

Guillem

I’m not sure I understand where you are at, but let’s try:

  1. You should run DataQualityDashboard after you have already inserted data into the server, not just created the tables.

  2. Is the server address really “covid”?

  3. You should configure the variable pathToDriver. It needs to point to a folder where the driver (in your case, for PostgreSQL) can be found. If you need to download drivers, you can use this: Download DatabaseConnector JDBC Jar files — downloadJdbcDrivers • DatabaseConnector

Kind regards.

Additionally, it’d be helpful to see the full output from your RStudio console so we can see exactly where the error got raised - possible to paste that in here (redacting any sensitive info)? Thanks!

1 Like

Hello Katy and Fabricio, first of all thank you for your answers, they are very helpful. Sorry for the delay in responding, we have been trying ways to solve the problem and Christmas has come.

We are still having problems connecting to the server.

Here is a list of the changes we have made:

As Fabricio indicates, we have verified that the server is open and working. We have also assigned a path to the folder where the driver is located. The driver we have installed is the one provided by postgresql on its website (Download | pgJDBC).

Below I attach the script and the error that we get.

Thank you very much for your help, have a nice day!

#Script to analyze the quality of the OMOP database using the DataQualityDashboard library

#The RJDBC package is an implementation of R’s DBI (DataBase Interface) using JDBC (Java Database Connectivity) as a back-end.
#This allows R to connect to any DBMS (Database Management System) that has a JDBC driver.

install.packages(“RJDBC”)
library(RJDBC)

#We assign a path to the installed driver (Download | pgJDBC)
drv = JDBC (driverClass = “org.postgresql.Driver”, classPath = “C:/Users/BRACONS/Downloads/driver/postgresql-42.5.1.jar”, identifier.quote = NA)

#We make the connection, in the function we give the link, the name of the database and the password
conn = dbConnect(drv, “jdbc:postgresql://localhost:5432/postgres”, “ddbbname”, “password”)

#Package to perform DataQualityDashboard (Fix for error trying to install Data Quality Dashboard)
install.packages(“devtools”)
devtools::install_github(“OHDSI/DataQualityDashboard”)

library(DataQualityDashboard)
library(DatabaseConnector)

connectionDetails ← DatabaseConnector::createConnectionDetails(
dbms = “postgresql”,
user = “postgres”,
password = “password”,
server = “covid/postgres”,
port = “5432”,
extraSettings = “”,
pathToDriver = “C:/Users/BRACONS/Downloads/driver”
)

cdmDatabaseSchema ← “public” # the fully qualified database schema name of the CDM
resultsDatabaseSchema ← “public” # the fully qualified database schema name of the results schema (that you can write to)
cdmSourceName ← “COVID HCB OMOP” # a human readable name for your CDM source
cdmVersion ← “5.4” # the CDM version you are targetting. Currently supporst 5.2.2, 5.3.1, and 5.4

determine how many threads (concurrent SQL sessions) to use ----------------------------------------

numThreads ← 1 # on Redshift, 3 seems to work well

specify if you want to execute the queries or inspect them ------------------------------------------

sqlOnly ← FALSE # set to TRUE if you just want to get the SQL scripts and not actually run the queries

where should the results and logs go? ----------------------------------------------------------------

outputFolder ← “C:/Users/BRACONS/Downloads/resultados”
outputFile ← “results.json”

logging type -------------------------------------------------------------------------------------

verboseMode ← TRUE # set to FALSE if you don’t want the logs to be printed to the console

write results to table? -----------------------------------------------------------------------

writeToTable ← FALSE # set to TRUE if you want to write to a SQL table in the results schema

write results to a csv file? -----------------------------------------------------------------------

writeToCsv ← FALSE # set to FALSE if you want to skip writing to csv file
csvFile ← “” # only needed if writeToCsv is set to TRUE

if writing to table and using Redshift, bulk loading can be initialized -------------------------------

Sys.setenv(“AWS_ACCESS_KEY_ID” = “”,

“AWS_SECRET_ACCESS_KEY” = “”,

“AWS_DEFAULT_REGION” = “”,

“AWS_BUCKET_NAME” = “”,

“AWS_OBJECT_KEY” = “”,

“AWS_SSE_TYPE” = “AES256”,

“USE_MPP_BULK_LOAD” = TRUE)

which DQ check levels to run -------------------------------------------------------------------

checkLevels ← c(“TABLE”, “FIELD”, “CONCEPT”)

which DQ checks to run? ------------------------------------

checkNames ← c() #Names can be found in inst/csv/OMOP_CDM_v5.3.1_Check_Desciptions.csv

which CDM tables to exclude? ------------------------------------

tablesToExclude ← c()

#Aqui es donde no puedo realizar la conexión

run the job --------------------------------------------------------------------------------------

DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
resultsDatabaseSchema = resultsDatabaseSchema,
cdmSourceName = cdmSourceName,
numThreads = numThreads,
sqlOnly = sqlOnly,
outputFolder = outputFolder,
verboseMode = verboseMode,
writeToTable = writeToTable,
writeToCsv = writeToCsv,
csvFile = csvFile,
checkLevels = checkLevels,
tablesToExclude = tablesToExclude,
checkNames = checkNames)

inspect logs ----------------------------------------------------------------------------

ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder,
sprintf(“log_DqDashboard_%s.txt”, cdmSourceName)))

View the Data Quality Dashboard using the integrated shiny application

DataQualityDashboard::viewDqDashboard(
jsonPath = file.path(getwd(), outputFolder, outputFile))
)

(OPTIONAL) if you want to write the JSON file to the results table separately -----------------------------

jsonFilePath ← “” # put the path to the outputted JSON file
DataQualityDashboard::writeJsonResultsToTable(connectionDetails = connectionDetails,
resultsDatabaseSchema = resultsDatabaseSchema,
jsonFilePath = jsonFilePath)

Something that I do to determine if the problem is with DQD or the connection set up is to run database connector separately.

connectionDetails ← DatabaseConnector::createConnectionDetails(dbms = “sql server”,
user = “whiteRabbit”,
password = “whiteRabbit”,
server = “127.0.0.1”,
port = “59802”,
extraSettings = “”)

conn ← connect(connectionDetails)

querySql(conn, “SELECT COUNT(*) FROM AC_CDM.dbo.person;”)

disconnect(conn)

Agree with @DTorok that this seems to be a DB connection issue, and his testing steps will be good to verify that.

Additionally, I’m not sure if this is intentional, but your DB info in createConnectionDetails does not match what you put in the dbConnect call. Should the server param have “localhost” instead of “covid”? Note also that DatabaseConnector will take care of connecting to your DB when executeDqChecks is run, so I don’t think this dbConnect call is doing anything in your script.

We have managed to solve the problem and run the script. Indeed, it was a problem in the DatabaseConnector part. I did what Torok suggested and saw that I could connect to the data, so I checked the DatabaseConnector and realized that the server had to be localhost/postgres instead of covid/postgres. Thank you very much for your help, have a nice day!

Guillem

1 Like
t