
Data Quality Dashboard tutorial - slides

For those attending the Data Quality tutorial, here are my slides on the DataQualityDashboard tool:


Hello Ajit, when is the tutorial on? today? Will the presentation be recorded? Regards Conor.

It is right now :slight_smile:

Yes, it will be recorded and made available

Great work @Ajit_Londhe!

Great tool, up and working already. Question about the test: For the combination of CONCEPT_ID 3024128 (Bilirubin.total [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8840 (milligram per deciliter), the number and percent of records that have a value less than 1.00e+00. (Threshold=1%). However, looking at the description for LOINC 1975-2, it gives the range as mg/dL[0.3,1.0] which I take to say between 0.3 and 1.0 milligram per deciliter. Given that range, seems the DQ test for values less than 1.0 is incorrect.
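To make the concern concrete, here is a minimal R sketch of how a plausible-low check of this kind behaves. The helper name `pctBelow` and the sample values are made up for illustration (they are not from any CDM and this is not the DQD implementation); only the 1.0 mg/dL cut-off and the 1% threshold come from the check description above.

```r
# Hypothetical helper: number and percent of records below a plausible-low cut-off.
pctBelow <- function(values, plausibleLow) {
  numViolated <- sum(values < plausibleLow, na.rm = TRUE)
  numTotal <- sum(!is.na(values))
  list(numViolated = numViolated,
       pctViolated = 100 * numViolated / numTotal)
}

# Illustrative bilirubin values in mg/dL, mostly inside the 0.3-1.0 range.
bilirubin <- c(0.3, 0.5, 0.8, 1.0, 1.2)

result <- pctBelow(bilirubin, plausibleLow = 1.0)
failsCheck <- result$pctViolated > 1  # check threshold of 1%
print(result)
print(failsCheck)
```

With a cut-off of 1.0, every value strictly inside the quoted normal range counts as a violation, so a dataset of mostly normal results would exceed the 1% threshold almost by construction.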

@Ajit_Londhe

Great tutorial. As we discussed, I’m still getting an error with the following traceback:

 > DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, 
+                                       cdmDatabaseSchema = cdmDatabaseSchema, 
+                                       resultsDatabaseSchema = resultsDatabaseSchema,
+                                       cdmSourceName = cdmSourceName, 
+                                       numThreads = numThreads,
+                                       sqlOnly = sqlOnly, 
+                                       outputFolder = outputFolder, 
+                                       verboseMode = verboseMode,
+                                       #writeToTable = writeToTable,
+                                       checkLevels = checkLevels,
+                                       checkNames = checkNames)
Connecting using SQL Server driver
Processing check description: plausibleGender
Connecting using SQL Server driver
Error in parse(text = thresholdFilter) : 
  object 'thresholdFilter' not found
> traceback()
5: parse(text = thresholdFilter)
4: eval(parse(text = thresholdFilter))
3: .evaluateThresholds(checkResults = checkResults, tableChecks = tableChecks, 
       fieldChecks = fieldChecks, conceptChecks = conceptChecks)
2: .summarizeResults(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, 
       checkResults = checkResults, cdmSourceName = cdmSourceName, 
       outputFolder = outputFolder, startTime = startTime, tableChecks = tableChecks, 
       fieldChecks = fieldChecks, conceptChecks = conceptChecks)
1: DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails, 
       cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, 
       cdmSourceName = cdmSourceName, numThreads = numThreads, sqlOnly = sqlOnly, 
       outputFolder = outputFolder, verboseMode = verboseMode, checkLevels = checkLevels, 
       checkNames = checkNames)

It seems like the issue is in the .evaluateThresholds() function, which is setting thresholdField at each level. I can’t figure out what’s going on, though. Does anyone have any thoughts?

Hi @esholle – can you re-install and try again? I pushed a fix for a faulty if statement that was causing the error when running only plausibleGender
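For anyone curious how a faulty if statement can produce an "object not found" error like the one in the traceback, here is a hypothetical R sketch of the general failure mode. It is not the DataQualityDashboard source; the function name `evaluateSketch` and the strings assigned to `thresholdFilter` are invented for illustration.

```r
# Hypothetical illustration only; this is not the package's actual code.
evaluateSketch <- function(checkLevel) {
  # thresholdFilter is assigned only in some branches...
  if (checkLevel == "TABLE") {
    thresholdFilter <- "tableChecks$threshold"
  } else if (checkLevel == "FIELD") {
    thresholdFilter <- "fieldChecks$threshold"
  }
  # ...so for any other level the variable was never created,
  # and using it here raises an "object not found" error.
  parse(text = thresholdFilter)
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    evaluateSketch("CONCEPT")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Adding a final `else` branch (or a `stop()` with an informative message) is the usual way to make this kind of gap visible.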

@Ajit_Londhe it worked!! You are the best. A million thanks.


This is probably something stupid that I am missing but I am having problems installing the DataQualityDashboard on Windows. When I run the following command in Rstudio:

devtools::install_github("OHDSI/DataQualityDashboard")

I get the following error:

Error: Failed to install ‘DataQualityDashboard’ from GitHub:
(converted from warning) installation of package ‘C:/Users/Nubic/AppData/Local/Temp/RtmpMtPHxs/file58cbe650c5/DataQualityDashboard_0.0.1.tar.gz’ had non-zero exit status

I am getting this on Windows 7 Enterprise and my colleague is also getting the same error on the latest Windows.

Can you post more of the console log that appears after running the install command?

Hi Ajit, thanks for posting the slides. We are testing the DQDashboard on a larger CDM and I have a question if there is a simple way to check that the install is OK and the connection works? Maybe a lower resource test that would be best to start with?

@Ajit_Londhe I’m having similar troubles to @mgurley. Since he’s having trouble I feel less embarrassed about admitting mine. :disappointed_relieved: Maybe it’s a default download option I need to change? Others seeing the same error message for other packages have concluded as much, but I haven’t had luck with that yet. Here’s what I get:

devtools::install_github("OHDSI/DataQualityDashboard")
Downloading GitHub repo OHDSI/DataQualityDashboard@master
/Rtools/bin/tar: Child returned status 127
/Rtools/bin/tar: Error is not recoverable: exiting now
External tar failed with --force-local, trying without
/Rtools/bin/tar: Child returned status 127
/Rtools/bin/tar: Error is not recoverable: exiting now
External tar failed with --force-local, trying without
Error: Failed to install ‘DataQualityDashboard’ from GitHub:
Does not appear to be an R package (no DESCRIPTION)
In addition: Warning messages:
1: In utils::untar(tarfile, …) :
‘tar.exe -zxf “C:\Users\AWILLI~1\AppData\Local\Temp\RtmpEdU9k7\file44e816a67f0e.tar.gz” -C “C:/Users/AWILLI~1/AppData/Local/Temp/RtmpEdU9k7/remotes44e8467e7433”’ returned error code 2
2: In system(cmd, intern = TRUE) :

@Dave.Barman Try setting checkNames to just plausibleGender. That’s what I did to test out the whole soup-to-nuts process. It ran in 23 seconds on a SQL Server-hosted instance with ~3 million PERSONs.
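A rough R sketch of that single-check run, reusing the argument names from the executeDqChecks call posted earlier in this thread. The helper `buildSmokeTestArgs`, the schema names, the output folder, and the "CONCEPT" check level are assumptions for illustration; substitute your own values and real connection details.

```r
# Sketch only: collects the arguments for a single-check run.
# connectionDetails is assumed to come from
# DatabaseConnector::createConnectionDetails(); schema names are placeholders.
buildSmokeTestArgs <- function(connectionDetails,
                               cdmDatabaseSchema,
                               resultsDatabaseSchema,
                               outputFolder) {
  list(connectionDetails = connectionDetails,
       cdmDatabaseSchema = cdmDatabaseSchema,
       resultsDatabaseSchema = resultsDatabaseSchema,
       cdmSourceName = "smoke test",
       numThreads = 1,
       sqlOnly = FALSE,
       outputFolder = outputFolder,
       verboseMode = TRUE,
       checkLevels = "CONCEPT",         # assumed level for plausibleGender
       checkNames = "plausibleGender")  # run only this one check
}

smokeArgs <- buildSmokeTestArgs(connectionDetails = NULL,  # placeholder
                                cdmDatabaseSchema = "cdm.dbo",
                                resultsDatabaseSchema = "results.dbo",
                                outputFolder = "output")

# With real connection details in place, the run itself would be:
# do.call(DataQualityDashboard::executeDqChecks, smokeArgs)
```

Keeping numThreads at 1 and limiting checkNames to one check keeps the load on a large CDM small while still exercising the install, the connection, and the results output.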

For posterity, @Ajit_Londhe answered me offline in email with the following:

"Looks like there’s an issue with rJava. Most likely it is due to having both 32 and 64 bit versions of R.

I would re-install R and Java using the steps outlined here : [https://ohdsi.github.io/TheBookOfOhdsi/OhdsiAnalyticsTools.html#installR]"

This fixed the issue for me.
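Before re-installing, a quick way to see what you currently have is to check the architecture of the running R session and whether rJava loads at all. This is only a diagnostic sketch using base R; it does not detect which Java version is installed.

```r
# Report whether this R session is 64-bit (pointer size of 8 bytes).
is64bit <- .Machine$sizeof.pointer == 8
cat("R architecture:", R.version$arch, "\n")
cat("64-bit session:", is64bit, "\n")

# Try to load rJava's namespace without attaching it.
# FALSE means the package is missing or failed to load.
hasRJava <- requireNamespace("rJava", quietly = TRUE)
cat("rJava loads:", hasRJava, "\n")
```

If the session is 32-bit, or rJava fails to load in a 64-bit session, that is consistent with the mixed 32/64-bit setup described above.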

And let me add this is an immediate benefit of the Book of OHDSI, so thank you @schuemie and @Frank for a well written set of instructions for R installation!

Oh this could be useful for me immediately, but I can’t find the instruction on the online version. Which chapter and heading are the R installation instructions on?

8.4.5

Dear All, referring to Don Torok’s posting, I suspect that we have a similar issue with the following tests:

For the combination of CONCEPT_ID 3015377 (CALCIUM [MOLES/VOLUME] IN SERUM OR PLASMA) and UNIT_CONCEPT_ID 8753 (MILLIMOLE PER LITER), the number and percent of records that have a value less than 7.000. Our normal range with unit 8753 (MILLIMOLE PER LITER) is 2.15 to 2.7, and all our records have values under 7.000! I wonder if these plausible low and high values might actually refer to unit 8840 (mg/dL)?
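The unit suspicion can be sanity-checked with a quick conversion. The R sketch below uses the molar mass of calcium (about 40.078 g/mol), so 1 mmol/L corresponds to roughly 4.0078 mg/dL; the function names are made up for this example.

```r
# Molar mass of calcium in g/mol (standard atomic weight).
calciumMolarMass <- 40.078

# 1 mmol/L of calcium = 40.078 mg/L = 4.0078 mg/dL.
mmolPerLToMgPerDl <- function(x) x * calciumMolarMass / 10
mgPerDlToMmolPerL <- function(x) x * 10 / calciumMolarMass

# Normal range quoted above (2.15 to 2.7 mmol/L), expressed in mg/dL.
normalRangeMgDl <- mmolPerLToMgPerDl(c(2.15, 2.7))

# The 7.000 cut-off, read as mg/dL and expressed in mmol/L.
cutoffMmolL <- mgPerDlToMmolPerL(7.0)

print(round(normalRangeMgDl, 2))  # about 8.62 and 10.82 mg/dL
print(round(cutoffMmolL, 2))      # about 1.75 mmol/L
```

Read as mg/dL, a cut-off of 7.0 sits just below the normal range, which is what one would expect from a plausible-low bound. Read as mmol/L, it sits far above the normal range, so every normal record fails.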

Further to this subject, are we allowed to modify the plausible values in the control files? Or should we just modify the threshold and add a note explaining why we have so many records failing the check? It would be great if someone could comment. Thanks so much!

Thanks so much @tajanenp! I agree it seems like the units are off and it makes sense to update the plausible values in the control files. Would you be willing to send us a pull request to fix the issue?

Dear Clair, thank you for your prompt reply! I will get back to you about the pull request a bit later, as I suspect we have at least one more case like that :slight_smile: But may I ask you, or anyone :slight_smile:, another more general question about the plausible values and thresholds? We seem to have quite a few cases where we go slightly or significantly over the thresholds (5.5% up to 50% of records), and at least at first glance I don’t see anything wrong with the data. So my question is: how are these plausible values set? And since we are a university hospital, our patient population consists of many seriously ill patients; might that explain why we do not seem to fit the default values and thresholds? Many thanks again. I appreciate all the advice we can get, as I am still at the beginning of the learning curve with this great project!
