
Understanding Data Quality Dashboard Results - STATUS ‘not applicable’ and ‘error’

Hi everyone,

I created a Data Quality Dashboard for one CDM dataset, and it shows 4 statuses: ‘pass,’ ‘fail,’ ‘not applicable,’ and ‘error.’

In the overview section, it indicates that:

  • 2498 out of 3573 passed checks are marked as ‘Not Applicable’ due to empty tables or fields.
  • 40 out of 131 failed checks are SQL errors.
  • The corrected pass percentage for ‘Not Applicable’ and ‘Errors’ is 92% (1075/1166).
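For reference, here is how I read the corrected percentage (a rough sketch in R, assuming the totals above, i.e. 3573 passed plus 131 failed checks):

total_checks   <- 3573 + 131                                  # 3704 checks in total
not_applicable <- 2498
sql_errors     <- 40
applicable     <- total_checks - not_applicable - sql_errors  # 1166
true_passes    <- 3573 - not_applicable                       # 1075
round(100 * true_passes / applicable)                         # ~92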

The demo DQ dashboard provided by OHDSI (OHDSI - Data Quality Dashboard) only has two statuses, pass and fail.

I’m unsure about the reasons for the ‘not applicable’ and ‘error’ statuses. Is this situation abnormal? Do I need to address any issues with my CDM data?

And how can I access details about the problematic records?

Thanks

Hi Siyan!

I am glad you ran the DQD on your database; that is a huge accomplishment! To your concerns: the ‘Not Applicable’ status just means that your database did not have data to support those checks. Where you should look is at the tables and fields that the DQD reads as empty but that you believe should have data; that indicates a problem that needs to be corrected.

The SQL errors relate to queries that tried to run but could not, possibly because of a missing column, a name mismatch, or something to that effect. I would check into these as well, because there is likely a small issue in the DDL that, once fixed, will take care of those errors.

Then start to work through the failed checks: conformance first, then completeness, then plausibility. The conformance checks ensure that your database complies with the OMOP CDM standard, so these are imperative for using any of our tools and methodologies. The completeness checks look at data completeness and vocabulary mapping completeness, so they are helpful for making sure your data has been mapped properly to the standard vocabulary. The plausibility checks then look for things like data alignment, etc.
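If it helps while you triage, here is a minimal sketch of tallying your results by category and status from the JSON file DQD writes. The file path and the column names (category, failed, isError, notApplicable) are assumptions on my part and may differ by DQD version, so treat this as illustrative only.

library(jsonlite)

# Load the results file written by executeDqChecks (path is an example)
results <- fromJSON("output/results.json")
checks  <- results$CheckResults

# Column names below are assumed and may be cased differently in your version
checks$status <- ifelse(checks$isError == 1, "error",
                 ifelse(checks$notApplicable == 1, "not applicable",
                 ifelse(checks$failed == 1, "fail", "pass")))
table(checks$category, checks$status)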


Also, the Not Applicable and Error statuses are explained here: Check Status Descriptions • DataQualityDashboard


Thanks for the reply. I took a close look at the ERROR status.

I found several checks indicating missing variables in the VISIT_DETAIL, VISIT_OCCURRENCE, and CONCEPT tables.

For example, the description indicates ‘No ADMITTING_SOURCE_CONCEPT_ID, ADMITTING_SOURCE_VALUE, DISCHARGE_TO_CONCEPT_ID, DISCHARGE_TO_SOURCE_VALUE, VISIT_DETAIL_PARENT_ID variable in VISIT_DETAIL.’

OMOP CDM 5.4 uses ‘admitted_from_concept_id’ instead of ‘ADMITTING_SOURCE_CONCEPT_ID’ for the same variable.

It seems DataQualityDashboard::executeDqChecks includes DQ checks with mismatched column names in the CDM.

For more information, please see below.

Note that there are separate threshold files for v5.3 and v5.4. It seems that in your execution of DQD the v5.3 thresholds were used, while your CDM is v5.4. Did you populate the cdm_version field in your cdm_source table?
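If you want to double-check what your source table reports, here is a minimal sketch, assuming you still have the connectionDetails and cdmDatabaseSchema objects from your DQD run:

library(DatabaseConnector)

conn <- connect(connectionDetails)
renderTranslateQuerySql(
  conn,
  "SELECT cdm_version FROM @cdm_schema.cdm_source;",
  cdm_schema = cdmDatabaseSchema
)
disconnect(conn)

You can also pass the version explicitly to executeDqChecks via its cdmVersion argument (e.g. cdmVersion = "5.4") so that the v5.4 checks and thresholds are used.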

Thanks for the reply.

After changing to the correct version, all remaining SQL errors are associated with the COHORT_DEFINITION and COHORT tables, such as ‘A yes or no value indicating if COHORT table is present as expected based on the specification.’ and ‘The number and percent of records with a NULL value in the COHORT_END_DATE of the COHORT that is considered not nullable. (Threshold=0%).’

I took a look at these two tables in my CDM. They are empty, but the structure looks correct.

According to Check Status Descriptions • DataQualityDashboard, I feel the above checks with SQL error status would be more appropriately categorized as not applicable.

What error is logged for these checks? (You can find this in the UI, in the Error Log for a given check when you click the + button; in the console output when you run DQD; or in the output/errors folder.)
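For example, a small sketch for pulling them up from disk (adjust "output" to whatever outputFolder you passed to executeDqChecks):

# Error files are written under <outputFolder>/errors (location per the note above)
errorFiles <- list.files(file.path("output", "errors"), full.names = TRUE)
basename(errorFiles)
cat(readLines(errorFiles[1]), sep = "\n")   # inspect the first logged error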

I have a hunch that you may need to make sure the cohortDatabaseSchema param is set to the schema where your COHORT and COHORT_DEFINITION tables reside.

The error log is below

The issue is that the object ‘DF_RWD_DEV.DQ_FACT.COHORT’ is built by concatenating the database, resultsDatabaseSchema, and table name. Instead, it should concatenate the database, cdmDatabaseSchema, and table name.

The issue only occurs for the COHORT_DEFINITION and COHORT tables. How can I fix it?

Have you set the parameter cohortDatabaseSchema in your executeDqChecks call? (if you don’t set it, it will default to the value of resultsDatabaseSchema.)
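For example, a sketch of the call (the other arguments are placeholders for whatever you used in your original run):

DataQualityDashboard::executeDqChecks(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  cohortDatabaseSchema = cdmDatabaseSchema,   # schema holding your COHORT / COHORT_DEFINITION tables
  cdmSourceName = "My CDM",
  cdmVersion = "5.4",
  outputFolder = "output"
)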

After adding the cohortDatabaseSchema parameter, the SQL errors are fixed. Thank you.

The results show that only 47% of the plausibility checks pass; most fail. Do you have any suggestions for handling the failed checks and for distinguishing between data source quality issues and CDM data quality issues?

Most of the failed checks are related to plausibleUnitConceptIds, with 100% of rows violated.

Glad to hear the error is resolved!

For the plausibility checks, this is a great question and the Data Quality working group is actively discussing ways we can provide more guidance and clarity to users on what check failures mean & how to address them. We also hope to add features in the future which allow users to better track their review process (for example, a status to mark failed checks as “acknowledged” when they’ve been investigated and cannot be resolved).

In the meantime, I encourage you to utilize the SQL queries displayed in the DQD app to investigate failed checks to help determine their root cause. You can also use the Notes fields in the threshold files to record your findings, and when DQD is re-run these notes will show up in the UI.
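For example, a rough sketch of that workflow; the file name matches the v5.4 field-level threshold file shipped with the package, but the specific Notes column shown is only an illustration, so check the CSV in your DQD version for the exact column names:

# Copy the packaged field-level threshold file and record findings in a Notes column
fieldLevel <- read.csv(
  system.file("csv", "OMOP_CDMv5.4_Field_Level.csv", package = "DataQualityDashboard"),
  as.is = TRUE
)
# e.g. fieldLevel$plausibleUnitConceptIdsNotes[...] <- "Reviewed: units valid in source"  (column name illustrative)
write.csv(fieldLevel, "OMOP_CDMv5.4_Field_Level_custom.csv", row.names = FALSE)
# Then point executeDqChecks at the edited copy via its fieldCheckThresholdLoc argument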

For plausibleUnitConceptIds - there are some known issues with this check, where some of the lists of plausible units may be missing acceptable values for the unit of a given measurement. So it’s possible some or all of your failures are false positives. I’d recommend investigating a handful of failed checks in that category, and if you find that these checks should not have failed, you may want to disable that check from running. A solution to this issue is in the works, which is tracked here: Plausible Unit Concept IDs Revamp · OHDSI/DataQualityDashboard · Discussion #474 · GitHub

By the way, the DQ working group meets every other Thursday at 9:30am Eastern; feel free to join us! 🙂 You can find more info and sign up on the OHDSI workgroups page here: OHDSI Workgroups – OHDSI


I found an abnormal value for % records: it is greater than 100%.

I find it hard to tell whether DQ checks fall under verification or validation. Is there a way to filter validation checks versus verification checks?

Thanks for flagging - I wonder if this is due to the fact that some people in your CDM have multiple rows in the DEATH table. Looking at the SQL for this check, the denominator uses a WHERE clause to limit the condition rows to those for persons with a death record; however, the numerator joins conditions directly to the death table, which will cause fanning if a person has multiple death records. This should be fixed; I will file an issue in GitHub.

Interestingly, it seems that there’s an active discussion ongoing re: whether or not we should allow multiple records per person in the death table. The CDM docs do currently say “A person can have up to one record” though.

I find it hard to tell whether DQ checks fall under verification or validation. Is there a way to filter validation checks versus verification checks?

The UI doesn’t have a filter for this at the moment. You can look this up for each check in the Check Descriptions file (kahnContext column).
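For example, a quick sketch using the packaged v5.4 check descriptions file (the "Validation"/"Verification" spellings are my assumption; check the actual values in that column):

checks <- read.csv(
  system.file("csv", "OMOP_CDMv5.4_Check_Descriptions.csv", package = "DataQualityDashboard"),
  as.is = TRUE
)
table(checks$kahnContext)                                          # counts per Kahn context
subset(checks, kahnContext == "Validation", select = checkName)    # just the validation check types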

Thanks for your reply. It’s very helpful. How can we disable the plausibleUnitConceptIds check?

You can use the checkNames parameter. To include all but a specific check name, you can do the following:

# Read the packaged check descriptions (CDM v5.4) and keep all check names
# except plausibleUnitConceptIds
dqdChecks <- read.csv(system.file("csv", "OMOP_CDMv5.4_Check_Descriptions.csv", package = "DataQualityDashboard"), as.is = TRUE)
checkNames <- dqdChecks$checkName
checkNamesNew <- checkNames[checkNames != "plausibleUnitConceptIds"]

Then pass in checkNamesNew to checkNames in your executeDqChecks call.
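A sketch of what that call might look like (the other arguments are placeholders for your own settings):

DataQualityDashboard::executeDqChecks(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  cdmSourceName = "My CDM",
  cdmVersion = "5.4",
  checkNames = checkNamesNew,   # every check except plausibleUnitConceptIds
  outputFolder = "output"
)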

We’ll update the docs to include this since it’s not super straightforward.
