Data Quality Dashboard (DQD) results misleading


(Jim Frankfort) #1

This was originally posted to the DQD GitHub issues log (GitHub DQD issue here) and is copied to the forum by request. DQD recently added the quality check denominator to its JSON output.

We really appreciate the enhancement of adding the denominator count to the DQD output. Looking at the results, we think there is a bug, and also an interaction between how a CDM is populated and which DQ checks are run that makes the results less usable.

  1. Bug: we noticed errors reported when an optional field was not populated. For example, in our instance the VISIT_DETAIL table is not populated, and multiple checks for visit_detail_id fail because this optional field is empty. The check description reads: “A yes or no value indicating if all fields are present in the OBSERVATION table as expected based on the specification.” Technically, the specification says that this is an optional field, so this check is incorrect. Having said that, if an instance does populate the VISIT_DETAIL table, then this is a valuable DQ check.
  2. Decreased usability: we identified 2 scenarios where the % passed is misleading.
     - Scenario 1: % passed is artificially high.
       2151 out of 3154 DQ checks had a denominator of 0 and did not fail. This makes the aggregate results look better by increasing the denominator (aggregate % failed = count of failed checks / count of DQ checks, and % passed is its complement). These rows seem to be due to the following situations:
       - For FIELD checks, the table has no rows (COST, NOTE, NOTE_NLP, PAYER_PLAN_PERIOD, SPECIMEN), so most of its fields are listed.
       - For FIELD checks in tables with rows, the specific column has no values (these are optional columns):
         - CONDITION_OCCURRENCE (condition_start_datetime, condition_end_datetime, condition_status_source_value)
         - DEVICE_EXPOSURE (device_exposure_end_date, device_exposure_start_datetime, device_exposure_end_datetime)
         - DRUG_EXPOSURE (drug_exposure_start_datetime, drug_exposure_end_datetime, route_source_value, verbatim_end_date)
       - For CONCEPT checks, the CONCEPT_ID and/or UNIT_CONCEPT_ID being checked does not exist in the detail table being checked.
     - Scenario 2: % passed is artificially low.
       There are checks that fail because related tables are not populated. There are multiple checks related to “visit_detail_id” in each event table. VISIT_DETAIL is optional and is not populated, so no visit_detail_id is populated in any other table, and the checks on it all fail, lowering the aggregate % passed.
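
To make the Scenario 1 arithmetic concrete, here is a minimal Python sketch of how checks with a denominator of 0 inflate the aggregate % passed. The record keys (`name`, `denominator`, `failed`) and the six sample checks are hypothetical and chosen only for illustration; they are not the actual DQD JSON field names or our real results.

```python
def pass_rate(checks, skip_empty=False):
    """Return the fraction of checks that passed, or None if nothing is counted.

    With skip_empty=True, checks whose denominator is 0 (no rows were
    evaluated) are left out of both the numerator and the denominator.
    """
    counted = [c for c in checks if not (skip_empty and c["denominator"] == 0)]
    if not counted:
        return None
    passed = sum(1 for c in counted if not c["failed"])
    return passed / len(counted)


# Hypothetical mini result set: 6 checks, 3 of which never evaluated any rows.
checks = [
    {"name": "a", "denominator": 1000, "failed": False},
    {"name": "b", "denominator": 1000, "failed": True},
    {"name": "c", "denominator": 500, "failed": True},
    {"name": "d", "denominator": 0, "failed": False},
    {"name": "e", "denominator": 0, "failed": False},
    {"name": "f", "denominator": 0, "failed": False},
]

naive = pass_rate(checks)                      # counts the empty checks as passes
adjusted = pass_rate(checks, skip_empty=True)  # counts only checks that saw rows

print(f"naive: {naive:.0%}, adjusted: {adjusted:.0%}")
```

In this toy set, the three zero-denominator checks lift the pass rate from 1 in 3 to 4 in 6, which is the same effect we see at a larger scale in our instance.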

Impact and potential solutions:
IQVIA works with multiple clients’ OMOP instances, and client data managers are expressing concern and frustration because of the large number of DQ checks that don’t apply to their instance.

A potential solution is to 1) have individual QC checks related to optional fields, and 2) assess the DQD results with an added filter that flags checks that are not applicable to that specific instance. This would require some kind of configuration file indicating which tables are empty and which optional fields are populated. The post-processing would flag as N/A the tests related to empty tables and non-populated optional fields.
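
The post-processing idea could look something like the Python sketch below. This is only an illustration under stated assumptions: the record keys (`table`, `field`, `failed`), the `config` layout, and the `classify` helper are all hypothetical, not part of DQD, and the real DQD JSON would first need to be mapped into this shape.

```python
def classify(check, empty_tables, unpopulated_fields):
    """Return 'N/A', 'FAIL', or 'PASS' for one check record.

    A check is 'N/A' when its table is listed as empty, or when its
    (table, field) pair is listed as an unpopulated optional field.
    """
    table = check["table"].upper()
    field = (check.get("field") or "").lower()
    if table in empty_tables or (table, field) in unpopulated_fields:
        return "N/A"
    return "FAIL" if check["failed"] else "PASS"


# Hypothetical per-instance configuration (would live in a config file).
config = {
    "empty_tables": {"NOTE", "SPECIMEN", "VISIT_DETAIL"},
    "unpopulated_fields": {
        ("OBSERVATION", "visit_detail_id"),
        ("CONDITION_OCCURRENCE", "condition_start_datetime"),
    },
}

# Hypothetical check results, already mapped to the keys assumed above.
checks = [
    {"table": "NOTE", "field": "note_id", "failed": False},
    {"table": "OBSERVATION", "field": "visit_detail_id", "failed": True},
    {"table": "CONDITION_OCCURRENCE", "field": "condition_start_datetime", "failed": False},
    {"table": "PERSON", "field": "person_id", "failed": False},
    {"table": "DRUG_EXPOSURE", "field": "drug_concept_id", "failed": True},
]

statuses = [
    classify(c, config["empty_tables"], config["unpopulated_fields"])
    for c in checks
]

# Aggregate % passed over applicable checks only.
applicable = [s for s in statuses if s != "N/A"]
pct_passed = 100 * applicable.count("PASS") / len(applicable)

print(statuses, pct_passed)
```

Flagging rather than deleting the non-applicable checks keeps them visible in the report, so an instance that later populates VISIT_DETAIL only needs to update its configuration to bring those checks back into the aggregate.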