Understanding DQD thresholds for measurePersonCompleteness checks

morten · December 4, 2023, 10:29am

I have been looking at DQD results for the check type measurePersonCompleteness and was somewhat puzzled by the standard thresholds. So I wanted to check if I have misunderstood something or if there is a simple explanation:

These DQD checks evaluate how many patients have no entries in various tables (cf. SQL-query for these checks). The failure threshold is set to 95% or even 100% for most tables (CSV-file for v5.4). So it seems that these checks are typically marked as PASSED even if only 6% of all patients actually have entries in the relevant tables. To me (relatively new to OMOP EHR data), that intuitively seemed quite lax. I would have imagined that it would already be regarded as a quality issue if, say, fewer than 25% of the patients have an entry in CONDITION_OCCURRENCE. Have I misunderstood the thresholds here, or is it indeed considered common and acceptable to have a very large fraction of “sparsely populated” patients?
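To make the arithmetic concrete, here is a minimal sketch of how such a threshold could be read. It is not DQD's own code: the function name and the strict "greater than" comparison are assumptions for illustration only.

```python
def person_completeness_status(n_persons_without_records, n_persons_total, threshold_pct):
    """Return "FAILED" if the share of persons WITHOUT a record exceeds the threshold.

    Hypothetical helper, not part of DQD. It assumes a check fails only when
    the violated percentage is strictly greater than the threshold.
    """
    pct_without = 100.0 * n_persons_without_records / n_persons_total
    return "FAILED" if pct_without > threshold_pct else "PASSED"


# 1000 persons, only 60 (6%) have a record, so 94% have none.
status_lax = person_completeness_status(940, 1000, threshold_pct=95)

# Same data, but with a stricter threshold of 75%
# (i.e. at least 25% of persons are expected to have a record).
status_strict = person_completeness_status(940, 1000, threshold_pct=75)

print(status_lax, status_strict)
```

Under these assumptions, a 100% threshold can never be exceeded, and a 0% threshold fails as soon as a single person has no record.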

MaximMoinat · December 4, 2023, 1:25pm

Hi Morten. The measurePersonCompleteness check indeed checks for persons without a record, and the thresholds are indeed very lax. Two ways to think about this:

For OMOP conformance, only the person table and observation period table need to be populated. For these two tables, the threshold is 0%.
For completeness, it depends on the data source what the expected percentage of persons with records in each domain is. These thresholds can be customised to reflect this: 1) copy the table thresholds file, 2) alter the thresholds to fit your data and 3) pass the path of the edited file to the executeDqChecks() function as the tableCheckThresholdLoc argument.
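Steps 1 and 2 could be scripted roughly as below. This is a sketch under assumptions, not an official workflow: the column names cdmTableName and measurePersonCompletenessThreshold are assumed to match the table-level thresholds CSV (check the header of your copy), the helper customise_thresholds is hypothetical, and the small file written here is only a stand-in for the real thresholds file.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column names; verify them against the header of the real thresholds CSV.
TABLE_COL = "cdmTableName"
THRESHOLD_COL = "measurePersonCompletenessThreshold"


def customise_thresholds(src, dst, overrides):
    """Copy the thresholds CSV at src to dst, replacing the thresholds of the
    tables named in overrides (a dict of table name -> new threshold in percent).
    All other rows and columns are written back unchanged."""
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for row in rows:
        table = row[TABLE_COL].upper()
        if table in overrides:
            row[THRESHOLD_COL] = str(overrides[table])
    with open(dst, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Stand-in for the default table-level thresholds file (the real one has more columns and rows).
workdir = Path(tempfile.mkdtemp())
default_file = workdir / "table_level_default.csv"
default_file.write_text(
    "cdmTableName,measurePersonCompletenessThreshold\n"
    "PERSON,0\n"
    "CONDITION_OCCURRENCE,95\n"
)

# Fail CONDITION_OCCURRENCE when more than 75% of persons have no record in it.
custom_file = workdir / "table_level_custom.csv"
custom_rows = customise_thresholds(default_file, custom_file, {"CONDITION_OCCURRENCE": 75})
print(custom_file)
```

The path of the edited copy (custom_file here) is what would then be passed to executeDqChecks() as tableCheckThresholdLoc (step 3).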

morten · December 5, 2023, 7:28am

Thanks Max, that makes it clearer!