Hello,
We are trying to customize the DQD rules to suit our site. I have a few questions, listed below. Could you help us with them?
a) I see that DQD has two contexts, `verification` and `validation`. How can I find the rules under the `validation` category? In the GitHub CSV files, I don’t see any specific column that indicates the context (except for 3 rows in `check_descriptions.csv`). But in the DQD dashboard, I can see that there are around 402 validation-context rules. I am trying to locate these 402 DQ checks that come under the `validation` category. Or does validation apply to only 3 scenarios, namely `implausible gender`, `person completeness` and `null in non-nullable field`, across different tables? Are there any other validation-based DQ checks?
b) Where can I find information on the external benchmarks/values used for our validation checks? I see that for validation checks, the data is compared with an external source. What is the comparator here?
For example, our dataset had 402 validation checks, of which 1 failed. I would like to find out where it picks up the external benchmark from. Against which value is it comparing our raw data? I know that for verification, we can find the threshold limits for columns in the Excel sheet. But for validation, where can we find this?
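To make the question concrete, here is a minimal sketch of the kind of tally I would like to reproduce once the context of each check is known. The record layout, the field names (`checkName`, `context`, `failed`) and the sample rows are all assumptions made up for illustration; they are not taken from the actual DQD output or CSV files.

```python
from collections import Counter

# Hypothetical check records. Field names and values are illustrative only
# and are NOT the real DQD output format.
sample_results = [
    {"checkName": "checkA", "context": "Validation", "failed": 1},
    {"checkName": "checkB", "context": "Validation", "failed": 0},
    {"checkName": "checkC", "context": "Verification", "failed": 0},
]


def summarize_by_context(results):
    """Return (total checks per context, failed checks per context)."""
    totals = Counter(r["context"] for r in results)
    failures = Counter(r["context"] for r in results if r["failed"])
    return totals, failures


def failed_checks(results, context):
    """Return the names of the failed checks within one context."""
    return [
        r["checkName"]
        for r in results
        if r["context"] == context and r["failed"]
    ]


totals, failures = summarize_by_context(sample_results)
print(totals["Validation"], failures["Validation"])
print(failed_checks(sample_results, "Validation"))
```

With real data, the equivalent of `failed_checks(..., "Validation")` would identify the single failing validation check, and the open question is which benchmark value that check was compared against.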
c) In `concept_level.csv` and `field_level.csv`, I see columns like `PlausibleGenderNotes`, `plausibleValueHighNotes`, `plausibleValueLowNotes`, `validPrevalenceLow`, `validPrevalenceLowThreshold`, etc. I am unable to understand how these fields are used. What are these fields for, and are they actually used in any DQ checks?