For our dataset, some data issues were successfully caught by the Data Quality Dashboard (https://github.com/OHDSI/DataQualityDashboard).
However, some checks appear to have failed without justification.
For the CDM MEASUREMENT table, the percentage of records with a value of 0 in the standard concept field unit_concept_id is higher than the threshold (5%).
However, my investigation shows that among records where unit_source_value is not null, this percentage is only 3.3%, which is below the threshold. This is acceptable.
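A minimal Python sketch of the comparison above, assuming MEASUREMENT rows loaded as dictionaries with unit_concept_id and unit_source_value keys. The sample rows are invented for illustration and are not our data, and the function is not the Data Quality Dashboard's own query. It computes the share of unit_concept_id = 0 over all rows and over rows with a non-null unit_source_value only.

```python
def pct_unit_concept_zero(records, only_with_source_unit=False):
    """Percentage of MEASUREMENT rows whose unit_concept_id is 0.

    If only_with_source_unit is True, rows whose unit_source_value is None
    (null) are excluded from both the numerator and the denominator.
    """
    rows = [
        r for r in records
        if not only_with_source_unit or r["unit_source_value"] is not None
    ]
    if not rows:
        return 0.0
    zero = sum(1 for r in rows if r["unit_concept_id"] == 0)
    return 100.0 * zero / len(rows)


# Hypothetical sample rows.
measurement = [
    {"unit_concept_id": 8840, "unit_source_value": "mg/dL"},
    {"unit_concept_id": 0, "unit_source_value": None},   # no unit in the source
    {"unit_concept_id": 0, "unit_source_value": None},
    {"unit_concept_id": 8713, "unit_source_value": "g/dL"},
    {"unit_concept_id": 0, "unit_source_value": "mg%"},  # source unit left unmapped
]

print(pct_unit_concept_zero(measurement))                                        # 60.0
print(round(pct_unit_concept_zero(measurement, only_with_source_unit=True), 1))  # 33.3
```

Rows with no source unit at all cannot receive a unit concept, so restricting the denominator separates them from rows whose unit really failed to map.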
There are no records in the PAYER_PLAN_PERIOD table whose person does not exist in the PERSON table.
The payer_plan_period_id is an identifier for each unique combination of payer, plan, family code and time span. This is acceptable.
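The orphan-record check above can be reproduced with a LEFT JOIN. Here is a sketch using Python's built-in sqlite3 module and tiny hypothetical person and payer_plan_period tables that contain only the columns the query needs; it is an illustration, not the Data Quality Dashboard's SQL.

```python
import sqlite3

# Tiny hypothetical tables containing only the columns this query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE payer_plan_period (
        payer_plan_period_id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL
    );
    INSERT INTO person VALUES (1), (2), (3);
    INSERT INTO payer_plan_period VALUES (101, 1), (102, 1), (103, 3);
""")


def orphan_person_count(conn):
    """Count PAYER_PLAN_PERIOD rows whose person_id is absent from PERSON."""
    (count,) = conn.execute("""
        SELECT COUNT(*)
        FROM payer_plan_period AS ppp
        LEFT JOIN person AS p ON p.person_id = ppp.person_id
        WHERE p.person_id IS NULL
    """).fetchone()
    return count


print(orphan_person_count(conn))  # 0
```

Only person_id is joined here; payer_plan_period_id is a separate identifier and plays no part in the query.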
For many combinations of measurement_concept_id and unit_concept_id in our data, the percentage of records with implausible values (below the lower or above the upper boundary value) is higher than the threshold (5%).
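To show how such a failure arises, here is a hypothetical Python sketch of a per-combination range check: for each (measurement_concept_id, unit_concept_id) pair it computes the percentage of values outside the boundaries and flags pairs above the 5% threshold. Only the boundary values that appear in this post's examples are used (6.0 treated as the lower bound for Cholesterol in IDL, 15.0 as the upper bound for Hemoglobin in Blood), with the other side left open; the function names and sample values are invented.

```python
from collections import defaultdict

# Hypothetical boundaries taken from the examples in this post;
# None means "no bound on this side".
BOUNDS = {
    (3026453, 8840): (6.0, None),    # Cholesterol in IDL, mg/dL
    (40762351, 8713): (None, 15.0),  # Hemoglobin in Blood, g/dL
}


def violation_pct_by_combination(records, bounds):
    """Percentage of values outside the bounds, per (concept, unit) pair.

    records: iterable of (measurement_concept_id, unit_concept_id, value).
    Pairs without bounds and rows with a missing value are skipped.
    """
    total = defaultdict(int)
    bad = defaultdict(int)
    for concept_id, unit_id, value in records:
        key = (concept_id, unit_id)
        if key not in bounds or value is None:
            continue
        low, high = bounds[key]
        total[key] += 1
        if (low is not None and value < low) or (high is not None and value > high):
            bad[key] += 1
    return {key: 100.0 * bad[key] / total[key] for key in total}


def failing_combinations(pcts, threshold=5.0):
    """Pairs whose violation percentage exceeds the threshold (5% by default)."""
    return sorted(key for key, pct in pcts.items() if pct > threshold)


records = [
    (3026453, 8840, 4.0), (3026453, 8840, 12.0),
    (3026453, 8840, 20.0), (3026453, 8840, 8.0),
    (40762351, 8713, 14.2), (40762351, 8713, 13.0), (40762351, 8713, 12.5),
    (40762351, 8713, 11.9), (40762351, 8713, 16.1),
    (0, 0, 99.0),  # unmapped row: no bounds configured, so it is skipped
]
pcts = violation_pct_by_combination(records, BOUNDS)
print(pcts)                        # {(3026453, 8840): 25.0, (40762351, 8713): 20.0}
print(failing_combinations(pcts))  # [(3026453, 8840), (40762351, 8713)]
```

With a fixed 5% threshold, a single out-of-range value in a small or clinically skewed group is already enough to fail the check.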
For example, records with measurement_concept_id = 3026453 (Cholesterol in IDL in Serum or Plasma), unit_concept_id = 8840 (milligram per deciliter) and a value less than 6.0 account for 11.4% of records and 12% of persons in the data.
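The record-level and person-level percentages in this example use different denominators, since one person can contribute several records. A small sketch, assuming rows of (person_id, value_as_number) for a single concept/unit combination; the sample values and the helper name are invented.

```python
def pct_records_and_persons(rows, predicate):
    """Return (% of rows matching predicate, % of persons with >= 1 matching row).

    rows: list of (person_id, value_as_number) for one
    (measurement_concept_id, unit_concept_id) combination.
    Rows with a missing value never match.
    """
    if not rows:
        return 0.0, 0.0
    matching = [(pid, v) for pid, v in rows if v is not None and predicate(v)]
    all_persons = {pid for pid, _ in rows}
    matching_persons = {pid for pid, _ in matching}
    return (
        100.0 * len(matching) / len(rows),
        100.0 * len(matching_persons) / len(all_persons),
    )


# Invented sample: person 3 contributes three low values.
rows = [(1, 4.5), (1, 30.0), (2, 25.0), (3, 5.2), (3, 5.9), (3, 5.5), (4, 40.0), (5, 22.0)]
print(pct_records_and_persons(rows, lambda v: v < 6.0))  # (50.0, 40.0)
```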
The normal cholesterol concentration in a blood test for a healthy person ranges from 3.1 to 5 mmol/L.
However, laboratory tests are carried out more often for people who have related health problems than for healthy people undergoing preventive check-ups.
Therefore, this data cannot be considered a random sample of the population.
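The cholesterol norm above is quoted in mmol/L, whereas unit_concept_id = 8840 is milligram per deciliter, so a conversion is needed to put the two on the same scale. A sketch assuming the molar mass of cholesterol (C27H46O) of about 386.65 g/mol; the helper name is hypothetical.

```python
# Approximate molar mass of cholesterol (C27H46O) in g/mol.
CHOLESTEROL_MOLAR_MASS_G_PER_MOL = 386.65


def mmol_per_l_to_mg_per_dl(mmol_per_l, molar_mass=CHOLESTEROL_MOLAR_MASS_G_PER_MOL):
    """Convert a concentration from mmol/L to mg/dL.

    mmol/L multiplied by g/mol gives mg/L; dividing by 10 gives mg/dL.
    """
    return mmol_per_l * molar_mass / 10.0


for mmol in (3.1, 5.0):
    print(mmol, "mmol/L =", round(mmol_per_l_to_mg_per_dl(mmol), 1), "mg/dL")
# 3.1 mmol/L = 119.9 mg/dL
# 5.0 mmol/L = 193.3 mg/dL
```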
Another example is the combination of measurement_concept_id = 40762351 (Hemoglobin in Blood) and unit_concept_id = 8713 (gram per deciliter): values higher than 15.0 account for about 20% of its records.
The normal range for hemoglobin is 13.5 to 17.5 grams per deciliter for men and 12.0 to 15.5 grams per deciliter for women, so such values are normal.
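A single cut-off of 15.0 ignores the sex-specific ranges above. A sketch that checks values against those ranges, keyed by the OMOP standard gender_concept_id values 8507 (MALE) and 8532 (FEMALE); the sample values are invented and the helper name is hypothetical.

```python
# Normal hemoglobin ranges in g/dL, as quoted above, keyed by OMOP
# gender_concept_id (8507 = MALE, 8532 = FEMALE).
HEMOGLOBIN_NORMAL_G_PER_DL = {8507: (13.5, 17.5), 8532: (12.0, 15.5)}


def is_within_normal(gender_concept_id, value):
    """True if value lies inside the sex-specific normal range (inclusive)."""
    low, high = HEMOGLOBIN_NORMAL_G_PER_DL[gender_concept_id]
    return low <= value <= high


# Invented values above a single 15.0 cut-off, paired with gender_concept_id.
above_15 = [(8507, 15.4), (8507, 16.8), (8532, 15.2), (8507, 17.9), (8532, 16.0)]
normal = [(g, v) for g, v in above_15 if is_within_normal(g, v)]
print(len(normal), "of", len(above_15), "values are within the sex-specific range")
# 3 of 5 values are within the sex-specific range
```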
After this brief investigation, I have come to the conclusion that the results of the checks on the combination of MEASUREMENT_CONCEPT_ID, UNIT_CONCEPT_ID and VALUE_SOURCE_VALUE do not always make sense.
The full list of such laboratory tests, along with the examples above in detail, is presented here: