DQD errors

Hi all!

My DQD report contains the following errors:

  1. The number and percent of persons in the CDM that do not have at least one record in the OBSERVATION_PERIOD table.
    According to the CDM specification, each person must have at least one observation period, so these persons should be deleted. However, in some datasets such records are filtered out by the rule that an event date cannot be later than the maximum date of the dataset, which is why the number of these persons is large.
    Has anyone encountered this error? Do you filter these persons out or keep them? Or do you do something else?
  2. For a CONCEPT_ID 200670 (Benign neoplasm of male genital organ), the number and percent of records associated with patients with an implausible gender (correct gender = Male).
    There are records where female persons have a ‘male’ diagnosis and male persons have a ‘female’ diagnosis. Is there a solution to this problem?
  3. For the combination of CONCEPT_ID 3000620 (Complement C3 [Mass/volume] in Serum or Plasma) and UNIT_CONCEPT_ID 8751 (milligram per liter), the number and percent of records that have a value less than 560.000.
    For errors like this one, why is the combination of concept_id and unit_concept_id checked? Where does the limit come from? And why does DQD consider it an error and not a warning?

Thank you in advance!

Hi @Olga_Osintseva, I will do my best to answer your questions:

  1. If I understand correctly, you have records for patients that fall outside the dates your dataset covers, so these patients have records but no observation period. Do you trust these records? Do you want to use them in an analysis? If so, I would consider creating an observation period for these patients that spans the period of time for which you have records for them. However, events can also occur outside of an observation period. If you want to keep the time span of the database separate from these records, then you can still create an observation period for a patient even if there are no records within that span of time.
  2. So far we have not devised a solution for this problem. To me it is an artifact of the data and something to keep in mind if doing an analysis with gender-specific CONCEPT_IDs.
  3. The check for plausible values for measurements and their associated units looks for biologically implausible values; the limits are not normal ranges. These limits were devised based on expert review by three physicians. The result is reported as a failure and not a warning because we have tried to steer clear of additional levels of check results: we have found that the more levels are introduced, the easier it is to start ignoring the notifications (alarm fatigue). If you have reviewed your data and find that the percent of records that fail the check is acceptable to you, I would recommend altering the threshold so that it is no longer considered a failure.
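
To make the pass/fail logic in point 3 concrete, here is a minimal Python sketch. It is not DQD's actual implementation (DQD runs SQL against the CDM). The function name, the sample values, and both threshold values are assumptions for illustration, as is the rule that a check fails only when the violation rate exceeds the threshold. Only the lower limit of 560 comes from the check description quoted above.

```python
def plausible_low_check(values, plausible_low, threshold_pct):
    """Return (n_violated, pct_violated, failed) for a low-value plausibility check.

    values        : measurement values for one concept_id + unit_concept_id pair
    plausible_low : lowest biologically plausible value for that pair
    threshold_pct : percent of violating records tolerated before the check fails
    """
    n_total = len(values)
    n_violated = sum(1 for v in values if v < plausible_low)
    pct_violated = 100.0 * n_violated / n_total if n_total else 0.0
    # Assumption: the check fails only when the violation rate exceeds the threshold.
    failed = pct_violated > threshold_pct
    return n_violated, pct_violated, failed


# Hypothetical values for concept 3000620 with unit 8751 (mg/L).
values = [900.0, 1200.0, 95.0, 1100.0, 1350.0, 110.0, 980.0, 1500.0, 1020.0, 870.0]

print(plausible_low_check(values, plausible_low=560.0, threshold_pct=5.0))   # (2, 20.0, True)
print(plausible_low_check(values, plausible_low=560.0, threshold_pct=25.0))  # (2, 20.0, False)
```

Raising the threshold does not change the data. It only records that the observed violation rate has been reviewed and accepted.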

Thanks,
Clair

Thank you @clairblacketer for your answers.
In the first question I mean persons who had events outside the dates of the dataset. These events were filtered out, so these persons now have no records in the event tables or in the observation_period table. On the one hand, these persons contradict the CDM specification; on the other hand, they may make up a large part of the dataset, which may affect statistics. Do you have a suggestion for how to handle this problem?

I think it depends on what your data represent. For example, if I had a US claims database with patients who were enrolled in an insurance plan, and I had a record of that enrollment even though they did not seek care during that time, I would still keep them in the database. Their enrollment period is considered the observation_period, so they satisfy the CDM requirement of at least one observation_period record per person.

An observation_period is meant to be the period of time during which you are reasonably confident your data reliably captures health events for that patient. In the case you are describing where you have filtered patients out based on dates, I would probably drop those patients. Unless of course it was possible to see events for those patients during that period of time and they just didn’t seek care so there are no events, in which case I would keep them and just create an observation_period for them.
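
The options discussed in this thread can be sketched in Python as follows. This is an illustration only, not part of DQD or any OHDSI tool: the function name and the in-memory data layout are hypothetical, and a real ETL would most likely do this in SQL against the CDM tables. It looks only at persons without an observation_period. Those with event records get a period spanning their first to last event date, and those with no records at all are listed as candidates to drop.

```python
from datetime import date


def resolve_missing_observation_periods(person_ids, observation_periods, event_dates):
    """Split persons lacking an observation_period into 'derive' and 'drop' groups.

    person_ids          : iterable of person_id values in PERSON
    observation_periods : dict person_id -> (start_date, end_date) already in the CDM
    event_dates         : dict person_id -> list of event dates across the event tables
    """
    derived = {}
    drop_candidates = []
    for pid in person_ids:
        if pid in observation_periods:
            continue  # already satisfies the CDM requirement
        dates = event_dates.get(pid, [])
        if dates:
            # Span the period of time for which records exist.
            derived[pid] = (min(dates), max(dates))
        else:
            # No records and no period: drop, unless another source
            # (e.g. enrollment) shows the person was observable.
            drop_candidates.append(pid)
    return derived, drop_candidates


# Hypothetical example data.
persons = [1, 2, 3]
periods = {1: (date(2015, 1, 1), date(2019, 12, 31))}
events = {2: [date(2020, 3, 5), date(2020, 1, 10), date(2020, 7, 22)]}

derived, drop_candidates = resolve_missing_observation_periods(persons, periods, events)
print(derived)          # person 2 gets 2020-01-10 to 2020-07-22
print(drop_candidates)  # [3]
```

Whether the persons in the drop list should actually be removed still depends on whether events could have been seen for them during that time, as described above.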
