We are in the process of transforming admissions data from various state hospitals into the OMOP CDM. However, we have come across a few records in the source data where the admission dates fall after the discharge dates and are inconsistent with the provided age (in months). From past experience, how have other implementers dealt with such seemingly erroneous source data? Is it better to disregard these records altogether, or should we transform the offending records before loading them into the corresponding OMOP tables?
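For concreteness, a check like the one described could be sketched as below. This is only an illustration: the column names (`person_id`, `birth_date`, `admission_date`, `discharge_date`) and the sample values are assumptions, not the actual source feed.

```python
import pandas as pd

# Hypothetical source extract; column names and values are illustrative only.
admissions = pd.DataFrame({
    "person_id": [1, 2, 3],
    "birth_date": pd.to_datetime(["1980-01-01", "1990-06-15", "2000-03-10"]),
    "admission_date": pd.to_datetime(["2020-05-01", "2021-01-10", "1999-12-01"]),
    "discharge_date": pd.to_datetime(["2020-05-05", "2020-12-31", "2000-01-05"]),
})

# Flag records whose dates are internally inconsistent:
#  - admission after discharge
#  - admission before the patient's birth date (i.e. inconsistent with age)
bad_order = admissions["admission_date"] > admissions["discharge_date"]
before_birth = admissions["admission_date"] < admissions["birth_date"]
admissions["is_implausible"] = bad_order | before_birth

flagged = admissions[admissions["is_implausible"]]
print(flagged["person_id"].tolist())  # → [2, 3]
```

Flagging rather than dropping keeps the offending rows visible for review.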
Thank you for your opinion on the matter. Does anyone else have experience with data cleansing methods that they would be willing to share? Just some basic guidelines that could be used as best practice.
I would not blame a data vendor for deleting data that indicates implausible events.
However, I think a better approach than deletion (for an academic center, perhaps) is to assign such patients to a "patients with errors" cohort.
And during analysis, we could choose to exclude patients in this error cohort.
That way we can show the source data people where the problems are. If the data gets deleted, we don't have a chance to improve it in the long run.
It somewhat defeats the purpose of having a drill-down feature in Achilles Heel, or of having Heel at all. Deleting the records is like "photoshopping your data".
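The error-cohort idea could be sketched roughly as follows. All names here are hypothetical (the reserved `cohort_definition_id`, the list of flagged patients, and the analysis table are assumptions for illustration), and the cohort layout only loosely mirrors the OMOP COHORT table's `cohort_definition_id`/`subject_id` columns.

```python
import pandas as pd

# Assumed cohort_definition_id reserved for the "patients with errors" cohort.
ERROR_COHORT_ID = 9999

# person_ids flagged during ETL as having implausible admission records
# (illustrative values, not real data).
flagged_person_ids = [2, 3]

# Rather than deleting the rows, record the affected patients in a
# cohort-style table so analyses can opt out of them.
error_cohort = pd.DataFrame({
    "cohort_definition_id": ERROR_COHORT_ID,
    "subject_id": flagged_person_ids,
})

# At analysis time, exclude any patient in the error cohort.
analysis_persons = pd.DataFrame({"person_id": [1, 2, 3, 4]})
clean = analysis_persons[
    ~analysis_persons["person_id"].isin(error_cohort["subject_id"])
]
print(clean["person_id"].tolist())  # → [1, 4]
```

The key design choice is that the source rows stay intact; only analyses that explicitly opt out of the error cohort ignore them, so the problems remain visible for feedback to the data providers.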
Thanks for your suggestion, which I believe is a good approach as it categorises all of the source data, erroneous or otherwise, and would help toward improving data quality in the future.