Data quality - Achilles and DQD - for non cdm datasets?

Hello Everyone,

I understand that Achilles (and its components) and DQD provide ways to assess the data quality of a CDM dataset. My questions are below.

a) Can Achilles and DQD be used on non-CDM datasets as well? If yes, can you share any doc or example of how it is done? I went through the Achilles GitHub but couldn’t find anything on non-CDM data quality assessment.

b) Is the only difference between Achilles and DQD that DQD has very exhaustive rule coverage for data quality checks, whereas Achilles (Heel) has a limited set of rules?

c) I read this statement on GitHub: “Some Heel rules can be generalized to non-OMOP datasets. Other rules are dependant on OMOP concept ids and a translation of the code to other CDMs would be needed (for example rule with rule_id of 29 uses OMOP specific concept;concept 195075).” Does this mean we can use the same R package to evaluate data quality on a non-OMOP dataset? Is there any tutorial on how to use it for a non-CDM dataset?

Also tagging @Ajit_Londhe, as I have benefitted from your related posts on Achilles in the forum. It would really be helpful if you could help me with this.


The OHDSI community uses the OMOP CDM as a basis for all standardized and systematic methods and tools. If you have your data converted you can just download and apply them.

However, they are all Open Source. If you have a non-CDM asset and don’t want to convert it, feel free to tweak the tools.

Any reason you don’t want to use the CDM?

Hi @Christian_Reich,

No, we are using the CDM as well, but I wanted to know whether the tools can be used for non-CDM datasets too. Based on point c) above, I was wondering whether we can adapt them to suit a non-CDM dataset. If anyone here has tried it, I would like to seek their input. Your response helps. Thank you.

Hi @Akshay,

Sorry for the delay. Achilles Heel and DQD contain a fair amount of overlap, which is why Heel is being planned for deprecation in favor of DQD.

If you are interested in trying to emulate DQD checks in a non-CDM source, refer to the SQL templates in the DQD package, under inst/sql/sql_server on the master branch of the OHDSI/DataQualityDashboard repository on GitHub. Ultimately, DQD is of course based on the Kahn framework, so that’s another asset to help in refactoring the queries based on the structure and logic of your native data.
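As a rough illustration of what refactoring a check for native data could look like, here is a minimal Python sketch of a Kahn-style completeness check (percentage of NULL values in a column) run against a made-up non-CDM table. This is not the DQD SQL; the table `patient_visits`, its columns, the function `completeness_check`, the result field names, and the 5% threshold are all hypothetical and chosen only for the example.

```python
import sqlite3


def completeness_check(conn, table, column, threshold_pct=5.0):
    """Kahn-style completeness check: share of rows where `column` is NULL.

    Hypothetical helper, not part of DQD. Table and column names are
    interpolated into the SQL, so only pass trusted identifiers.
    """
    sql = f"""
        SELECT COUNT(*) AS num_rows,
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS num_violated
        FROM {table}
    """
    num_rows, num_violated = conn.execute(sql).fetchone()
    num_violated = num_violated or 0  # SUM returns NULL on an empty table
    pct = 100.0 * num_violated / num_rows if num_rows else 0.0
    return {
        "table": table,
        "column": column,
        "num_rows": num_rows,
        "num_violated": num_violated,
        "pct_violated": pct,
        "failed": pct > threshold_pct,  # fail when above the assumed threshold
    }


# Toy non-CDM table (assumed schema) with one missing admit_date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_visits (visit_id INTEGER, patient_id INTEGER, admit_date TEXT)"
)
conn.executemany(
    "INSERT INTO patient_visits VALUES (?, ?, ?)",
    [(1, 10, "2020-01-01"), (2, 11, None), (3, 12, "2020-02-03"), (4, 13, "2020-03-04")],
)

result = completeness_check(conn, "patient_visits", "admit_date")
print(result)
```

The same pattern (count the rows in scope, count the violating rows, compare the percentage to a threshold) should carry over to conformance and plausibility checks; only the `CASE WHEN` condition would change to match your native schema.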

But I don’t know of anyone in the community trying to do this; few folks here are entertaining native data analysis :)


Have you tried running DQD in an Azure / Databricks environment? If so, what should I pass in for dbms in ConnectionDetails?