For IMEDS Lab users, reports for CCAE, GE, and all other datasets in IMEDS are available via a link in the Cloud Lab (for security reasons, email me if you want access). They are similar to the public report here: http://www.ohdsi.org/web/achilles/#/SAMPLE/achillesheel
I would also like to hear from people who were not able to install Achilles, and why. For example, here at NIH, I don’t have a non-Active Directory login to our database, and I was not able to get Achilles to work at all (I think it requires some kind of database login). But we had a separate, non-Heel data quality effort that I used on NIH data (e.g., flagging patients living more than 130 years).
Hello Vojtech. At Columbia we certainly have found errors both in the source data and in the ETL using Achilles Heel. For example, we found that our ETL resulted in many records being outside of observation periods. Patients born far too long ago or even in the future due to source data issues were also revealed to us in the Heel report.
Several issues were detected or better understood through other Achilles reports. For example, the aforementioned ancient and future-born patients were more quickly noticed in the Person report’s histogram. The Drug Era report helped us discover a bug in the ETL: the prevalence of one drug era shown in the tree map did not match what we knew to be true.
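To make the "records outside of observation periods" finding concrete, here is a minimal Python sketch of that kind of check. It is not the Achilles Heel implementation (Heel runs as SQL against the CDM); the function name, the tuple layout, and the toy data are all assumptions made for illustration.

```python
from datetime import date

def records_outside_observation(records, periods):
    """Return records whose date falls in none of the person's observation periods.

    records: iterable of (person_id, event_date)          -- hypothetical layout
    periods: iterable of (person_id, start_date, end_date), bounds inclusive
    """
    # Group observation periods by person for quick lookup.
    by_person = {}
    for person_id, start, end in periods:
        by_person.setdefault(person_id, []).append((start, end))

    # A record is flagged when no period of that person contains its date;
    # persons with no observation period at all are flagged as well.
    return [
        (person_id, event_date)
        for person_id, event_date in records
        if not any(start <= event_date <= end
                   for start, end in by_person.get(person_id, []))
    ]

# Hypothetical toy data, not taken from any real dataset.
periods = [(1, date(2010, 1, 1), date(2012, 12, 31))]
records = [
    (1, date(2011, 6, 1)),   # inside the period
    (1, date(2013, 2, 1)),   # after the period ends
    (2, date(2011, 1, 1)),   # person has no observation period
]
outside = records_outside_observation(records, periods)
```

In this toy example the last two records are flagged. A high share of flagged records would point to the sort of ETL problem described above.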
For doing an ETL of Nursing Home (NH) data, including the NH Minimum Data Set, the tool helped us identify the following:
Our ETL had originally truncated observation periods for patients with data near the start and end dates of the data pull. The cumulative data graph in the dashboard made this issue obvious.
Several cases where our load process had incorrectly inferred discharge dates prior to admission (resulting in bad observation periods). Similar issues were identified with drug eras, some related to the source data and some to the ETL process.
An error in our ETL that caused data from the assessment of cognitive skills for daily decision making to be ignored during translation, which left many empty observation entries.
A number of observations were triggering the “Number of observation records with no value” error. Upon investigation, we learned that we could address this issue if we requested that LOINC answer codes be included in the next vocabulary update. They have been, and we are currently filling in the holes.
Along the same lines, the Achilles reports and analytics motivated us to do further work toward a complete ETL to the standard vocabulary because of the benefits to data quality assurance.
Thank you to all who responded so far.
I would like to include even more sites, so please reply to this thread or email me.
For the paper, we decided to compare Heel output data (CSV file produced by Achilles). We have 5 participating sites and a total of 16 datasets analyzed at this point.
If you are interested in also contributing your site's Heel data (no person-level data), please let me know. (For example, I would love to have Regenstrief data, @jon_duke … )
For example, the most common ruleIDs across datasets are:
717 Distribution of quantity by drug_concept_id; max
600 Proc: Concepts in data are not in correct vocabulary (CPT4/HCPCS/ICD9P)
101 Person: Number of persons by age, with age at first observation period; should not have age < 0
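As a rough illustration of how such a cross-dataset tally could be computed, here is a small Python sketch. It assumes each dataset's Heel output has been exported to CSV text with a `rule_id` column; that column name, the helper names, and the toy exports are assumptions for this example, not the actual Achilles file layout or our study's data.

```python
import csv
import io
from collections import Counter

def rule_ids_in(csv_text, column="rule_id"):
    """Return the set of distinct rule IDs in one dataset's Heel CSV export.

    The column name "rule_id" is an assumption for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row[column]}

def datasets_per_rule(exports):
    """Count, for each rule ID, how many datasets triggered it at least once."""
    counts = Counter()
    for csv_text in exports.values():
        # A set is used so a rule counts once per dataset, however often it fires.
        counts.update(rule_ids_in(csv_text))
    return counts

# Hypothetical exports from three sites (made-up numbers).
exports = {
    "site_a": "rule_id,record_count\n717,12\n600,3\n",
    "site_b": "rule_id,record_count\n717,40\n101,1\n",
    "site_c": "rule_id,record_count\n717,5\n600,9\n",
}
counts = datasets_per_rule(exports)
most_common_rule = counts.most_common(1)[0][0]
```

Counting datasets rather than raw rows keeps one very large dataset from dominating the ranking, and it needs no person-level data.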
Hi @Vojtech_Huser, are you still accepting input from other sites? We would like to share our experience of using Achilles Heel in the PEDSnet project. Please let me know if you are still open to contributions and I can share some feedback and summarized results.
Yes. New sites can still join the evaluation of Achilles Heel and work on the resulting paper. Please send me a regular email to my work email to coordinate.
To also provide an update on the work:
We continue to work on the paper. Sunny Shang also piloted some classification of the quality rules.
The GitHub version of Achilles (after my fork was merged by Chris) now distinguishes Heel sub-analyses on top of an Achilles analysis (e.g., 113-1 is the rule for checking pre-birth events and 113-2 is the rule for post-mortem events).
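To show what a pair of sub-rules under one analysis might look like, here is a minimal Python sketch. It borrows the IDs 113-1 and 113-2 from the example above purely as labels; the function, its parameters, and the use of a full birth date are assumptions, and the real Heel logic is written in SQL and may differ.

```python
from datetime import date

def violated_subrules(event_date, birth_date, death_date=None):
    """Return the sub-rule labels an event violates (illustrative only).

    "113-1": event occurs before the person's birth
    "113-2": event occurs after the person's death
    """
    violated = []
    if event_date < birth_date:
        violated.append("113-1")
    # Persons without a recorded death cannot violate the post-mortem rule.
    if death_date is not None and event_date > death_date:
        violated.append("113-2")
    return violated

# Hypothetical person born in 2000 who died in 2015.
birth, death = date(2000, 1, 1), date(2015, 6, 30)
pre_birth = violated_subrules(date(1999, 5, 1), birth, death)
post_mortem = violated_subrules(date(2016, 1, 1), birth, death)
valid = violated_subrules(date(2010, 1, 1), birth, death)
```

Keeping the two checks under separate IDs means a site can report or suppress each one on its own, which is the point of the sub-analysis numbering.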
The dilemma is which CDM version to focus on (v5 vs. v4); probably v5, porting changes back from v5 to v4 at certain time points.
We did a mini email survey of sites on their use of Achilles Heel (80% response rate).
Existing included sites are hereby encouraged to check the manuscript draft (in the cloud) and post comments.
I wanted to share an update on Achilles Heel.
After the Data Quality Code-a-thon, Chris Knoll and I made a few changes to Achilles Heel.
Each quality rule now outputs some quantitative data into separate (newly created) columns, so that the extent of the problem can be quantified (e.g., the number of rows violating the rule).
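The idea of attaching a count to each rule result can be sketched in a few lines of Python. This is only an illustration of the concept: the function name, the dictionary keys (including `record_count`), and the toy ages are assumptions, not the actual Heel output columns.

```python
def run_rule(rule_id, description, rows, is_violation):
    """Evaluate one quality rule and report how many rows violate it.

    Returns None when nothing violates the rule; otherwise a dict holding
    the warning text plus a separate numeric column with the violation count.
    """
    offending = sum(1 for row in rows if is_violation(row))
    if offending == 0:
        return None
    return {
        "rule_id": rule_id,
        "warning": description,
        "record_count": offending,  # hypothetical column name
    }

# Hypothetical ages at first observation period.
ages = [34, -2, 71, -1, 130]
result = run_rule(
    101,
    "age at first observation period should not be < 0",
    ages,
    lambda age: age < 0,
)
```

With the count in its own column, a site can tell two offending rows apart from two million, which the warning text alone does not show.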
For Heel users, the latest Achilles version now has improved overviews of the rules and derived analyses. (There have been many questions about the rules on the forum lately.)
See CSV files here:
Two of the CSV files are also rendered as an HTML overview in the Extras folder.
Two updates:
1. I am pleased to announce that the Heel Evaluation Study has now been fully accepted. (Proofs are being generated, so a link to the PDF is coming soon.)
The citation would be:
Huser V, DeFalco F, Schuemie M, Ryan P, Shang N, Velez M, Park R, Boyce R, Duke J, Khare R et al: Multi-site Evaluation of a Data Quality Tool for Patient-Level Clinical Datasets. eGEMs 2016.
2. I am also organizing a new study with 2-3 iterations to set thresholds for some of the new rules in Achilles 1.3.
The Heel-Evaluation-Study manuscript has finished the proof stage (after several weeks) and is now officially published. The link to the journal site is:
I would like to work with the developers to possibly also test the Heel component on Impala, BigQuery, and Redshift. If you have one of these environments, please report issues with executing Heel (here: https://github.com/OHDSI/Achilles/blob/master/inst/sql/sql_server/AchillesHeel_v5.sql ) to this thread so that the Heel code can be tweaked to work on all 8 dialects.