In 2015, I organized a study that compared Achilles Heel output across several sites. As a result of this study and the Data Quality Hackathon (Data Quality research funded by PCORI), I would like to propose some changes to Achilles. In addition to the existing parts, two new “components” would be added to Achilles (see below).
Additions to architecture (some components are newly proposed):
achilles_pre-computations
achilles_export-to-JSON
achilles_heel (set of rules with short output; uses only pre-computed values)
achilles_DQ_reports (DQ output that does not fit well with Heel)
achilles_share (or achilles_wiki or achilles_mini) (mini-characterization of a database that is not sensitive)
Adding new content
Also, we would add new analyses to the pre-computations, such as x21: 'count of distinct source_values that are currently mapped to concept 0' (for Dx, Proc, Rx, Meas, Obs).
I also propose to add Iris measures into Achilles (10, count of events; 11, count of patients with at least one Dx and one Proc; count of deceased patients).
The goal would also be to add to Achilles [Heel] some analyses that were implemented in an older tool called ‘grouch’.
Export to Wiki (or to some other format): a small set of statistics that everyone deems “safe to share”. One idea is that every site in OHDSI would use this function to generate a page in the OHDSI Wiki describing their data. The Export-to-Wiki function would also run against the pre-computed Achilles results, just like the Export-to-JSON function.
@vojtech_huser: Could you maybe provide a full list of outputs that you propose go into the Export to Wiki?
Could you also provide the full list of additional quality rules that you would like to see added to Achilles Heel?
Here is a partial list of proposed additional Heel rules. All are warnings (safer for consensus). They are listed with the rule_id used in the beta implementation inside Iris [beta parts]:
27,‘percentage of unmapped rows (concept_id 0) is over warning threshold’
34,‘all rows in measurement table have null time component (likely “claim-ish” only lab data) (no real EHR data)’
35,‘there are no numerical results in measurement table’
36,‘ratio #ofPatients/#ofProviders is below threshold’ (indicates small or empty provider table)
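As a rough illustration of how two of these rules could be expressed, here is a minimal sketch of rules 27 and 35. It is not the Iris beta implementation. For readability it queries OMOP CDM v5 tables (condition_occurrence, measurement) directly, whereas an actual Heel rule would read pre-computed Achilles results. The 10% threshold, the choice of condition_occurrence as the example table, and the output column names are placeholders.

```sql
-- Rule 27 (sketch): warn when the share of unmapped rows (concept_id 0)
-- exceeds a warning threshold. Shown for one table only; 10 (percent) is a
-- hypothetical threshold. The comparison is written without a division so an
-- empty table cannot cause a divide-by-zero error.
select 27 as rule_id,
       'WARNING: percentage of unmapped rows (concept_id 0) is over warning threshold' as heel_warning
from (
  select count(*) as row_cnt,
         sum(case when condition_concept_id = 0 then 1 else 0 end) as unmapped_cnt
  from condition_occurrence
) c
where 100.0 * c.unmapped_cnt > 10.0 * c.row_cnt;

-- Rule 35 (sketch): warn when the measurement table has rows
-- but none of them carries a numerical result.
select 35 as rule_id,
       'WARNING: there are no numerical results in measurement table' as heel_warning
from (
  select count(*) as row_cnt,
         count(value_as_number) as numeric_cnt  -- count(column) skips NULLs
  from measurement
) m
where m.row_cnt > 0
  and m.numeric_cnt = 0;
```

Each query returns one row when the warning fires and no rows otherwise, so the results could be appended to a single warnings table.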
Some are terminology dependent (e.g., probing for errors in DOB):
Count of persons over [threshold-child-age] is over [warning count-threshold]
For example, I know of a dataset (intentionally not named) where there are patients over age 60 with a clearly pediatric diagnosis of ‘passing meconium’.
(The achilles_results_dist table provides many such candidates (analysis_id 406), where the average or median age indicates a pediatric event but a high value in the max_value column indicates that there are outliers.)
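A query along these lines could list such candidates. This is a sketch under assumptions: it takes the standard achilles_results_dist layout (count_value, median_value, max_value), assumes that for analysis 406 stratum_1 holds the condition concept and stratum_2 the gender concept, and uses illustrative age cut-offs (median at most 1 year, maximum at least 60 years) rather than agreed thresholds.

```sql
-- Candidate pediatric conditions with implausibly old outliers (sketch).
-- Assumed: analysis 406 = age distribution per condition concept and gender.
select stratum_1    as condition_concept_id,
       stratum_2    as gender_concept_id,
       count_value,
       median_value as median_age,
       max_value    as max_age
from achilles_results_dist
where analysis_id = 406
  and median_value <= 1   -- hypothetical [threshold-child-age]
  and max_value >= 60     -- hypothetical outlier age
order by max_value desc;
```

The result is only a list of candidates. Deciding which concepts are truly pediatric still requires terminology knowledge, which is why such rules are terminology dependent.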
For achilles_share: I would expect several different outputs depending on the target audience. A data partner may be willing to share one set of parameters on a public internet page (achilles_share_level_1) and a more aggressive set of parameters via an encrypted email to a possible study collaborator (achilles_share_level_x).
For achilles_share_level_1 (least aggressive), the possible output would be:
size of dataset (classification into under 1M, 1-10M, 10+M, 100+M)
% of unmapped data (per domain such as Dx, Proc, Rx…)
which tables are fully populated (drug_cost?, location?, provider?, era tables?)
% of patients with some numerical measurement results (e.g., all iris measures, but on percentage basis)
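The first item in this list (size class instead of an exact count) could be derived from the pre-computed results roughly as follows. This is a sketch that assumes "size of dataset" means the number of persons, that Achilles analysis_id 1 holds that count, and that "10+M" means 10M up to but not including 100M.

```sql
-- Report only a coarse size class, never the exact person count (sketch).
select case
         when count_value < 1000000   then 'under 1M'
         when count_value < 10000000  then '1-10M'
         when count_value < 100000000 then '10+M'
         else '100+M'
       end as dataset_size_class
from achilles_results
where analysis_id = 1;  -- assumed: analysis 1 = number of persons
```

Binning in the query itself means the exact count never leaves the site, which fits the "least aggressive" level.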
I think the best way is to create some prototype outputs at various levels and let people discuss how many levels they like (maybe 2 levels are enough), and then move things up or down a level (or fuzz them less or more).
To continue the discussion (this is what I briefly mentioned on the last call):
Here is an analysis based on a comment from the AMIA OHDSI panel.
For each “table”, the query counts the number of distinct source_values and target concept_ids. In a separate column, it reports how many distinct source_values are mapped to concept_id=0.
See example report here: (I also have results from one other site)
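For readers who want to try something similar on their own data, the analysis described above might look like the sketch below. It is a reconstruction from the description, not the query that produced the report, and it assumes OMOP CDM v5 table and column names. The output column names are made up for illustration.

```sql
-- Per table: distinct source values, distinct target concepts, and
-- distinct source values that are mapped to concept_id 0 (sketch).
select 'condition_occurrence' as table_name,
       count(distinct condition_source_value) as distinct_source_values,
       count(distinct condition_concept_id)   as distinct_concept_ids,
       count(distinct case when condition_concept_id = 0
                           then condition_source_value end) as source_values_mapped_to_0
from condition_occurrence
union all
select 'procedure_occurrence',
       count(distinct procedure_source_value),
       count(distinct procedure_concept_id),
       count(distinct case when procedure_concept_id = 0
                           then procedure_source_value end)
from procedure_occurrence
union all
select 'drug_exposure',
       count(distinct drug_source_value),
       count(distinct drug_concept_id),
       count(distinct case when drug_concept_id = 0
                           then drug_source_value end)
from drug_exposure
union all
select 'measurement',
       count(distinct measurement_source_value),
       count(distinct measurement_concept_id),
       count(distinct case when measurement_concept_id = 0
                           then measurement_source_value end)
from measurement
union all
select 'observation',
       count(distinct observation_source_value),
       count(distinct observation_concept_id),
       count(distinct case when observation_concept_id = 0
                           then observation_source_value end)
from observation;
```

The result has one row per table, so reports from different sites can be compared side by side.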
(To continue the discussion, mostly with myself: it was Martijn’s (@schuemie) idea to run new additions by the forum first. I feared we might get few or no replies.)
For the Achilles Wiki page, it would be useful to display data by event type.
For example, looking at drug type, a lack of drug type 38000180 (Inpatient administration Drug Type) may indicate claims-only data.
Views like the ones below could also be added to Achilles Web, or to the wiki (in percentage form at level 1, which is the least revealing).
-- measurement type (analysis 1805): record count per type concept
select stratum_2 as type_concept_id, sum(count_value) as count_value from achilles_results where analysis_id = 1805 group by stratum_2;
-- drug type (analysis 705)
select stratum_2 as type_concept_id, sum(count_value) as count_value from achilles_results where analysis_id = 705 group by stratum_2;
-- procedure type (analysis 605)
select stratum_2 as type_concept_id, sum(count_value) as count_value from achilles_results where analysis_id = 605 group by stratum_2;
-- observation type (analysis 805)
select stratum_2 as type_concept_id, sum(count_value) as count_value from achilles_results where analysis_id = 805 group by stratum_2;
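For the least revealing wiki level, the same views could report percentages instead of raw counts. Here is a sketch for drug type. The column aliases and the rounding to one decimal are illustrative choices, and the window function requires a database that supports `over ()`.

```sql
-- Drug type as a share of all drug exposure records (sketch).
select stratum_2 as drug_type_concept_id,
       round(100.0 * sum(count_value)
             / sum(sum(count_value)) over (), 1) as pct_of_records
from achilles_results
where analysis_id = 705
group by stratum_2
order by pct_of_records desc;
```

If type 38000180 is missing from this output altogether, that would be the claims-only signal mentioned above.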
Thank you for these great posts! Even if there are no immediate replies, it is helpful for new developers and implementers who are increasingly being steered toward these forums as a great resource in addition to the OHDSI website, wiki, github markdown files, etc. So…thank you again.