Additions to Achilles

Vojtech_Huser · April 20, 2016, 3:33pm

In 2015, I organized a study that compared Achilles Heel output at several sites. As a result of this study and the Data Quality Hackathon (data quality research funded by PCORI), I would like to propose some changes to Achilles. In addition to the existing parts, two new “components” would be added to Achilles (see below).

Additions to architecture (some components are newly proposed)

achilles_pre-computations
achilles_export-to-JSON
achilles_heel (set of rules with short output; uses only pre-computed values)
achilles_DQ_reports (DQ output that does not fit well with Heel)
achilles_share (or achilles_wiki or achilles_mini) (mini-characterization of a database that is not sensitive)

Adding new content
Also, we would add new analyses to the pre-computations, such as x21: 'count of distinct source_values that are currently mapped to concept 0' (for Dx, Proc, Rx, Meas, Obs).
I also propose to add Iris measures into Achilles (10, count of events; 11, count of patients with at least one Dx and one Proc; count of deceased patients).

The goal would also be to add to Achilles [Heel] some analyses that were implemented in an older tool called ‘grouch’.

To get a peek at beta versions of some of these, interested early testers can look into the beta parts of Iris (here: https://github.com/OHDSI/Iris/blob/master/extras/notes.md).

schuemie · April 20, 2016, 3:52pm

Thanks Vojtech!

Just to clarify a bit (after our teleconference just now):

Currently, Achilles has these three components:

Achilles pre-computation which runs a long list of analyses to produce the achilles_result and achilles_result_dist tables
Achilles Heel, a set of rules with short outputs
Export to JSON producing a large set of JSON objects that can be explored using Achilles_Web

What Vojtech is proposing is to add two new components:

Achilles Data Quality reports containing information that goes beyond Achilles Heel. For example, this could include a report listing the source_codes that are unmapped (have concept ID = 0) and their frequency.
Export to Wiki or to some other format a small set of statistics that is deemed “safe to share” by everyone. One idea would be that every site in OHDSI use this function to generate a page in the OHDSI Wiki to describe their data. The idea would be that the Export-to-Wiki-function would also run against the pre-computed Achilles results, just like the Export-to-JSON-function.
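
A minimal sketch of what the unmapped source code report described above could look like. It is only an illustration, not the Achilles implementation: it uses Python with an in-memory SQLite table standing in for a CDM table, and the function name `unmapped_source_values` and the toy rows are made up for this example.

```python
import sqlite3


def unmapped_source_values(conn, table, concept_col, source_col):
    """Return (source_value, row_count) pairs for rows mapped to concept 0.

    Hypothetical helper. Identifiers are interpolated into the SQL text,
    so only pass trusted table and column names.
    """
    query = (
        f"SELECT {source_col}, COUNT(*) AS row_cnt "
        f"FROM {table} WHERE {concept_col} = 0 "
        f"GROUP BY {source_col} "
        f"ORDER BY row_cnt DESC, {source_col}"
    )
    return conn.execute(query).fetchall()


# Toy data standing in for a real CDM drug_exposure table (values are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drug_exposure (drug_concept_id INTEGER, drug_source_value TEXT)"
)
conn.executemany(
    "INSERT INTO drug_exposure VALUES (?, ?)",
    [(0, "CODE-A"), (0, "CODE-A"), (0, "CODE-B"), (12345, "CODE-C")],
)

report = unmapped_source_values(
    conn, "drug_exposure", "drug_concept_id", "drug_source_value"
)
for source_value, row_cnt in report:
    print(source_value, row_cnt)
```

Sorting by frequency puts the source codes that would benefit most from mapping at the top of the report.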

@vojtech_huser: Could you maybe provide a full list of outputs that you propose go into the Export to Wiki?

Could you also provide the full list of additional quality rules that you would like to see added to Achilles Heel?

Vojtech_Huser · April 20, 2016, 8:15pm

Here is a partial list of proposed additional Heel rules. All are warnings (safer for consensus). They are listed with the rule_id used in the beta implementation inside Iris [beta parts].

27,‘percentage of unmapped rows (concept_id 0) is over warning threshold’
34,‘all rows in measurement table have null time component (likely “claim-ish” only lab data) (no real EHR data)’
35,‘there are no numerical results in measurement table’
36, ‘ratio #ofPatients/#ofProviders is below threshold’ (indicates small or empty provider table)
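
As an illustration of how a threshold rule such as rule 27 might be expressed, here is a small Python sketch. The function name, the message wording, and the 10% default threshold are assumptions made for this example; the thread does not specify a threshold value.

```python
def unmapped_warning(unmapped_rows, total_rows, threshold_pct=10.0):
    """Return a warning string if the unmapped share exceeds the threshold.

    Hypothetical sketch of rule 27; threshold_pct is an assumed default.
    Returns None when the table is empty or the share is within bounds.
    """
    if total_rows == 0:
        return None
    pct = 100.0 * unmapped_rows / total_rows
    if pct > threshold_pct:
        return (
            f"WARNING: 27 - {pct:.1f}% of rows are unmapped "
            f"(concept_id 0), over the {threshold_pct:.1f}% threshold"
        )
    return None


print(unmapped_warning(25, 100))
print(unmapped_warning(5, 100))
```

In line with the proposal that Heel uses only pre-computed values, the two counts would come from the Achilles results tables rather than from the CDM tables directly.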

Some are terminology dependent (e.g., probing for errors in DOB):

Count of persons over [threshold-child-age] is over [warning count-threshold]
For example, I know about a dataset (intentionally not named) where there are patients over age 60 with a clearly pediatric diagnosis of ‘passing meconium’
(achilles_results_dist table provides many such candidates (analysis_id 406) where average or median age indicates a pediatric event but high value in value_max column indicates that there are outliers)
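
A rough sketch of how such candidates could be pulled from the pre-computed distribution table. It assumes that the table exposes `median_value` and `max_value` columns and that `stratum_1` holds the condition concept for analysis 406; the function name, the age cut-offs, and the toy rows are invented for illustration.

```python
import sqlite3


def pediatric_outlier_candidates(conn, child_age=18, adult_age=60):
    """Find analysis 406 strata with a pediatric median age but an adult maximum.

    Hypothetical helper; child_age and adult_age are assumed cut-offs.
    """
    query = """
        SELECT stratum_1, median_value, max_value
        FROM achilles_results_dist
        WHERE analysis_id = 406
          AND median_value < ?
          AND max_value > ?
        ORDER BY max_value DESC
    """
    return conn.execute(query, (child_age, adult_age)).fetchall()


# Toy stand-in for achilles_results_dist with only the columns used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE achilles_results_dist ("
    "analysis_id INTEGER, stratum_1 TEXT, "
    "median_value INTEGER, max_value INTEGER)"
)
conn.executemany(
    "INSERT INTO achilles_results_dist VALUES (?, ?, ?, ?)",
    [
        (406, "cond_newborn", 0, 72),  # pediatric median, adult outlier
        (406, "cond_adult", 55, 90),   # adult condition, not a candidate
        (406, "cond_child", 1, 3),     # pediatric with no outliers
        (400, "cond_newborn", 0, 99),  # different analysis, ignored
    ],
)

candidates = pediatric_outlier_candidates(conn)
print(candidates)
```

Each returned row is only a candidate for review: a high maximum age may reflect a date-of-birth error, a mapping problem, or a legitimately rare case.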

Vojtech_Huser · April 20, 2016, 8:16pm

To answer the second part of your question:

For achilles_share: I would expect several different outputs depending on the target audience. A data partner may be willing to share one set of parameters on a public internet page (achilles_share_level_1) and a more aggressive set of parameters via an encrypted email to a possible study collaborator (achilles_share_level_x).

For achilles_share_level_1 (least aggressive) the possible output would be

size of dataset (classification into under 1M, 1-10M, 10+M, 100+M)
% of unmapped data (per domain such as Dx, Proc, Rx…)
what tables are fully populated (drug_cost?, location?,provider?, era tables?)
% of patients with some numerical measurement results (e.g., all iris measures, but on percentage basis)
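
The size classification in the first item could be sketched as follows. This assumes the four classes are read as under 1M, 1M up to 10M, 10M up to 100M, and 100M or more persons; the function name `size_bucket` is hypothetical.

```python
def size_bucket(person_count):
    """Map an exact person count to a coarse, less revealing size class.

    Assumed bucket boundaries: <1M, 1M to <10M, 10M to <100M, >=100M.
    """
    if person_count < 0:
        raise ValueError("person_count must be non-negative")
    if person_count < 1_000_000:
        return "under 1M"
    if person_count < 10_000_000:
        return "1-10M"
    if person_count < 100_000_000:
        return "10+M"
    return "100+M"


for n in (250_000, 4_200_000, 35_000_000, 140_000_000):
    print(n, size_bucket(n))
```

Reporting only the class, not the exact count, is one simple way to "fuzz" a value for the least aggressive sharing level.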

I think the best way is to create some prototype outputs at various levels and let people discuss how many levels they like (maybe 2 levels is enough) and then move things up or down a level (or fuzz them less or more).

Vojtech_Huser · April 28, 2016, 9:38pm

To continue the discussion (what I briefly mentioned on the last call).
Here is an analysis based on a comment from AMIA OHDSI panel.
For each “table”, the query counts the number of distinct source_values and distinct target concept_ids. In a separate column, it reports how many distinct source_values are mapped to concept_id=0.

See example report here (I also have results from one other site):

      DOMAIN TARGET_CNT SOURCE_CNT SOURCE_UNMAPPED_CNT
 observation       2693       3630                   0
        drug      15469     109632               25764
   condition       9271      12528                 194
   procedure      14506      18605                3988
 measurement       1513       1572                   0

The query can be found here:

https://github.com/OHDSI/Iris/blob/master/inst/sql/sql_server/iris_parameterized_2.sql

/*********************************************************************************
# Copyright 2016 Observational Health Data Sciences and Informatics
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
********************************************************************************/
/************************
last revised: April 2016  author:  Vojtech Huser
*************************/


--start of analysis
--analysis of how many source values are in source and into how many distinct concept_ids these are mapped
--and also including unmapped source_values

select 'measurement' as domain,
(select COUNT_BIG(distinct measurement_concept_id) from @cdmSchema.measurement) as target_cnt,
(select COUNT_BIG(distinct measurement_source_value) from @cdmSchema.measurement) as source_cnt,
(select COUNT_BIG(distinct measurement_source_value) from @cdmSchema.measurement where measurement_concept_id = 0) as source_unmapped_cnt

UNION

This file has been truncated.

Vojtech_Huser · May 12, 2016, 5:57pm

(To continue the discussion, mostly with myself: it was Martijn’s (@schuemie) idea to run new additions by the forum first, although I feared we might get few or no replies.)

For Achilles Wiki page, it would be useful to display data by event type.

For example, for drug type, the absence of drug type 38000180 (Inpatient administration Drug Type) may indicate claims-only data.

Views like the ones below could also be added to Achilles Web (or to the wiki, as percentages in level 1, which is the least revealing).

--meas type
select stratum_2 as stratum_1, sum(count_value) as count_value  from achilles_results where analysis_id = 1805 group by stratum_2;
--drug type
select stratum_2, sum(count_value) as count  from achilles_results where analysis_id = 705 group by stratum_2;
--proc type
select stratum_2, sum(count_value) as count  from achilles_results where analysis_id = 605 group by stratum_2;
--obs type
select stratum_2, sum(count_value) as count  from achilles_results where analysis_id = 805 group by stratum_2;
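
To show these views as percentages for the least revealing level, one possible approach is sketched below in Python with an in-memory SQLite stand-in for achilles_results. The function name `type_shares` and the toy type identifiers are made up for this example; only the concept ID 38000180 comes from the discussion above.

```python
import sqlite3


def type_shares(conn, analysis_id):
    """Return {type_concept_id: percent of rows} for one by-type analysis.

    Hypothetical helper that runs the same aggregation as the queries
    above and converts the counts to percentages rounded to one decimal.
    """
    rows = conn.execute(
        "SELECT stratum_2, SUM(count_value) FROM achilles_results "
        "WHERE analysis_id = ? GROUP BY stratum_2 ORDER BY stratum_2",
        (analysis_id,),
    ).fetchall()
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {type_id: round(100.0 * count / total, 1) for type_id, count in rows}


# Toy stand-in for achilles_results; type ids are placeholders, not real concepts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE achilles_results ("
    "analysis_id INTEGER, stratum_1 TEXT, stratum_2 TEXT, count_value INTEGER)"
)
conn.executemany(
    "INSERT INTO achilles_results VALUES (?, ?, ?, ?)",
    [
        (705, "drug_1", "type_A", 600),
        (705, "drug_2", "type_A", 300),
        (705, "drug_1", "type_B", 100),
        (605, "proc_1", "type_C", 50),
    ],
)

drug_shares = type_shares(conn, 705)
print(drug_shares)
# Flag the pattern discussed above: no inpatient administration rows at all.
print("no inpatient administration type:", "38000180" not in drug_shares)
```

Publishing only the shares keeps absolute row counts out of the shared output while still showing which event types a site has.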

t_abdul_basser · May 12, 2016, 6:10pm

Thank you for these great posts! Even if there are no immediate replies, it is helpful for new developers and implementers who are increasingly being steered toward these forums as a great resource in addition to the OHDSI website, wiki, github markdown files, etc. So…thank you again.