OHDSI Home | Forums | Wiki | Github

Achilles Heel initial discussion

I would like to find out from people what other functionality they would like to see in evaluating the quality of a given dataset.
Achilles provides a nice way of checking for some via the part called Achilles Heel.

First, for many existing quality checks, I would like to see a more detailed description.
Also, it would be nice to somehow define a cohort of “strange persons” that fail some of the important quality check. (e.g., events present prior birth date) so that our research analysis don’t include such invalid patients. (“typo patients”)

I first struggled with just a list of existing quality checks.
For the benefit of the group (and for me to understand it better), I parsed the Achilles.sql file and extracted the short notes about each heel ID report. (see the second post in this thread for this list).

Could we include in Achilles set of install files some sort of .CSV file with all quality rules listed and some notes like - why created, by whom, how are they checked - which tables are used (for those who are not expert in reverse engineering/“reading” SQL code

Here is the promised list of extracted notes for each report.

–000. PERSON statistics
–100. OBSERVATION_PERIOD (joined to PERSON)
–200- VISIT_OCCURRENCE
–300- PROVIDER
–400- CONDITION_OCCURRENCE
–500- DEATH
–600- PROCEDURE_OCCURRENCE
–700- DRUG_EXPOSURE
–800- OBSERVATION
–900- DRUG_ERA
–1000- CONDITION_ERA
–1100- LOCATION
–1200- CARE_SITE
–1300- ORGANIZATION
–1400- PAYOR_PLAN_PERIOD
–1500- DRUG_COST
–1600- PROCEDURE_COST
–1700- COHORT
–{0
– 0 Number of persons

–{1
– 1 Number of persons

–{2
– 2 Number of persons by gender

–{3
– 3 Number of persons by year of birth

–{4
– 4 Number of persons by race

–{5
– 5 Number of persons by ethnicity

–{7
– 7 Number of persons with invalid provider_id

–{8
– 8 Number of persons with invalid location_id

–{9
– 9 Number of persons with invalid care_site_id

–{101
– 101 Number of persons by age, with age at first observation period

–{102
– 102 Number of persons by gender by age, with age at first observation period

–{103
– 103 Distribution of age at first observation period

–{104
– 104 Distribution of age at first observation period by gender

–{105
– 105 Length of observation (days) of first observation period

–{106
– 106 Length of observation (days) of first observation period by gender

–{107
– 107 Length of observation (days) of first observation period by age decile

–{108
– 108 Number of persons by length of observation period, in 30d increments

–{109
– 109 Number of persons with continuous observation in each year
– Note: using temp table instead of nested query because this gives vastly improved performance in Oracle
USE @results_schema; --Set to result schema so temp tables are created where user has write rights (Oracle)
IF OBJECT_ID(‘temp_dates’, ‘U’) IS NOT NULL --This should only do something in Oracle

–{110
– 110 Number of persons with continuous observation in each month
– Note: using temp table instead of nested query because this gives vastly improved performance in Oracle
USE @results_schema; --Set to result schema so temp tables are created where user has write rights (Oracle)
IF OBJECT_ID(‘temp_dates’, ‘U’) IS NOT NULL --This should only do something in Oracle

–{111
– 111 Number of persons by observation period start month

–{112
– 112 Number of persons by observation period end month

–{113
– 113 Number of persons by number of observation periods

–{114
– 114 Number of persons with observation period before year-of-birth

–{115
– 115 Number of persons with observation period end < start

–{116
– 116 Number of persons with at least one day of observation in each year by gender and age decile

–{117
– 117 Number of persons with at least one day of observation in each year by gender and age decile

–{200
– 200 Number of persons with at least one visit occurrence, by visit_concept_id

–{201
– 201 Number of visit occurrence records, by visit_concept_id

–{202
– 202 Number of persons by visit occurrence start month, by visit_concept_id

–{203
– 203 Number of distinct visit occurrence concepts per person

–{204
– 204 Number of persons with at least one visit occurrence, by visit_concept_id by calendar year by gender by age decile

–{206
– 206 Distribution of age by visit_concept_id

–{207
–207 Number of visit records with invalid person_id

–{208
–208 Number of visit records outside valid observation period

–{209
–209 Number of visit records with end date < start date

–{210
–210 Number of visit records with invalid care_site_id

–{211
– 211 Distribution of length of stay by visit_concept_id

–{220
– 220 Number of visit occurrence records by condition occurrence start month

–{300
– 300 Number of providers

–{301
– 301 Number of providers by specialty concept_id

–{302
– 302 Number of providers with invalid care site id

–{400
– 400 Number of persons with at least one condition occurrence, by condition_concept_id

–{401
– 401 Number of condition occurrence records, by condition_concept_id

–{402
– 402 Number of persons by condition occurrence start month, by condition_concept_id

–{403
– 403 Number of distinct condition occurrence concepts per person

–{404
– 404 Number of persons with at least one condition occurrence, by condition_concept_id by calendar year by gender by age decile

–{405
– 405 Number of condition occurrence records, by condition_concept_id by condition_type_concept_id

–{406
– 406 Distribution of age by condition_concept_id

–{409
– 409 Number of condition occurrence records with invalid person_id

–{410
– 410 Number of condition occurrence records outside valid observation period

–{411
– 411 Number of condition occurrence records with end date < start date

–{412
– 412 Number of condition occurrence records with invalid provider_id

–{413
– 413 Number of condition occurrence records with invalid visit_id

–{420
– 420 Number of condition occurrence records by condition occurrence start month

–{500
– 500 Number of persons with death, by cause_of_death_concept_id

–{501
– 501 Number of records of death, by cause_of_death_concept_id

–{502
– 502 Number of persons by condition occurrence start month

–{504
– 504 Number of persons with a death, by calendar year by gender by age decile

–{505
– 505 Number of death records, by death_type_concept_id

–{506
– 506 Distribution of age by condition_concept_id

–{509
– 509 Number of death records with invalid person_id

–{510
– 510 Number of death records outside valid observation period

–{511
– 511 Distribution of time from death to last condition

–{512
– 512 Distribution of time from death to last drug

–{513
– 513 Distribution of time from death to last visit

–{514
– 514 Distribution of time from death to last procedure

–{515
– 515 Distribution of time from death to last observation

–{600
– 600 Number of persons with at least one procedure occurrence, by procedure_concept_id

–{601
– 601 Number of procedure occurrence records, by procedure_concept_id

–{602
– 602 Number of persons by procedure occurrence start month, by procedure_concept_id

–{603
– 603 Number of distinct procedure occurrence concepts per person

–{604
– 604 Number of persons with at least one procedure occurrence, by procedure_concept_id by calendar year by gender by age decile

–{605
– 605 Number of procedure occurrence records, by procedure_concept_id by procedure_type_concept_id

–{606
– 606 Distribution of age by procedure_concept_id

–{609
– 609 Number of procedure occurrence records with invalid person_id

–{610
– 610 Number of procedure occurrence records outside valid observation period

–{612
– 612 Number of procedure occurrence records with invalid provider_id

–{613
– 613 Number of procedure occurrence records with invalid visit_id

–{620
– 620 Number of procedure occurrence records by condition occurrence start month

–{700
– 700 Number of persons with at least one drug occurrence, by drug_concept_id

–{701
– 701 Number of drug occurrence records, by drug_concept_id

–{702
– 702 Number of persons by drug occurrence start month, by drug_concept_id

–{703
– 703 Number of distinct drug exposure concepts per person

–{704
– 704 Number of persons with at least one drug occurrence, by drug_concept_id by calendar year by gender by age decile

–{705
– 705 Number of drug occurrence records, by drug_concept_id by drug_type_concept_id

–{706
– 706 Distribution of age by drug_concept_id

–{709
– 709 Number of drug exposure records with invalid person_id

–{710
– 710 Number of drug exposure records outside valid observation period

–{711
– 711 Number of drug exposure records with end date < start date

–{712
– 712 Number of drug exposure records with invalid provider_id

–{713
– 713 Number of drug exposure records with invalid visit_id

–{715
– 715 Distribution of days_supply by drug_concept_id

–{716
– 716 Distribution of refills by drug_concept_id

–{717
– 717 Distribution of quantity by drug_concept_id

–{720
– 720 Number of drug exposure records by condition occurrence start month

–{800
– 800 Number of persons with at least one observation occurrence, by observation_concept_id

–{801
– 801 Number of observation occurrence records, by observation_concept_id

–{802
– 802 Number of persons by observation occurrence start month, by observation_concept_id

–{803
– 803 Number of distinct observation occurrence concepts per person

–{804
– 804 Number of persons with at least one observation occurrence, by observation_concept_id by calendar year by gender by age decile

–{805
– 805 Number of observation occurrence records, by observation_concept_id by observation_type_concept_id

–{806
– 806 Distribution of age by observation_concept_id

–{807
– 807 Number of observation occurrence records, by observation_concept_id and unit_concept_id

–{809
– 809 Number of observation records with invalid person_id

–{810
– 810 Number of observation records outside valid observation period

–{812
– 812 Number of observation records with invalid provider_id

–{813
– 813 Number of observation records with invalid visit_id

–{814
– 814 Number of observation records with no value (numeric, string, or concept)

–{815
– 815 Distribution of numeric values, by observation_concept_id and unit_concept_id

–{816
– 816 Distribution of low range, by observation_concept_id and unit_concept_id

–{817
– 817 Distribution of high range, by observation_concept_id and unit_concept_id

–{818
– 818 Number of observation records below/within/above normal range, by observation_concept_id and unit_concept_id

–{820
– 820 Number of observation records by condition occurrence start month

–{900
– 900 Number of persons with at least one drug occurrence, by drug_concept_id

–{901
– 901 Number of drug occurrence records, by drug_concept_id

–{902
– 902 Number of persons by drug occurrence start month, by drug_concept_id

–{903
– 903 Number of distinct drug era concepts per person

–{904
– 904 Number of persons with at least one drug occurrence, by drug_concept_id by calendar year by gender by age decile

–{906
– 906 Distribution of age by drug_concept_id

–{907
– 907 Distribution of drug era length, by drug_concept_id

–{908
– 908 Number of drug eras with invalid person

–{909
– 909 Number of drug eras outside valid observation period

–{910
– 910 Number of drug eras with end date < start date

–{920
– 920 Number of drug era records by drug era start month

–{1000
– 1000 Number of persons with at least one condition occurrence, by condition_concept_id

–{1001
– 1001 Number of condition occurrence records, by condition_concept_id

–{1002
– 1002 Number of persons by condition occurrence start month, by condition_concept_id

–{1003
– 1003 Number of distinct condition era concepts per person

–{1004
– 1004 Number of persons with at least one condition occurrence, by condition_concept_id by calendar year by gender by age decile

–{1006
– 1006 Distribution of age by condition_concept_id

–{1007
– 1007 Distribution of condition era length, by condition_concept_id

–{1008
– 1008 Number of condition eras with invalid person

–{1009
– 1009 Number of condition eras outside valid observation period

–{1010
– 1010 Number of condition eras with end date < start date

–{1020
– 1020 Number of drug era records by drug era start month

–{1100
– 1100 Number of persons by location 3-digit zip

–{1101
– 1101 Number of persons by location state

–{1102
– 1102 Number of care sites by location 3-digit zip

–{1103
– 1103 Number of care sites by location state

–{1200
– 1200 Number of persons by place of service

–{1201
– 1201 Number of visits by place of service

–{1202
– 1202 Number of care sites by place of service

–{1300
– 1300 Number of organizations by place of service

–{1406
– 1406 Length of payer plan (days) of first payer plan period by gender

–{1407
– 1407 Length of payer plan (days) of first payer plan period by age decile

–{1408
– 1408 Number of persons by length of payer plan period, in 30d increments

–{1409
– 1409 Number of persons with continuous payer plan in each year
– Note: using temp table instead of nested query because this gives vastly improved
USE @results_schema; --Set to result schema so temp tables are created where user has write rights (Oracle)
IF OBJECT_ID(‘temp_dates’, ‘U’) IS NOT NULL --This should only do something in Oracle

–{1410
– 1410 Number of persons with continuous payer plan in each month
– Note: using temp table instead of nested query because this gives vastly improved performance in Oracle
USE @results_schema; --Set to result schema so temp tables are created where user has write rights (Oracle)
IF OBJECT_ID(‘temp_dates’, ‘U’) IS NOT NULL --This should only do something in Oracle

–{1411
– 1411 Number of persons by payer plan period start month

–{1412
– 1412 Number of persons by payer plan period end month

–{1413
– 1413 Number of persons by number of payer plan periods

–{1414
– 1414 Number of persons with payer plan period before year-of-birth

–{1415
– 1415 Number of persons with payer plan period end < start

–{1500
– 1500 Number of drug cost records with invalid drug exposure id

–{1501
– 1501 Number of drug cost records with invalid payer plan period id

–{1502
– 1502 Distribution of paid copay, by drug_concept_id

–{1503
– 1503 Distribution of paid coinsurance, by drug_concept_id

–{1504
– 1504 Distribution of paid toward deductible, by drug_concept_id

–{1505
– 1505 Distribution of paid by payer, by drug_concept_id

–{1506
– 1506 Distribution of paid by coordination of benefit, by drug_concept_id

–{1507
– 1507 Distribution of total out-of-pocket, by drug_concept_id

–{1508
– 1508 Distribution of total paid, by drug_concept_id

–{1509
– 1509 Distribution of ingredient_cost, by drug_concept_id

–{1510
– 1510 Distribution of dispensing fee, by drug_concept_id

–{1511
– 1511 Distribution of average wholesale price, by drug_concept_id

–{1600
– 1600 Number of procedure cost records with invalid procedure exposure id

–{1601
– 1601 Number of procedure cost records with invalid payer plan period id

–{1602
– 1602 Distribution of paid copay, by procedure_concept_id

–{1603
– 1603 Distribution of paid coinsurance, by procedure_concept_id

–{1604
– 1604 Distribution of paid toward deductible, by procedure_concept_id

–{1605
– 1605 Distribution of paid by payer, by procedure_concept_id

–{1606
– 1606 Distribution of paid by coordination of benefit, by procedure_concept_id

–{1607
– 1607 Distribution of total out-of-pocket, by procedure_concept_id

–{1608
– 1608 Distribution of total paid, by procedure_concept_id

–{1609
– 1609 Number of records by disease_class_concept_id

–{1610
– 1610 Number of records by revenue_code_concept_id

–{1700
– 1700 Number of records by cohort_concept_id

–{1701
– 1701 Number of records with cohort end date < cohort start date

t