To be a little clearer about what I’m doing for counts…
I create a list of all the CDM columns that have concept ids, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L6:
=> select * from concept_cols;
tbl | table_name | column_name | column_type
---------------------------+----------------------+--------------------------------+-------------
cdm2.cohort_definition | cohort_definition | definition_type_concept_id | type
cdm2.cohort_definition | cohort_definition | subject_concept_id | target
cdm2.observation | observation | observation_concept_id | target
cdm2.observation | observation | observation_source_concept_id | source
cdm2.observation | observation | observation_type_concept_id | type
cdm2.observation | observation | qualifier_concept_id | other
...
And I use that to populate a table of record counts per table/column/concept_id, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L70:
=> select * from concept_id_occurrence;
schema | table_name | column_name | column_type | concept_id | count
--------+-------------+-------------------------------+-------------+------------+---------
cdm2 | observation | observation_concept_id | target | 4275495 | 553962
cdm2 | observation | observation_concept_id | target | 3040464 | 91141
cdm2 | observation | observation_concept_id | target | 4219336 | 554134
cdm2 | observation | observation_concept_id | target | 44813951 | 9235879
cdm2 | observation | observation_concept_id | target | 4137274 | 9236123
cdm2 | observation | observation_concept_id | target | 4005823 | 554065
cdm2 | observation | observation_concept_id | target | 4145666 | 9235879
cdm2 | observation | observation_source_concept_id | source | 38001252 | 359
cdm2 | observation | observation_source_concept_id | source | 38000498 | 1
cdm2 | observation | observation_source_concept_id | source | 38001239 | 43
cdm2 | observation | observation_source_concept_id | source | 45416913 | 1
cdm2 | observation | observation_source_concept_id | source | 38001475 | 2
cdm2 | observation | observation_source_concept_id | source | 38000336 | 1
cdm2 | observation | observation_source_concept_id | source | 38001200 | 400
...
Then I can get counts by concept/table, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L165:
=> select * from concept_info_stats;
table_name | column_name | domain_id | vocabulary_id | concept_class_id | sc | invalid | conceptrecs | dbrecs
----------------------+-----------------------------+--------------------+---------------+-------------------+----+---------+-------------+----------
| | Measurement | Read | Read | | t | 6 |
care_site | place_of_service_concept_id | Observation | PCORNet | Encounter Type | | f | 1 | 4
care_site | place_of_service_concept_id | Observation | PCORNet | Undefined | | f | 3 | 241
care_site | place_of_service_concept_id | Visit | Visit | Visit | S | f | 4 | 637
condition_occurrence | condition_concept_id | Observation | SNOMED | Clinical Finding | S | f | 378 | 1384568
...
or, for instance, by concept class, domain, vocabulary, etc.:
=> select vocabulary_id, concept_class_id, table_name, column_name, sum(dbrecs) as record_count from concept_info_stats where dbrecs > 0 and domain_id = 'Condition' group by 1,2,3,4 order by 1,2,3,4 ;
vocabulary_id | concept_class_id | table_name | column_name | record_count
---------------+----------------------+----------------------+-------------------------------+--------------
CIEL | Diagnosis | observation | observation_source_concept_id | 58
CIEL | Diagnosis | observation | value_as_concept_id | 58
CIEL | Finding | observation | observation_source_concept_id | 58
CIEL | Finding | observation | value_as_concept_id | 58
CIEL | Symptom | observation | observation_source_concept_id | 6
CIEL | Symptom | observation | value_as_concept_id | 6
ICD10 | ICD10 code | death | cause_source_concept_id | 34
ICD10CM | 3-char billing code | condition_occurrence | condition_source_concept_id | 124148
ICD10CM | 3-char nonbill code | condition_occurrence | condition_source_concept_id | 5
ICD10CM | 4-char billing code | condition_occurrence | condition_source_concept_id | 941711
ICD10CM | 4-char nonbill code | condition_occurrence | condition_source_concept_id | 13
ICD10CM | 5-char billing code | condition_occurrence | condition_source_concept_id | 525227
ICD10CM | 5-char nonbill code | condition_occurrence | condition_source_concept_id | 4
ICD10CM | 6-char billing code | condition_occurrence | condition_source_concept_id | 208622
ICD10CM | 7-char billing code | condition_occurrence | condition_source_concept_id | 155287
ICD9CM | 3-dig billing code | condition_occurrence | condition_source_concept_id | 402658
ICD9CM | 3-dig billing code | death | cause_source_concept_id | 4
ICD9CM | 3-dig billing code | observation | observation_source_concept_id | 32
ICD9CM | 3-dig billing code | observation | value_as_concept_id | 32
ICD9CM | 3-dig billing E code | condition_occurrence | condition_source_concept_id | 1044
ICD9CM | 3-dig billing V code | condition_occurrence | condition_source_concept_id | 1765
ICD9CM | 3-dig nonbill code | condition_occurrence | condition_source_concept_id | 7173
ICD9CM | 3-dig nonbill code | observation | observation_source_concept_id | 262
ICD9CM | 3-dig nonbill code | observation | value_as_concept_id | 262
ICD9CM | 3-dig nonbill E code | condition_occurrence | condition_source_concept_id | 3
ICD9CM | 3-dig nonbill V code | condition_occurrence | condition_source_concept_id | 6
ICD9CM | 4-dig billing code | condition_occurrence | condition_source_concept_id | 6465741
ICD9CM | 4-dig billing code | death | cause_source_concept_id | 78
ICD9CM | 4-dig billing E code | condition_occurrence | condition_source_concept_id | 10735
ICD9CM | 4-dig billing V code | condition_occurrence | condition_source_concept_id | 118406
ICD9CM | 4-dig nonbill code | condition_occurrence | condition_source_concept_id | 65531
ICD9CM | 4-dig nonbill V code | condition_occurrence | condition_source_concept_id | 805
ICD9CM | 5-dig billing code | condition_occurrence | condition_source_concept_id | 8533996
ICD9CM | 5-dig billing code | death | cause_source_concept_id | 80
ICD9CM | 5-dig billing V code | condition_occurrence | condition_source_concept_id | 127749
ICD9CM | ICD9CM code | condition_occurrence | condition_source_concept_id | 1
OXMIS | OXMIS | observation | observation_source_concept_id | 73
OXMIS | OXMIS | observation | value_as_concept_id | 73
SMQ | SMQ | observation | observation_source_concept_id | 74
SMQ | SMQ | observation | value_as_concept_id | 74
SNOMED | Clinical Finding | condition_occurrence | condition_concept_id | 18482627
SNOMED | Clinical Finding | death | cause_concept_id | 201
SNOMED | Clinical Finding | observation | observation_concept_id | 1108027
SNOMED | Clinical Finding | observation | observation_source_concept_id | 1888
SNOMED | Clinical Finding | observation | value_as_concept_id | 1626335
As I said, I know almost nothing about Achilles so far, so there may be nothing of interest here, but if I am understanding how Achilles handles simple record counts correctly, what I’m doing might be a better way to go.