OHDSI Home | Forums | Wiki | Github

Achilles suggestion: table_name for record counts

Achilles collects stats of various types for various purposes, but one particular purpose I’m interested in is simple counts of records in a single table stratified by columns in that table (like analysis_ids 405, 605, 705, 805, 807, 1805, 1807, 2, 4, 5, 201, 301, 401, 501, 505, 601, 701, 801, 901, 1001,1201,1801). For these the strata names are generally identical to the column names they’re based on, and it would be really nice to also have the names of the tables the record counts come from.

Right now the only way to make use of a particular Achilles analysis is by a human knowing what the analysis represents (by reading the analysis_name, for instance.) If table_name were included, it would be possible to write code that retrieves counts for arbitrary table/column combinations.

On the other hand, maybe it would be better to handle basic record counts separately from the general abstractions of Achilles entirely. I’ve started doing that for the project I’m working on. My under-construction PostgreSQL code is at https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql.

To be a little clearer about what I’m doing for counts…

I create a list of all the CDM columns that have concept ids, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L6:

=> select * from concept_cols;

            tbl            |      table_name      |          column_name           | column_type
---------------------------+----------------------+--------------------------------+-------------
 cdm2.cohort_definition    | cohort_definition    | definition_type_concept_id     | type
 cdm2.cohort_definition    | cohort_definition    | subject_concept_id             | target
 cdm2.observation          | observation          | observation_concept_id         | target
 cdm2.observation          | observation          | observation_source_concept_id  | source
 cdm2.observation          | observation          | observation_type_concept_id    | type
 cdm2.observation          | observation          | qualifier_concept_id           | other
 ...

And I use that to populate a table of record counts per table/column/concept_id, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L70:

=> select * from concept_id_occurrence;
 schema | table_name  |          column_name          | column_type | concept_id |  count
--------+-------------+-------------------------------+-------------+------------+---------
 cdm2   | observation | observation_concept_id        | target      |    4275495 |  553962
 cdm2   | observation | observation_concept_id        | target      |    3040464 |   91141
 cdm2   | observation | observation_concept_id        | target      |    4219336 |  554134
 cdm2   | observation | observation_concept_id        | target      |   44813951 | 9235879
 cdm2   | observation | observation_concept_id        | target      |    4137274 | 9236123
 cdm2   | observation | observation_concept_id        | target      |    4005823 |  554065
 cdm2   | observation | observation_concept_id        | target      |    4145666 | 9235879
 cdm2   | observation | observation_source_concept_id | source      |   38001252 |     359
 cdm2   | observation | observation_source_concept_id | source      |   38000498 |       1
 cdm2   | observation | observation_source_concept_id | source      |   38001239 |      43
 cdm2   | observation | observation_source_concept_id | source      |   45416913 |       1
 cdm2   | observation | observation_source_concept_id | source      |   38001475 |       2
 cdm2   | observation | observation_source_concept_id | source      |   38000336 |       1
 cdm2   | observation | observation_source_concept_id | source      |   38001200 |     400
 ...

Then I can get counts by concept/table, https://github.com/Sigfried/vocab-pop/blob/master/vocab-pop-pg.sql#L165:

=> select * from concept_info_stats;

      table_name      |         column_name         |     domain_id      | vocabulary_id | concept_class_id  | sc | invalid | conceptrecs |  dbrecs
----------------------+-----------------------------+--------------------+---------------+-------------------+----+---------+-------------+----------
                      |                             | Measurement        | Read          | Read              |    | t       |           6 |
 care_site            | place_of_service_concept_id | Observation        | PCORNet       | Encounter Type    |    | f       |           1 |        4
 care_site            | place_of_service_concept_id | Observation        | PCORNet       | Undefined         |    | f       |           3 |      241
 care_site            | place_of_service_concept_id | Visit              | Visit         | Visit             | S  | f       |           4 |      637
 condition_occurrence | condition_concept_id        | Observation        | SNOMED        | Clinical Finding  | S  | f       |         378 |  1384568
 ...

or, for instance, by concept class, domain, vocabulary, etc.:

=> select vocabulary_id, concept_class_id, table_name, column_name, sum(dbrecs) as record_count from concept_info_stats where dbrecs > 0 and domain_id = 'Condition' group by 1,2,3,4 order by 1,2,3,4 ;
 vocabulary_id |   concept_class_id   |      table_name      |          column_name          | record_count
---------------+----------------------+----------------------+-------------------------------+--------------
 CIEL          | Diagnosis            | observation          | observation_source_concept_id |           58
 CIEL          | Diagnosis            | observation          | value_as_concept_id           |           58
 CIEL          | Finding              | observation          | observation_source_concept_id |           58
 CIEL          | Finding              | observation          | value_as_concept_id           |           58
 CIEL          | Symptom              | observation          | observation_source_concept_id |            6
 CIEL          | Symptom              | observation          | value_as_concept_id           |            6
 ICD10         | ICD10 code           | death                | cause_source_concept_id       |           34
 ICD10CM       | 3-char billing code  | condition_occurrence | condition_source_concept_id   |       124148
 ICD10CM       | 3-char nonbill code  | condition_occurrence | condition_source_concept_id   |            5
 ICD10CM       | 4-char billing code  | condition_occurrence | condition_source_concept_id   |       941711
 ICD10CM       | 4-char nonbill code  | condition_occurrence | condition_source_concept_id   |           13
 ICD10CM       | 5-char billing code  | condition_occurrence | condition_source_concept_id   |       525227
 ICD10CM       | 5-char nonbill code  | condition_occurrence | condition_source_concept_id   |            4
 ICD10CM       | 6-char billing code  | condition_occurrence | condition_source_concept_id   |       208622
 ICD10CM       | 7-char billing code  | condition_occurrence | condition_source_concept_id   |       155287
 ICD9CM        | 3-dig billing code   | condition_occurrence | condition_source_concept_id   |       402658
 ICD9CM        | 3-dig billing code   | death                | cause_source_concept_id       |            4
 ICD9CM        | 3-dig billing code   | observation          | observation_source_concept_id |           32
 ICD9CM        | 3-dig billing code   | observation          | value_as_concept_id           |           32
 ICD9CM        | 3-dig billing E code | condition_occurrence | condition_source_concept_id   |         1044
 ICD9CM        | 3-dig billing V code | condition_occurrence | condition_source_concept_id   |         1765
 ICD9CM        | 3-dig nonbill code   | condition_occurrence | condition_source_concept_id   |         7173
 ICD9CM        | 3-dig nonbill code   | observation          | observation_source_concept_id |          262
 ICD9CM        | 3-dig nonbill code   | observation          | value_as_concept_id           |          262
 ICD9CM        | 3-dig nonbill E code | condition_occurrence | condition_source_concept_id   |            3
 ICD9CM        | 3-dig nonbill V code | condition_occurrence | condition_source_concept_id   |            6
 ICD9CM        | 4-dig billing code   | condition_occurrence | condition_source_concept_id   |      6465741
 ICD9CM        | 4-dig billing code   | death                | cause_source_concept_id       |           78
 ICD9CM        | 4-dig billing E code | condition_occurrence | condition_source_concept_id   |        10735
 ICD9CM        | 4-dig billing V code | condition_occurrence | condition_source_concept_id   |       118406
 ICD9CM        | 4-dig nonbill code   | condition_occurrence | condition_source_concept_id   |        65531
 ICD9CM        | 4-dig nonbill V code | condition_occurrence | condition_source_concept_id   |          805
 ICD9CM        | 5-dig billing code   | condition_occurrence | condition_source_concept_id   |      8533996
 ICD9CM        | 5-dig billing code   | death                | cause_source_concept_id       |           80
 ICD9CM        | 5-dig billing V code | condition_occurrence | condition_source_concept_id   |       127749
 ICD9CM        | ICD9CM code          | condition_occurrence | condition_source_concept_id   |            1
 OXMIS         | OXMIS                | observation          | observation_source_concept_id |           73
 OXMIS         | OXMIS                | observation          | value_as_concept_id           |           73
 SMQ           | SMQ                  | observation          | observation_source_concept_id |           74
 SMQ           | SMQ                  | observation          | value_as_concept_id           |           74
 SNOMED        | Clinical Finding     | condition_occurrence | condition_concept_id          |     18482627
 SNOMED        | Clinical Finding     | death                | cause_concept_id              |          201
 SNOMED        | Clinical Finding     | observation          | observation_concept_id        |      1108027
 SNOMED        | Clinical Finding     | observation          | observation_source_concept_id |         1888
 SNOMED        | Clinical Finding     | observation          | value_as_concept_id           |      1626335

As I said, I know almost nothing about Achilles so far, so there may be nothing of interest here, but if I am understanding how Achilles handles simple record counts correctly, what I’m doing might be a better way to go.

t