OHDSI Home | Forums | Wiki | Github

Atlas concept record counts not refreshing

I’m running Atlas 2.12.1 with WebAPI version 2.13.0, but it seems the concept record counts are not refreshing after updating the source data and running Achilles. The source data is located in BigQuery, and results.achilles_result_concept_count is filled. When I run the following query, I get a positive number of records, but via this request, I get only zeros: http://host:port/WebAPI/cdmresults/cdm_pulse_01/conceptRecordCount

select *
from `results.achilles_result_concept_count`
where concept_id = '2000000322'

Also, for some other concept id’s, I’m receiving a 500 error when looking for record counts, e.g. concept_id = 8532 (female).

{"payload":{"cause":null,"stackTrace":[],"message":"An exception occurred: org.springframework.jdbc.UncategorizedSQLException","localizedMessage":"An exception occurred: org.springframework.jdbc.UncategorizedSQLException","suppressed":[]},"headers":{"id":"...","timestamp":...}}

Anyone who can tell me how I can get Atlas to show the actual record counts?

Let me provide some details about how WebAPI 2.13 will cache:

  1. Achilles runs and populates achilles_results
  2. You will need to run this script in order to refresh the achilles_result_concept_count, which you may already have done, but just in case you didn’t.
  3. WebAPI will read record counts from the webapi.cdm_cache table, and if it can’t find them, it will fetch them from results.achilles_result_concept_count and save those results into webapi.cdm_cache.

Therefore, if you have updated the results.achilles_result_concept_count table, then you should truncate your webapi.cdm_cache table for the given source_id. Ie: you can delete the records for the specific source with delete from webapi.cdm_cache where source_id = X or you can delete all cache records by truncate webapi.cdm_cache.

As of 2.13, it should not be necessary (as far as I understand) to restart WebAPI after you delete the records in webapi.cdm_cache since the WebAPI will fetch records from that table first (which is fast) and any concepts that are not found in cdm_cache will fetch from results.achilles_result_concept_count on your CDM source. Any concepts not found from the CDM source will insert zeros into the webapi.cdm_cache table so that we don’t keep fetching zeros.

Thanks for your reply! I refreshed the achilles_result_concept_count and truncated the webapi.cdm_cache table. However, instead of zeros I now get a “Failed to load data message”. To be sure I restarted the webapi container using docker restart ohdsi-webapi (I’m using broadsea), but still no success.

When I check devtools, I get a 500 error on the http://host:port/WebAPI/cdmresults/cdm_pulse_01/conceptRecordCount request: {"payload":{"cause":null,"stackTrace":[],"message":"An exception occurred: javax.ws.rs.NotAllowedException","localizedMessage":"An exception occurred: javax.ws.rs.NotAllowedException","suppressed":[]},"headers":{"id":"...","timestamp":...}}

Am I still missing anything?

Are you accessing that method via a browser URL? Because the NotAllowedException usually indicates a HTTP GET when it only allows a HTTP POST. The getConceptRecordCount accepts a POST of a JSON array of concept IDs.

You can check your WebAPI logs to see if you get a more detailed exception. Maybe something is happening under the covers that results in the above exception to the UI.

That was a good suggestion. Looks like there is a bug somewhere in the queries to update the counts data.

Caused by: java.sql.SQLException: [Simba][BigQueryJDBCDriver](100032) Error executing query job. Message: 400 Bad Request
POST https://bigquery.googleapis.com/bigquery/v2/projects/<project>/jobs
{
  "code": 400,
  "errors": [
    {
      "domain": "global",
      "location": "q",
      "locationType": "parameter",
      "message": "Unrecognized name: person_count at [1:59]",
      "reason": "invalidQuery"
    }
  ],
  "message": "Unrecognized name: person_count at [1:59]",
  "status": "INVALID_ARGUMENT"
}

I think it expects a column person_count in achilles_result_concept_count, which isn’t there. I updated achilles_result_concept_count using the R Achilles function: https://github.com/OHDSI/Achilles/blob/main/R/Achilles.R

However, I notice the resulting query is different from the script you sent: https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/ddl/achilles/achilles_result_concept_count.sql. Do you know why this is? It seems a bit pointless to rerun this script manually, instead of just including it in Achilles? However, I’ve modified the script to work for BigQuery, and ran it. And for the first few concepts I’ve checked, it seems to be working, so that’s great! Thanks again for your help!

/************************************************/
/***** Create record and person count table *****/
/************************************************/

DROP TABLE IF EXISTS @results_schema.achilles_result_concept_count;

CREATE TABLE @results_schema.achilles_result_concept_count
(
  concept_id                int,
  record_count              bigint,
  descendant_record_count   bigint,
  person_count              bigint,
  descendant_person_count   bigint
);

/**********************************************/
/***** Populate record/person count table *****/
/**********************************************/
INSERT INTO @results_schema.achilles_result_concept_count (concept_id, record_count, descendant_record_count, person_count, descendant_person_count)
WITH counts AS (
  SELECT stratum_1 concept_id, MAX (count_value) agg_count_value
  FROM @results_schema.achilles_results
  WHERE analysis_id IN (2, 4, 5, 201, 225, 301, 325, 401, 425, 501, 505, 525, 601, 625, 701, 725, 801, 825,
    826, 827, 901, 1001, 1201, 1203, 1425, 1801, 1825, 1826, 1827, 2101, 2125, 2301)
    /* analyses:
          Number of persons by gender
         Number of persons by race
         Number of persons by ethnicity
         Number of visit occurrence records, by visit_concept_id
         Number of visit_occurrence records, by visit_source_concept_id
         Number of providers by specialty concept_id
         Number of provider records, by specialty_source_concept_id
         Number of condition occurrence records, by condition_concept_id
         Number of condition_occurrence records, by condition_source_concept_id
         Number of records of death, by cause_concept_id
         Number of death records, by death_type_concept_id
         Number of death records, by cause_source_concept_id
         Number of procedure occurrence records, by procedure_concept_id
         Number of procedure_occurrence records, by procedure_source_concept_id
         Number of drug exposure records, by drug_concept_id
         Number of drug_exposure records, by drug_source_concept_id
         Number of observation occurrence records, by observation_concept_id
         Number of observation records, by observation_source_concept_id
         Number of observation records, by value_as_concept_id
         Number of observation records, by unit_concept_id
         Number of drug era records, by drug_concept_id
         Number of condition era records, by condition_concept_id
         Number of visits by place of service
         Number of visit_occurrence records, by discharge_to_concept_id
         Number of payer_plan_period records, by payer_source_concept_id
         Number of measurement occurrence records, by observation_concept_id
         Number of measurement records, by measurement_source_concept_id
         Number of measurement records, by value_as_concept_id
         Number of measurement records, by unit_concept_id
         Number of device exposure records, by device_concept_id
         Number of device_exposure records, by device_source_concept_id
         Number of location records, by region_concept_id
    */
  GROUP BY stratum_1
  UNION ALL
  SELECT stratum_2 concept_id, SUM (count_value) AS agg_count_value
  FROM @results_schema.achilles_results
  WHERE analysis_id IN (405, 605, 705, 805, 807, 1805, 1807, 2105)
    /* analyses:
         Number of condition occurrence records, by condition_concept_id by condition_type_concept_id
         Number of procedure occurrence records, by procedure_concept_id by procedure_type_concept_id
         Number of drug exposure records, by drug_concept_id by drug_type_concept_id
         Number of observation occurrence records, by observation_concept_id by observation_type_concept_id
         Number of observation occurrence records, by observation_concept_id and unit_concept_id
         Number of observation occurrence records, by measurement_concept_id by measurement_type_concept_id
         Number of measurement occurrence records, by measurement_concept_id and unit_concept_id
         Number of device exposure records, by device_concept_id by device_type_concept_id
        but this subquery only gets the type or unit concept_ids, i.e., stratum_2
    */
  GROUP BY stratum_2
), counts_person AS (
  SELECT stratum_1 as concept_id, MAX (count_value) agg_count_value
  FROM @results_schema.achilles_results
  WHERE analysis_id IN (200, 240, 400, 440, 540, 600, 640, 700, 740, 800, 840, 900, 1000, 1300, 1340, 1800, 1840, 2100, 2140, 2200)
    /* analyses:
        Number of persons with at least one visit occurrence, by visit_concept_id
        Number of persons with at least one visit occurrence, by visit_source_concept_id
        Number of persons with at least one condition occurrence, by condition_concept_id
        Number of persons with at least one condition occurrence, by condition_source_concept_id
        Number of persons with death, by cause_source_concept_id
        Number of persons with at least one procedure occurrence, by procedure_concept_id
        Number of persons with at least one procedure occurrence, by procedure_source_concept_id
        Number of persons with at least one drug exposure, by drug_concept_id
        Number of persons with at least one drug exposure, by drug_source_concept_id
        Number of persons with at least one observation occurrence, by observation_concept_id
        Number of persons with at least one observation occurrence, by observation_source_concept_id
        Number of persons with at least one drug era, by drug_concept_id
        Number of persons with at least one condition era, by condition_concept_id
        Number of persons with at least one visit detail, by visit_detail_concept_id
        Number of persons with at least one visit detail, by visit_detail_source_concept_id
        Number of persons with at least one measurement occurrence, by measurement_concept_id
        Number of persons with at least one measurement occurrence, by measurement_source_concept_id
        Number of persons with at least one device exposure, by device_concept_id
        Number of persons with at least one device exposure, by device_source_concept_id
        Number of persons with at least one note by  note_type_concept_id
    */
  GROUP BY stratum_1
), concepts AS (
  select concept_id as ancestor_id, coalesce(cast(ca.descendant_concept_id as STRING), concept_id) as descendant_id
  from (
    select concept_id from counts
    UNION DISTINCT
    -- include any ancestor concept that has a descendant in counts
    select distinct cast(ancestor_concept_id as STRING) concept_id
    from counts c
    join @vocab_schema.concept_ancestor ca on cast(ca.descendant_concept_id as STRING) = c.concept_id
  ) c
  left join @vocab_schema.concept_ancestor ca on c.concept_id = cast(ca.ancestor_concept_id as STRING)
)
SELECT
    cast(concepts.ancestor_id as int) concept_id,
    coalesce(max(c1.agg_count_value), 0) record_count,
    coalesce(sum(c2.agg_count_value), 0) descendant_record_count,
    coalesce(max(c3.agg_count_value), 0) person_count,
    coalesce(sum(c4.agg_count_value), 0) descendant_person_count
FROM concepts
         LEFT JOIN counts c1 ON concepts.ancestor_id = c1.concept_id
         LEFT JOIN counts c2 ON concepts.descendant_id = c2.concept_id
         LEFT JOIN counts_person c3 ON concepts.ancestor_id = c3.concept_id
         LEFT JOIN counts_person c4 ON concepts.descendant_id = c4.concept_id
GROUP BY concepts.ancestor_id;

You ran the wrong script. The script found in the Achilles package does not contain the person_count column which is what Atlas wants. That’s why the script referenced points to the WebAPI repo.

Achilles didn’t update their record_count table, so we had to do it ourselves in WebAPI.

It makes sense to do it for WebAPI because Achilles doesn’t really care about record counts/person counts because that information serves a specific vocabulary search use-case in WebAPI.

So, I managed to update achilles_result_concept_count using the script you sent. However, it seems some record counts are still missing from there.

For example, I cannot find record counts for some of the episode_object_concept_id present in the source database. These concept id’s also don’t seem to be present in the achilles_results table. What is the logic behind which concept id’s are included? Or does WebAPI also have a different query to fill achilles_results?

If they aren’t in the achilles_results table, then there wasn’t an ‘achilles analysis’ that ran that produces those statistics. From the query I sent you, you can see the counts are comming from these analysises:

    /* analyses:
          Number of persons by gender
         Number of persons by race
         Number of persons by ethnicity
         Number of visit occurrence records, by visit_concept_id
         Number of visit_occurrence records, by visit_source_concept_id
         Number of providers by specialty concept_id
         Number of provider records, by specialty_source_concept_id
         Number of condition occurrence records, by condition_concept_id
         Number of condition_occurrence records, by condition_source_concept_id
         Number of records of death, by cause_concept_id
         Number of death records, by death_type_concept_id
         Number of death records, by cause_source_concept_id
         Number of procedure occurrence records, by procedure_concept_id
         Number of procedure_occurrence records, by procedure_source_concept_id
         Number of drug exposure records, by drug_concept_id
         Number of drug_exposure records, by drug_source_concept_id
         Number of observation occurrence records, by observation_concept_id
         Number of observation records, by observation_source_concept_id
         Number of observation records, by value_as_concept_id
         Number of observation records, by unit_concept_id
         Number of drug era records, by drug_concept_id
         Number of condition era records, by condition_concept_id
         Number of visits by place of service
         Number of visit_occurrence records, by discharge_to_concept_id
         Number of payer_plan_period records, by payer_source_concept_id
         Number of measurement occurrence records, by observation_concept_id
         Number of measurement records, by measurement_source_concept_id
         Number of measurement records, by value_as_concept_id
         Number of measurement records, by unit_concept_id
         Number of device exposure records, by device_concept_id
         Number of device_exposure records, by device_source_concept_id
         Number of location records, by region_concept_id
    */

And a second set of analyses for person counts.

the Achilles analysis doesn’t create statistics on episode_object_concept_id. The analyses listed above get counts for the normal domains, (condition, drug, procedure) but episode came later, and achilles doesn’t touch those tables.

WebAPI does not fill achilles_results, only achilles does that. The WebAPI script reads from the achilles_results to populate the record count table.

Makes sense. But what is the way to go then for creating record counts for concept id’s of other object types such as episodes?

We have an ETL in place to update the source data, but I’m not sure yet what steps follow after this to update all of the record counts? It seems to be scattered all over the place, and I haven’t found documentation on this yet.

You are allowed to insert additional records into achilles_result_concept_count if you want to add concept counts that you want to see in Atlas. Just remember that if you do modify the results.achilles_result_concept_count:

I don’t think the record counts are ‘all over the place’: the Achilles R package generates the summary statistics for concept_ids, and then the 'achilles_result_concept_count` script generates the record count by record/person and descendants. There is a caching layer that may make it feel ‘spread out’ so maybe that’s what you mean by ‘all over the place’, but you don’t have to go into 5 different packages to find out how the calculations are done.

Here’s the steps:

  1. ETL your data into the CDM schema
  2. Run Achilles to generate achilles_results in your results schema (same database as CDM schema)
  3. Execute the achilles_result_concept_count script to generate the record counts
  4. Optional: execute additional INSERT INTO statements to calculate record counts for episodes, etc into achilles_result_concept_count.

That gets your CDM schema set up, and starts your results schema. The remainder of the results schema comes from WebAPI DDL to construct the other results tables to store analytical results that are performed in WebAPI. This is described here.

@Chris_Knoll

I have a follow up question for truncating the webapi.cdm_cache table.

On the ATLAS Configuration page, there are two refresh/clear cache buttons

  1. Refresh cache for a given source
  2. Clear Server Cache

Will either one of them truncate the webapi.cdm_cache table? Thanks

Clear Server Cache does not. However, I traced through the code for the Refresh Cache button, and it triggers the warmCache job, which will attempt to refresh the cache records from achilles_record_count into the webAPI cache…so, I think that you can refresh record counts on the UI by clicking ‘refresh cache’.

From what I can tell, the action taken when you click refresh cache:

  1. A job is created to warm cache. Warm is invoked here.
  2. Warm cache will fetch a batch of concept IDs from achilles_result_concept_count in order to reload the record counts from the source table. See here.
  3. The concepts fetched in the batch will be saved to the webAPI cache. This will update any record that has changed, and ignore anything that hasn’t changed. See here.

So, I believe it is possible to refresh record counts just by clicking ‘refresh cache’ on the atlas UI.

Note, there is a configuration param that can disable this behavior, seen here. But, by default it is enabled, so the above should just work. Please report back if it doesn’t.

t