OHDSI Home | Forums | Wiki | Github

Export to JSON stalling in ACHILLES

Hi All,

I am interested in finding out if anyone in the community has run into issues recently when running the Achilles JSON export function export2JSON() against result tables generated by Achilles from a CDM v5 database with latest vocabulary. We run the count generation and JSON export code as part of our OMOP CDM ETL processes and are experiencing very long-running (stalled?) queries. Some analysis shows that problem query is:

SELECT c1.concept_id AS concept_id,
c1.concept_name as concept_name,
cast(CAST(num_stratum_4 AS INT) * 10 AS VARCHAR) + ‘-’ + cast((CAST(num_stratum_4 AS INT) + 1) * 10 - 1 AS VARCHAR) AS trellis_name, --age decile
c2.concept_name AS series_name, --gender
num_stratum_2 AS x_calendar_year, – calendar year, note, there could be blanks
ROUND(1000 * (1.0 * num_count_value / denom_count_value), 5) AS y_prevalence_1000pp --prevalence, per 1000 persons
FROM (
SELECT num.stratum_1 AS num_stratum_1,
num.stratum_2 AS num_stratum_2,
num.stratum_3 AS num_stratum_3,
num.stratum_4 AS num_stratum_4,
num.count_value AS num_count_value,
denom.count_value AS denom_count_value
FROM (
SELECT *
FROM ohdsi_west_pending.results.ACHILLES_results
WHERE analysis_id = 404
AND stratum_3 IN (‘8507’, ‘8532’)
) num
INNER JOIN (
SELECT *
FROM ohdsi_west_pending.results.ACHILLES_results
WHERE analysis_id = 116
AND stratum_2 IN (‘8507’, ‘8532’)
) denom
ON num.stratum_2 = denom.stratum_1
AND num.stratum_3 = denom.stratum_2
AND num.stratum_4 = denom.stratum_3
) tmp
INNER JOIN ohdsi_west_pending.dbo.concept c1
ON num_stratum_1 = CAST(c1.concept_id as VARCHAR)
INNER JOIN ohdsi_west_pending.dbo.concept c2
ON num_stratum_3 = CAST(c2.concept_id as VARCHAR)
ORDER BY c1.concept_id,
num_stratum_2

Is any one else experiencing this? @erciavoss?

Hi there @karthik.

Well nothign strikes me as particularly stressful in those queries (except I’m not a fan of those inner select * expressions, I’d rather they explicitly state the columns returned), but otherwise:

All those casts in the select shouldn’t be such a problem since that’s just applied to the result…

The Inner num and denom are just fetching from achilles_results for the analysis_ids 404 and 116. That also shouldn’t be crazy since those are summary statistics and it should not be a lot of rows. For diagnostics, you could run ‘select analysis_id, count(*) from achilles_results where analysis_id in (116, 404) group by analysis_id’ and see how many rows we’re really talking about (shouldn’t be that many).

then once all those inners are produced it tries to match on the concept table casting the concept_id to a varchar. Now here could be a little bit of a problem: indexes aren’t effective if you cast away their value into another type, but I can’t be sure of that behavior without other diagnostics.

What DB platform are you runnign on, and is there any way to get a system-generated recommendation on where index might be needed/out of date based on the query plan that’s produced from your sample?

-Chris

Hi @Chris_Knoll

Thanks. I believe it might be a missing an index now that you mention it. Looking at the query more carefully, it is different from a previous query that we experienced performance issues with. The one I was thinking about was joining concept_ancestor. I’ll make this change and see if that resolves everything. Hopefully, it was just a simple oversight on my part!

Thanks,
k

One other tip: although we are writing to the achilles_results table, it may also make sense to index that table on analysis_id. Many of the analysis queries refer to other analysis queries, so creating an idnex on analysis_id and maybe secondary indexes on stratum_1 may help performance. But, your best bet is to examine a query plan and seek out those pesky table scans.

-Chris

When I ran JSON export, I ended up waiting 14 hours.

I killed first execution after 2 hours thinking something is wrong.
But one option is just to wait. (for 10+ hours).

1 Like

@karthik Are you using SQL Server?

Can you try changing the joins to cast the num_stratum cols to integer instead of casting the concept_ids to varchar?

I suspect the performance will be significantly better.

i.e. make the below change:

INNER JOIN ohdsi_west_pending.dbo.concept c1
ON CAST(num_stratum_1 as INT)  = c1.concept_id 
INNER JOIN ohdsi_west_pending.dbo.concept c2
ON CAST(num_stratum_3 as INT) = c2.concept_id
t