UPDATE: Below I describe two cases: an empty table, and a missing table. I overcame the issue with an empty table (it took 4 hours to finish that query), only the comment with missing table applies. (I contacted @Chris_Knoll separately to let him know.)
original post:
Thank you for your comment, @Chris_Knoll. I have to distinguish two things here:
An Empty Table:
Once I performed the step “A Missing Table” below, i.e. I created an empty table in the database I am getting stuck at 54% (when tables are indexed) with this query running and never finishing:
insert into CDM_GE.ACHILLES_results_dist (analysis_id, count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value)
SELECT 105 as analysis_id,
COUNT(count_value) as count_value,
min(count_value) as min_value,
max(count_value) as max_value,
avg(1.0*count_value) as avg_value,
STDDEV(count_value) as stdev_value,
max(case when p1<=0.50 then count_value else -9999 end) as median_value,
max(case when p1<=0.10 then count_value else -9999 end) as p10_value,
max(case when p1<=0.25 then count_value else -9999 end) as p25_value,
max(case when p1<=0.75 then count_value else -9999 end) as p75_value,
max(case when p1<=0.90 then count_value else -9999 end) as p90_value
FROM (SELECT ( op1.observation_period_end_date - op1.observation_period_start_date) as count_value,
1.0*(row_number() over (order by ( op1.observation_period_end_date -op1.observation_period_start_date)))
/(COUNT(*) over() + 1) as p1
FROM CDM_GE.PERSON p1
inner join
(SELECT person_id,
OBSERVATION_PERIOD_START_DATE,
OBSERVATION_PERIOD_END_DATE,
ROW_NUMBER() over (PARTITION by person_id order by observation_period_start_date asc) as rn1
FROM CDM_GE.OBSERVATION_PERIOD ) op1
on p1.PERSON_ID = op1.PERSON_ID
This is the query that is hanging in the air - in our settings, it joins an 18 million rows of one table with a 20 millions in the other, and I indexed them both as you advised me in the other thread. As none of the empty tables is reference here I would intuitively think the “emptiness” of those four tables does not matter. However, I do not know this for sure.
A Missing Table
I started with a databse with four missing (non-existing) tables. As long as they were not there, I was getting the following error, which disappeared once I created an empty table CARE_SITE in the database:
oracle
Error: execute JDBC update query failed in dbSendUpdate (ORA-00942: table or view does not exist)
SQL:
insert into CDM_GE.ACHILLES_results (analysis_id, count_value)SELECT 9 as analysis_id,
COUNT(p1.person_id) as count_value FROM CDM_GE.PERSON p1
left join CDM_GE.care_site cs1 on p1.care_site_id = cs1.care_site_id
WHERE p1.care_site_id is not null and cs1.care_site_id is null
R version:
R version 3.2.4 Revised (2016-03-16 r70336)
Platform:
x86_64-w64-mingw32
Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base
Other attached packages:
- Achilles (1.0)
- rjson (0.2.15)
- DatabaseConnector (1.4.0)
- RJDBC (0.2-5)
- DBI (0.4-1)
- SqlRender (1.1.3)
- httr (1.1.0)
- devtools (1.11.1)
- rJava (0.9-8)
I am sorry my answer took so long - I was initially trying to replicate this error, but in the end I found a stored error for a missing table message at my computer. The above message was obtained on May 10 and I’ve updated the R packages since then. For this reason you see old versions of packages in the log. I do not think it is very useful to replicate this with the newest version of Achilles - if I am wrong, let me know please.