Hello,
I have been trying to run R scripts to faciliate OHDSI researches. However, I have discovered that while some test runs (let’s say Achilles scripts), RStudio could freeze when running longer scripts.
For example, test 200 in Achilles
--TEST CODE 200
--HINT DISTRIBUTE_ON_KEY(stratum_1)
CREATE TEMP TABLE s_tmpach_200
AS
SELECT
200 AS analysis_id,
CAST(vo.visit_concept_id AS VARCHAR(255)) AS stratum_1,
CAST(NULL AS VARCHAR(255)) AS stratum_2,
CAST(NULL AS VARCHAR(255)) AS stratum_3,
CAST(NULL AS VARCHAR(255)) AS stratum_4,
CAST(NULL AS VARCHAR(255)) AS stratum_5,
COUNT(DISTINCT vo.person_id) AS count_value
FROM
ohdsiv5_june2021_thin.visit_occurrence vo
JOIN
ohdsiv5_june2021_thin.observation_period op
ON
vo.person_id = op.person_id
AND
vo.visit_start_date >= op.observation_period_start_date
AND
vo.visit_start_date <= op.observation_period_end_date
GROUP BY
vo.visit_concept_id
would return no result in R studio.
However, if I put it in Python postreSQL runner with some settings
conn = psycopg2.connect(host=“someserver.hk”,
port=12345,
dbname=“ohdsi”,
user=“admin”,
password=“admin1”,
options="-c search_path="+schema,
# it seems the below lines are needed to keep the connection alive.
connect_timeout=10,
keepalives=1,
keepalives_idle=5,
keepalives_interval=2,
keepalives_count=5)
The concerned script #200 would seem to finish running successfully. The time needed to run the script is about 50 minutes.
C:\.....Programs\Python\Python39\python.exe "D:/OHDSI-THIN/scripts/import_from_THIN/test query 200.py"
10:37:43.539193 running script...
11:27:45.922109 finished
Process finished with exit code 0
I am testing another script Achilles #117 that got stuck, as of now it is still running in Python. (2+ hours run time)
--TEST CODE 117
--HINT DISTRIBUTE_ON_KEY(stratum_1)
-- generating date key sequences in a cross-dialect compatible fashion
CREATE TEMP TABLE s_tmpach_117
AS
WITH century AS (SELECT CAST('19' as TEXT) num union select '20' num),
tens as (select '0' num union select '1' num union select '2' num union select '3' num union select '4' num union select '5' num union select '6' num union select '7' num union select '8' num union select '9' num),
ones as (select '0' num union select '1' num union select '2' num union select '3' num union select '4' num union select '5' num union select '6' num union select '7' num union select '8' num union select '9' num),
months as (select '01' as num union select '02' num union select '03' num union select '04' num union select '05' num union select '06' num union select '07' num union select '08' num union select '09' num union select '10' num union select '11' num union select '12' num),
date_keys as (select cast(concat(century.num, tens.num, ones.num,months.num) as int) obs_month from century cross join tens cross join ones cross join months)
SELECT
117 as analysis_id,
CAST(t1.obs_month AS VARCHAR(255)) as stratum_1,
cast(null as varchar(255)) as stratum_2, cast(null as varchar(255)) as stratum_3, cast(null as varchar(255)) as stratum_4, cast(null as varchar(255)) as stratum_5,
COALESCE(COUNT(distinct op1.PERSON_ID),0) as count_value
FROM
date_keys t1
left join
(select t2.obs_month, op2.*
from ohdsiv5_june2021_thin.observation_period op2, date_keys t2
where EXTRACT(YEAR FROM op2.observation_period_start_date)*100 + EXTRACT(MONTH FROM op2.observation_period_start_date) <= t2.obs_month
and EXTRACT(YEAR FROM op2.observation_period_end_date)*100 + EXTRACT(MONTH FROM op2.observation_period_end_date) >= t2.obs_month
) op1 on op1.obs_month = t1.obs_month
group by t1.obs_month
It had been run successfully yesterday night after I left work. Too bad I forgot to put a timer to that.
Just interested, if anyone has any experience that Rstudio would become unresponsive running long calculations. How did you solve this?
I try running the scripts in another IDE, but with very little success. I have tried getting R scripts running in vscode but setting up that is a pain in the butt. ty