OHDSI Home | Forums | Wiki | Github

RStudio Reponse/Freeze issues

Hello,

I have been trying to run R scripts to faciliate OHDSI researches. However, I have discovered that while some test runs (let’s say Achilles scripts), RStudio could freeze when running longer scripts.

For example, test 200 in Achilles

--TEST CODE 200
--HINT DISTRIBUTE_ON_KEY(stratum_1)
CREATE TEMP TABLE s_tmpach_200

AS
SELECT
200 AS analysis_id,
	CAST(vo.visit_concept_id AS VARCHAR(255)) AS stratum_1,
	CAST(NULL AS VARCHAR(255)) AS stratum_2,
	CAST(NULL AS VARCHAR(255)) AS stratum_3,
	CAST(NULL AS VARCHAR(255)) AS stratum_4,
	CAST(NULL AS VARCHAR(255)) AS stratum_5,
	COUNT(DISTINCT vo.person_id) AS count_value

FROM
ohdsiv5_june2021_thin.visit_occurrence vo
JOIN 
	ohdsiv5_june2021_thin.observation_period op 
ON 
	vo.person_id = op.person_id
AND 
	vo.visit_start_date >= op.observation_period_start_date
AND 
	vo.visit_start_date <= op.observation_period_end_date
GROUP BY 
	vo.visit_concept_id

would return no result in R studio.

However, if I put it in Python postreSQL runner with some settings

conn = psycopg2.connect(host=“someserver.hk”,
port=12345,
dbname=“ohdsi”,
user=“admin”,
password=“admin1”,
options="-c search_path="+schema,
# it seems the below lines are needed to keep the connection alive.
connect_timeout=10,
keepalives=1,
keepalives_idle=5,
keepalives_interval=2,
keepalives_count=5)

The concerned script #200 would seem to finish running successfully. The time needed to run the script is about 50 minutes.

C:\.....Programs\Python\Python39\python.exe "D:/OHDSI-THIN/scripts/import_from_THIN/test query 200.py"
 10:37:43.539193 running script...
 11:27:45.922109 finished

Process finished with exit code 0

I am testing another script Achilles #117 that got stuck, as of now it is still running in Python. (2+ hours run time)

--TEST CODE 117
--HINT DISTRIBUTE_ON_KEY(stratum_1)
-- generating date key sequences in a cross-dialect compatible fashion
CREATE TEMP TABLE s_tmpach_117	

AS
WITH century   AS (SELECT  CAST('19' as TEXT) num union select '20' num), 
tens as (select '0' num union select '1' num union select '2' num union select '3' num union select '4' num union select '5' num union select '6' num union select '7' num union select '8' num union select '9' num),
ones as (select '0' num union select '1' num union select '2' num union select '3' num union select '4' num union select '5' num union select '6' num union select '7' num union select '8' num union select '9' num),
months as (select '01' as num union select '02' num union select '03' num union select '04' num union select '05' num union select '06' num union select '07' num union select '08' num union select '09' num union select '10' num union select '11' num union select '12' num),
date_keys as (select cast(concat(century.num, tens.num, ones.num,months.num) as int) obs_month from century cross join tens cross join ones cross join months)
 SELECT
117 as analysis_id,  
	CAST(t1.obs_month AS VARCHAR(255)) as stratum_1,
	cast(null as varchar(255)) as stratum_2, cast(null as varchar(255)) as stratum_3, cast(null as varchar(255)) as stratum_4, cast(null as varchar(255)) as stratum_5,
	COALESCE(COUNT(distinct op1.PERSON_ID),0) as count_value

FROM
date_keys t1
left join
  (select t2.obs_month, op2.*
    from ohdsiv5_june2021_thin.observation_period op2, date_keys t2
    where EXTRACT(YEAR FROM op2.observation_period_start_date)*100 + EXTRACT(MONTH FROM op2.observation_period_start_date) <= t2.obs_month
    and EXTRACT(YEAR FROM op2.observation_period_end_date)*100 + EXTRACT(MONTH FROM op2.observation_period_end_date) >= t2.obs_month
  ) op1 on op1.obs_month = t1.obs_month
group by t1.obs_month

It had been run successfully yesterday night after I left work. Too bad I forgot to put a timer to that.

Just interested, if anyone has any experience that Rstudio would become unresponsive running long calculations. How did you solve this?

I try running the scripts in another IDE, but with very little success. I have tried getting R scripts running in vscode but setting up that is a pain in the butt. ty

Before I leave work this Friday, it seems like the solution is to add the param at the end of the url

https://jdbc.postgresql.org/documentation/93/connect.html

&tcpKeepAlive=true

looks like it would do the work, since at the ‘backend’ it seems it is JDBC that makes the SQL connection. I am not sure.

I will let R run over the weekend.

Hi @lychenus13,

Usually, for long jobs, I launch the run using nohup Rscript CodeToRun.R & from the command line. Keeping Rstudio alive is something I have always avoided.

Thank you. Too bad there seems to be not much options in windows for nohup

But I am curious, why avoid keeping RStudio alive?

t