RStudio gets stuck in the middle of running a package without giving any error

Harsha_Ragyari · February 5, 2021, 10:22am

Hi people. Firstly, I would like to thank everyone involved in this forum, especially the developers, for their contributions to solving issues.
Every time I run the code for the PLE study package, the program stops progressing (stuck at 0%) when it reaches the step “Running cohort method analysis, creating cohortmethod data objects”, and after a few hours it throws the error “Error in unserialize(socklist[[2]]) : ‘connection’ must be a connection”. This doesn’t look like a mistake in the code but rather a system issue. Could you please clarify? Thanks.

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu
AWS, 16 GB RAM

Regards,
HarshaVardhan

Adam_Black · February 5, 2021, 8:11pm

This error seems to have something to do with parallel computation. Maybe try setting maxCores = 1 in the execute() function.
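
The idea behind this suggestion can be sketched in base R. The unserialize(socklist[[n]]) call in the error message appears to come from the socket-cluster code in R's parallel package, so running on a single core should avoid those worker connections altogether. Note that runTasks below is a hypothetical helper written only for illustration; it is not part of the study package or of any OHDSI library, which handle this internally.

```r
library(parallel)

# Hypothetical helper (illustration only): run `fun` over `xs` in the main
# R session when maxCores is 1, otherwise on a socket cluster of workers.
runTasks <- function(xs, fun, maxCores = 1) {
  if (maxCores <= 1) {
    # Serial path: no worker processes and no socket connections involved.
    return(lapply(xs, fun))
  }
  cluster <- makeCluster(maxCores)
  on.exit(stopCluster(cluster), add = TRUE)
  parLapply(cluster, xs, fun)
}

# With maxCores = 1 everything runs in the current session.
squares <- runTasks(1:4, function(x) x^2, maxCores = 1)
```

The serial path is slower, but any error it raises surfaces directly in the main session instead of being lost in a worker process.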

Harsha_Ragyari · February 8, 2021, 3:39am

Thanks @Adam_Black for the reply. It seems to work partially, but after a few more lines of execution it got stuck again at “constructing features on server” (20%). What could be the solution for this, and how can we avoid it permanently? I would be delighted if @schuemie could chip in on this. Thanks again. Love you all.

Adam_Black · February 11, 2021, 2:41pm

Debugging study packages can be difficult, and with limited information about the problem it is hard to determine its cause and solution. A few strategies I can suggest are:

Try running FeatureExtraction on your CDM independent of the study package. Rule out problems with database access and the possibility that constructing features on your database is just very slow.
Double check the versions of the HADES packages required by the study package.
Try running a different (minimal) PLE study. Determine if the issue happens with all PLE studies or is specific to your PLE study.
Finally, you can try inserting a browser() function call or code breakpoint at the beginning of the execute() function and step through the code execution one step at a time to identify exactly where the issue occurs. It sounds like the problem is occurring with FeatureExtraction.
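
As a minimal sketch of the last strategy, the snippet below uses a hypothetical stand-in for execute() (the real function in a study package takes many more arguments and does real work). debugonce() is a base R alternative to editing the source to add browser(): it opens the step-through debugger on the next call only.

```r
# Hypothetical stand-in for a study package's execute() function.
execute <- function(maxCores = 1) {
  message("Creating cohorts")
  message("Constructing features")
  invisible(TRUE)
}

# In an interactive session, flag the function so the next call pauses at
# its first line. At the Browse prompt: n = next line, c = continue, Q = quit.
if (interactive()) {
  debugonce(execute)
}

result <- execute(maxCores = 1)
```

Stepping with n until the session stops responding shows which call is the slow or hanging one.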

Harsha_Ragyari · April 5, 2021, 6:53am

Hi Adam,
Good day! Thanks for taking an interest in my concern. I tried the methods you mentioned above: I checked that my HADES packages are up to date, and I tried a small PLE study with 1 T, 1 C, 1 O and a smaller TAR, which worked well and produced results. But when I run any complex analysis with a big TAR, it stalls at the “constructing features on server” phase when it reaches exactly 20%. Why does this happen only for complex analyses with a big TAR? I am using AWS Linux with 4 cores and 16 GB RAM. Please guide us on how to proceed. Could you also share your infrastructure details so that we know whether we need to increase our AWS infrastructure specifications? Thanks a ton in advance. @Adam_Black
Regards,
HarshaVardhan

Adam_Black · April 5, 2021, 11:57am

Hi @Harsha_Ragyari,

Good day to you as well! Based on your description I don’t think you have ruled out the possibility that there is no error. The computation you are trying to run might just take a long time.

Try running your study on a smaller CDM dataset, such as a 1,000 or 100,000 person sample of your data.
Try letting the study run overnight to see if it passes the 20% mark.

One thing to remember is that the progress bar tracks completed SQL statements, and each statement may take a different amount of time to run. There is no way for R to know how long a SQL statement will take to run, and thus no way to show progress within a SQL query. It is not surprising that the progress bar stays at 20% for a while when running a complex study.
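
This behaviour can be illustrated with a small base R simulation. The durations are made-up values and Sys.sleep() stands in for sending a statement to the database; this is not how the OHDSI packages are implemented, only a sketch of why a statement-level progress bar appears to freeze.

```r
# Hypothetical durations (in seconds) of five SQL statements; the third
# one is much slower than the others.
statementSeconds <- c(0.1, 0.1, 0.5, 0.1, 0.1)

pb <- txtProgressBar(min = 0, max = length(statementSeconds), style = 3)
completed <- 0
for (i in seq_along(statementSeconds)) {
  Sys.sleep(statementSeconds[i])    # stands in for executing statement i
  completed <- i
  setTxtProgressBar(pb, completed)  # the bar moves only after a statement finishes
}
close(pb)
```

While the third statement runs, the bar stays at 40% because R only learns that a statement is done once the database returns.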

As for infrastructure requirements, check out this post: Hardware specs to run OHDSI technology stack

Also Peter Rijnbeek recently conducted a survey: Inventory of OHDSI Infrastructures in the community: your input is needed!
I’m not sure if the results are ready to share yet but those might help.

schuemie · April 6, 2021, 6:32am

Also note that some of the queries fired at the server do not depend on the size of the cohorts. Especially those that group concepts into higher level concepts can take a long time because they touch a large part of the vocabulary, no matter how big the cohort.

Make sure that all recommended indices are created on the server. And as @Adam_Black mentioned, make sure the database server specs are sufficient. Running OHDSI analytics can require quite a bit of compute.

Harsha_Ragyari · April 12, 2021, 7:04am

Thanks @Adam_Black, @schuemie. Your replies have given us some relief. I have also shared my infrastructure details. Now I am looking to create indexes in the database to speed up the analysis, but the index link provided in the incremental-achilles thread unfortunately doesn’t work. Could you please tell us more about creating indexes in the CDM database? Thanks again for the support. Take care.

Regards,
HarshaVardhan

Adam_Black · April 12, 2021, 11:58am

Hi @Harsha_Ragyari,

All of the DDL scripts are here (one folder for each database)

The indexes for SQL Server are at https://github.com/OHDSI/CommonDataModel/blob/master/Sql%20Server/OMOP%20CDM%20sql%20server%20pk%20indexes.txt