
[PatientLevelPrediction] Default Server

@schuemie, @msuchard, or @Patrick_Ryan,

I’m trying to use the PatientLevelPrediction package, but I’m getting stuck at the extracting data from server step.

  cohortData <- getDbCohortData(
    connectionDetails = connDetails$dbms,
    cdmDatabaseSchema = "CDM.dbo",
    cohortDatabaseSchema = "SCRATCH.dbo",
    cohortTable = "COHORT", 
    cohortConceptIds = 1) 

My connDetails object is a createConnectionDetails() object with dbms="sql server".

I get the following error:

 Error in paste("jdbc:sqlserver://", server, ";integratedSecurity=true",  : 
  argument "server" is missing, with no default 

I don’t see any way I can pass it connDetails$server.

I also tried passing a connect() object as connection instead of connectionDetails, but I get this error:

Error: 'dbGetQuery.ffdf' is not an exported object from 'namespace:DatabaseConnector' 

FYI, this cohort has 5,318,577 patients. @Frank mentioned to me that there was once an ffdf issue with too many rows.

Thank you!

At first glance, it appears that PatientLevelPrediction has not been updated to the new DatabaseConnector API. I’ll look into this when I get a chance … @schuemie will probably beat me to it.

best, M

Hi @ericaVoss

Might this have something to do with the fact that the parameter is connectionDetails, but you only provide the dbms?

 connectionDetails = connDetails$dbms,
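To illustrate why the error points at server (a minimal sketch, not DatabaseConnector's actual code): buildJdbcUrl() below is a hypothetical stand-in for the step that builds the JDBC URL, and "myserver" is a made-up server name. Passing only the dbms string leaves server unset, and because R evaluates arguments lazily, the failure only surfaces once paste() needs the value. Supplying all fields, as passing the full connDetails object does, avoids it.

```r
# Hypothetical stand-in for the URL-building step; not DatabaseConnector code.
buildJdbcUrl <- function(dbms, server) {
  stopifnot(dbms == "sql server")  # this sketch only handles SQL Server
  # 'server' has no default, so R only complains when paste() evaluates it.
  paste("jdbc:sqlserver://", server, ";integratedSecurity=true", sep = "")
}

# Made-up connection details holding both fields.
connDetails <- list(dbms = "sql server", server = "myserver")

# Passing only the dbms string reproduces the error from the first post.
tryCatch(buildJdbcUrl(connDetails$dbms), error = function(e) conditionMessage(e))
# argument "server" is missing, with no default

# Supplying every field works.
do.call(buildJdbcUrl, connDetails)
# "jdbc:sqlserver://myserver;integratedSecurity=true"
```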

ffdf indeed has a problem with too many rows: the number cannot exceed the max value of a 32-bit integer (2,147,483,647).
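A quick back-of-the-envelope check against that limit (a sketch; the figure of 500 covariates per person is hypothetical, not from this thread): the cohort itself is far below it, but a table with one row per person and covariate can cross it.

```r
# The largest value a 32-bit R integer can hold, and hence the row limit
# for a single ffdf as described above.
ffdfRowLimit <- .Machine$integer.max  # 2147483647

fitsInFfdf <- function(nRows) nRows <= ffdfRowLimit

cohortRows <- 5318577                  # cohort size from the first post
fitsInFfdf(cohortRows)                 # TRUE

# Hypothetical: 500 non-zero covariates per person, one row each.
covariateRows <- cohortRows * 500      # 2,659,288,500 rows
fitsInFfdf(covariateRows)              # FALSE
```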

@schuemie

You mean like this?

connectionDetails = connDetails

I get this error:

 Error: 'dbGetQuery.ffdf' is not an exported object from 'namespace:DatabaseConnector' 

Or is there a way to provide $dbms and $server?

Yes, like that!

Now you’re hitting a bug in the package (dbGetQuery.ffdf has been deprecated in the DatabaseConnector packages). I just pushed a fix.

Boo, you forced me to upgrade R. I was on R version 3.1.1 (2014-07-10). I used installr. Then I updated practically every package, including the following . . . but I’m still getting the error. It could just be an R-upgrade issue. :frowning:

  install.packages("devtools")
  library(devtools)
  install_github("ohdsi/SqlRender") 
  install_github("ohdsi/DatabaseConnector") 
  library(DatabaseConnector)
 Error: 'dbGetQuery.ffdf' is not an exported object from 'namespace:DatabaseConnector'

Sorry, @ericaVoss . There was no need to upgrade everything. @schuemie pushed a fix to PatientLevelPrediction

install_github("OHDSI/PatientLevelPrediction")

should suffice.
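Why reinstalling DatabaseConnector by itself does not make this error go away: the failing call lives in PatientLevelPrediction, which asks DatabaseConnector for a function it no longer exports. In base R, pkg::name goes through getExportedValue(), which raises exactly this kind of error. The sketch below uses the base stats package so it runs anywhere; isExported() is a hypothetical helper, and on an affected setup one would pass "DatabaseConnector" instead.

```r
# Hypothetical helper: is 'name' in the export list of package 'pkg'?
isExported <- function(pkg, name) {
  name %in% getNamespaceExports(pkg)
}

# Shown with the base 'stats' package so the sketch runs anywhere;
# on the affected setup, pkg would be "DatabaseConnector".
isExported("stats", "median")           # TRUE
isExported("stats", "dbGetQuery.ffdf")  # FALSE

# pkg::name calls getExportedValue(), which fails for a missing export.
tryCatch(getExportedValue("stats", "dbGetQuery.ffdf"),
         error = function(e) conditionMessage(e))
# 'dbGetQuery.ffdf' is not an exported object from 'namespace:stats'
```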

It took me a bit to get it working. I got further along, but now I’m getting this:

Connecting using SQL Server driver using Windows integrated security
Connecting using SQL Server driver using Windows integrated security
Executing multiple queries. This could take a while
  |=======================================================================================| 100%
Analysis took 3 secs
Fetching data from server
Error executing SQL: Error in setwd(dfile): cannot change working directory
 Error in value[[3L]](cond) : no loop for break/next, jumping to top level 

The error report looks like this:

DBMS:
sql server

Error:
cannot change working directory

SQL:
SELECT subject_id AS person_id, cohort_start_date, cohort_concept_id, DATEDIFF(DAY, cohort_start_date, cohort_end_date) AS time FROM #cohort_person ORDER BY person_id, cohort_start_date

R version:
R version 3.2.1 (2015-06-18)

@msuchard & @schuemie - Wondering if you two are thinking this one is user error or if there is a possible issue with the package.

Hi @ericaVoss

This looks like a known problem with the ff package. As you know, ff stores data on disk instead of keeping it in memory. At the start of an R session it creates a temp folder where the ff data objects live. But when you restart your R session while keeping your data environment (say, after a package rebuild), the temp folder is gone but ff is still pointing to it.

I always run this command before using anything related to ff:

  options(fftempdir = "s:/temp")

This forces the ff temp folder to be the one I specified. I was hoping ‘end-users’ such as yourself would never need this hack, but I guess I was wrong. Do you have any idea how this could have happened? Have you hit the ‘Build’ button in R-Studio?
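A slightly more defensive version of that one-liner (a sketch; setFfTempDir() is a hypothetical helper and the path is only an example): it creates the folder first if needed, so the option never points at a folder that does not exist.

```r
# Hypothetical helper: make sure the folder exists, then point ff at it.
# ff reads this option when it creates new ff files.
setFfTempDir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  options(fftempdir = path)
  invisible(path)
}

# Example call; substitute a folder that survives R session restarts.
# setFfTempDir("s:/temp")
```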

@schuemie - I was able to get further along. I’m currently running the getDbCovariateData() function. I will let you know if I run into additional issues.

Thank you both for your support!

@schuemie & @msuchard,

I got to this part:

  covariateData <- getDbCovariateData(connDetails,
                                      cdmDatabaseSchema = cdmDatabaseSchema,
                                      cohortDatabaseSchema = resultsDatabaseSchema,
                                      cohortTable = cohortTable,
                                      cohortConceptIds = 1,
                                      covariateSettings = covariateSettings)

And I get the following after running on a 100K sample (my full cohorts are 3M and 9M):

Connecting using SQL Server driver using Windows integrated security
Executing multiple queries. This could take a while
  |=================================================================================================================================| 100%
Analysis took 3.64 hours
Fetching data from server
Error: 'dbGetQuery.ffdf' is not an exported object from 'namespace:DatabaseConnector'
> 

I also tried passing connDetails$dbms, but this gives me:

Connecting using SQL Server driver using Windows integrated security
Error in paste("jdbc:sqlserver://", server, ";integratedSecurity=true",  : 
  argument "server" is missing, with no default

So then I tried connDetails$server and got:

Executing multiple queries. This could take a while
  |                                                                                                                                 |   0%Error executing SQL: Error in if (attr(connection, "dbms") == "redshift" & grepl("DROP TABLE IF EXISTS", : argument is of length zero

An error report has been created at  \\glaz/Epi_GLAz/Projects/Programs/errorReport.txt
Error in value[[3L]](cond) : no loop for break/next, jumping to top level
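The "argument is of length zero" error has a base-R explanation (a sketch, assuming, as the message suggests, that a plain string rather than a connection object reached the attr(connection, "dbms") check; "myserver" is a made-up value): a character string carries no "dbms" attribute, so the comparison yields a zero-length logical, which if() rejects.

```r
# A plain string has no "dbms" attribute, so attr() returns NULL.
notAConnection <- "myserver"  # made-up server name
dbmsAttr <- attr(notAConnection, "dbms")
is.null(dbmsAttr)  # TRUE

# NULL == "redshift" is logical(0), and & keeps it zero-length.
condition <- dbmsAttr == "redshift" & grepl("DROP TABLE IF EXISTS", "SELECT 1")
length(condition)  # 0

# if() needs a single TRUE/FALSE, hence the error in the report.
tryCatch(if (condition) "redshift" else "other",
         error = function(e) conditionMessage(e))
# "argument is of length zero"
```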

My guess is that I want neither $dbms nor $server, and that the real issue is the ffdf step in getDbCovariateData(). But I don’t understand why 'namespace:DatabaseConnector' cares about 'dbGetQuery.ffdf'.

I appreciate any thoughts you have and if you think it is user error let me know!

It looks like @schuemie or I missed a dangling reference to dbGetQuery.ffdf in the PatientLevelPrediction package. I just patched this. Execute:

devtools::install_github("OHDSI/PatientLevelPrediction")

to update.


@msuchard & @schuemie,

Got a little further . . . now I’m in getDbOutcomeData()

  outcomeData <- getDbOutcomeData(connDetails,
                                  cdmDatabaseSchema = cdmDatabaseSchema,
                                  cohortDatabaseSchema = resultsDatabaseSchema,
                                  cohortTable = cohortTable,
                                  cohortConceptIds = 1,
                                  outcomeDatabaseSchema = resultsDatabaseSchema,
                                  outcomeTable = cohortTable,
                                  outcomeConceptIds = 10)

The error I’m getting now:

Connecting using SQL Server driver using Windows integrated security
Executing multiple queries. This could take a while
  |=================================================================================================================================| 100%
Analysis took 0.146 secs
Fetching data from server
Loading took 0.0468 secs
Warning messages:
1: In lowLevelQuerySql.ffdf(connection, sql) :
  Data has zero rows, returning an empty data frame
2: In getDbOutcomeData(connDetails, cdmDatabaseSchema = cdmDatabaseSchema,  :
  No outcome data found

I assume this function is looking for #cohort_outcome, so I tried rerunning getDbCovariateData() just in case. I still get the same warnings.

I thought maybe it was related to the issue above, but if I understand that issue correctly, you are fine as long as querySql.ffdf is used, and to me it looks like it is.

Nope, it doesn’t use temp tables as input. The input cohorts and outcome cohort are all expected to exist in the table defined by the cohortTable variable. The outcome cohorts are expected to have ID 10. No outcomes are found for the people in the input cohorts, hence the warnings.

Could you manually check if this is correct? Something like

SELECT COUNT(*)
FROM cohortTable cohort
INNER JOIN cohortTable outcome
ON cohort.subject_id = outcome.subject_id
WHERE cohort.cohort_concept_id = 1
AND outcome.cohort_concept_id = 10
AND outcome.cohort_start_date >= cohort.cohort_start_date
AND outcome.cohort_start_date <= cohort.cohort_end_date;

where cohortTable is the name of your cohort table.

I get 0 rows and this is why:

AND outcome.cohort_start_date <= cohort.cohort_end_date

This is because my cohort.COHORT_END_DATE is NULL.

Reading more carefully, I see you wrote “cohort_end_date, the end of the time period. Can be NULL for outcomes.” I’ll go back and set an end date for the cohort of interest.
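A toy reproduction of that diagnosis in R (a sketch on made-up rows; countOutcomesInCohort() is a hypothetical helper mirroring the SQL check above, and the 365-day window used to fill missing end dates is a hypothetical choice, not a recommendation from this thread): comparisons against a missing end date evaluate to NA, which the filter drops, much as SQL's WHERE discards comparisons with NULL.

```r
# Toy cohort table (made-up rows): concept 1 = cohort of interest,
# concept 10 = outcome, mirroring the IDs used in this thread.
cohortTable <- data.frame(
  subject_id        = c(1, 2, 1, 2),
  cohort_concept_id = c(1, 1, 10, 10),
  cohort_start_date = as.Date(c("2010-01-01", "2011-01-01", "2010-06-01", "2011-03-01")),
  cohort_end_date   = as.Date(rep(NA_character_, 4))
)

# Count outcomes that start within a person's cohort period.
countOutcomesInCohort <- function(tab) {
  cohort  <- tab[tab$cohort_concept_id == 1, ]
  outcome <- tab[tab$cohort_concept_id == 10, ]
  joined  <- merge(cohort, outcome, by = "subject_id", suffixes = c(".c", ".o"))
  inWindow <- joined$cohort_start_date.o >= joined$cohort_start_date.c &
              joined$cohort_start_date.o <= joined$cohort_end_date.c
  # NA comparisons are dropped by which(), like NULL comparisons in SQL.
  length(which(inWindow))
}

countOutcomesInCohort(cohortTable)  # 0: every end date is NA

# Hypothetical fix: give cohort rows without an end date a 365-day window.
missingEnd <- cohortTable$cohort_concept_id == 1 & is.na(cohortTable$cohort_end_date)
cohortTable$cohort_end_date[missingEnd] <- cohortTable$cohort_start_date[missingEnd] + 365
countOutcomesInCohort(cohortTable)  # 2
```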

Next one:

prediction <- predictProbabilities(model, parts[[2]]$cohortData, parts[[2]]$covariateData)

I get:

Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  : 
  unable to open

I’m only running on a small data sample now: 1,000 persons with 6 outcomes. I don’t know if that might be the problem.

@schuemie & @msuchard - Do you think this is user error?

Looks like there’s a problem opening the ff objects. If you type

parts[[2]]$cohortData
parts[[2]]$covariateData

Do these two commands generate errors? Did you use options(fftempdir = "s:/temp") to prevent the ff temp folder from becoming invalid on an R-within-RStudio restart?
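Part of that check can be scripted (a sketch; checkFfTempDir() is a hypothetical helper): it reports whether the fftempdir option is set and whether that folder exists and is writable. It does not inspect the files behind existing ff objects, so a clean result does not rule out other causes of "unable to open".

```r
# Hypothetical helper: is the ff temp folder set, present and writable?
checkFfTempDir <- function() {
  dir <- getOption("fftempdir")
  if (is.null(dir)) {
    return("fftempdir option is not set")
  }
  if (!dir.exists(dir)) {
    return(paste("fftempdir does not exist:", dir))
  }
  # file.access() returns 0 on success; mode = 2 tests write permission.
  if (file.access(dir, mode = 2) != 0) {
    return(paste("fftempdir is not writable:", dir))
  }
  "fftempdir looks fine"
}

checkFfTempDir()
```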

> parts[[2]]$cohortData
CohortData object

Cohort of interest concept ID(s): 1
> parts[[2]]$covariateData
CovariateData object

Cohort of interest concept ID(s): 1

prediction <- predictProbabilities(model, parts[[2]]$cohortData, parts[[2]]$covariateData)
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  : 
  unable to open

Yeah, I’m doing options(fftempdir = "S:/R/temp") at the start of the program and re-ran it prior to this run.
