[PatientLevelPrediction] No data found when using getDbPlpData (RESOLVED)

Eric_Chou · January 12, 2017, 9:30pm

Hello,

I’ve been working through the “Building patient-level predictive models” vignette linked from the PatientLevelPrediction repository’s README.md, but I am running into errors at step 3.3 of the vignette, where the package extracts PLP data from the server by calling getDbPlpData in R. The error appears to come from the package, not from SQL.

It seems to me that I may be misunderstanding how outcome data is constructed, and how it relates to the VISIT_OCCURRENCE table. We made a few changes relative to the vignette. We did not pass an “oracleTempSchema” argument to getDbPlpData. When populating our VISIT_OCCURRENCE table, we gave every patient the visit_concept_id 42898160 (“Long Term Care Visit”). And instead of creating the “rehospitalization” table from the vignette example, we tried building a model using two distinct cohorts to supply the cohortTable and outcomeTable arguments.

Below is my input for the getDbPlpData function, as well as the covariates settings applied through the createCovariateSettings function.

covariateSettings <- createCovariateSettings(useCovariateDemographics = TRUE,
                                             useCovariateConditionOccurrence = TRUE,
                                             useCovariateConditionOccurrence365d = FALSE,
                                             useCovariateConditionOccurrence30d = TRUE,
                                             useCovariateConditionOccurrenceInpt180d = FALSE,
                                             useCovariateConditionEra = TRUE,
                                             useCovariateConditionEraEver = TRUE,
                                             useCovariateConditionEraOverlap = TRUE,
                                             useCovariateConditionGroup = TRUE,
                                             useCovariateDrugExposure = TRUE,
                                             useCovariateDrugExposure365d = FALSE,
                                             useCovariateDrugExposure30d = TRUE,
                                             useCovariateDrugEra = TRUE,
                                             useCovariateDrugEra365d = FALSE,
                                             useCovariateDrugEra30d = TRUE,
                                             useCovariateDrugEraOverlap = TRUE,
                                             useCovariateDrugEraEver = TRUE,
                                             useCovariateDrugGroup = TRUE,
                                             useCovariateProcedureOccurrence = TRUE,
                                             useCovariateProcedureOccurrence365d = FALSE,
                                             useCovariateProcedureOccurrence30d = TRUE,
                                             useCovariateProcedureGroup = TRUE,
                                             useCovariateObservation = TRUE,
                                             useCovariateObservation365d = FALSE,
                                             useCovariateObservation30d = TRUE,
                                             useCovariateObservationCount365d = FALSE,
                                             useCovariateMeasurement = FALSE,
                                             useCovariateMeasurement365d = FALSE,
                                             useCovariateMeasurement30d = FALSE,
                                             useCovariateMeasurementCount365d = FALSE,
                                             useCovariateMeasurementBelow = FALSE,
                                             useCovariateMeasurementAbove = FALSE,
                                             useCovariateConceptCounts = TRUE,
                                             useCovariateRiskScores = TRUE,
                                             useCovariateRiskScoresCharlson = TRUE,
                                             useCovariateRiskScoresDCSI = TRUE,
                                             useCovariateRiskScoresCHADS2 = TRUE,
                                             useCovariateRiskScoresCHADS2VASc = TRUE,
                                             useCovariateInteractionYear = FALSE,
                                             useCovariateInteractionMonth = FALSE,
                                             excludedCovariateConceptIds = c(),
                                             deleteCovariatesSmallCount = 100)

plpData <- getDbPlpData(connectionDetails = connectionDetails,
                        cdmDatabaseSchema = cdmDatabaseSchema,
                        cohortDatabaseSchema = cohortsDatabaseSchema,
                        cohortTable = "cohort",
                        cohortIds = 246866, # Validated Observations cohort
                        washoutWindow = 183,
                        useCohortEndDate = TRUE,
                        windowPersistence = 0,
                        covariateSettings = covariateSettings,
                        outcomeDatabaseSchema = cohortsDatabaseSchema,
                        outcomeTable = "cohort",
                        outcomeIds = 246865, # Fall Prediction cohort
                        firstOutcomeOnly = FALSE,
                        cdmVersion = cdmVersion)

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          3.2                         
year           2016                        
month          10                          
day            31                          
svn rev        71607                       
language       R                           
version.string R version 3.3.2 (2016-10-31)
nickname       Sincere Pumpkin Patch

When I run this in R, the PatientLevelPrediction package prints the output below, ending in warnings that no PLP data was returned:

Constructing cohorts of interest
  |=====================================================================================================================================| 100%
Executing SQL took 0.417 secs
Fetching data from server
Loading took 0.124 secs
Constructing outcomes
  |=====================================================================================================================================| 100%
Executing SQL took 0.0755 secs
Fetching data from server
Loading took 0.273 secs
Constructing default covariates
  |=====================================================================================================================================| 100%
Executing SQL took 10.9 secs
Done
Fetching data from server
Loading took 0.237 secs
Removing redundant covariates
Removing redundant covariates took 0 secs
Normalizing covariates
Warning messages:
1: In lowLevelQuerySql.ffdf(connection, sql) :
  Data has zero rows, returning an empty data frame
2: In lowLevelQuerySql.ffdf(connection, sql) :
  Data has zero rows, returning an empty data frame
3: In getDbPlpData(connectionDetails = connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema,  :
  No outcome data found
4: In lowLevelQuerySql.ffdf(connection, sql) :
  Data has zero rows, returning an empty data frame
5: In PatientLevelPrediction::getDbCovariateData(connection = conn,  :
  No data found

Overall, I am stuck on how to address the 5 warning messages that come up saying there is no data found for PLP data. Any help or feedback would be greatly appreciated, and I would be happy to address any additional questions on how I have set up the tables, R functions, etc.

Thank you!

-Eric

rkboyce · December 19, 2016, 9:05pm

@Rijnbeek @schuemie @Patrick_Ryan: any help you can give Eric with this question will be greatly appreciated. This is related to our effort to compare a fall prediction algorithm that we just completed using a proprietary CART system with the one we plan to create with PLP, so we can report our experience to the group. Thanks! -R

Rijnbeek · December 19, 2016, 9:20pm

hi rich,

Is the ‘cohort’ table indeed empty? If so, there might be a problem with the SQL that creates the cohorts.

Peter


Eric_Chou · December 20, 2016, 5:50pm

Hello Peter,

Thank you for the response! To clarify, the cohort table is not empty when this occurs. We created two cohort_definition_id’s, 246866 and 246865, which are passed to getDbPlpData as “cohortIds” and “outcomeIds” respectively. These cohorts contain 6422 and 4238 patient entries, respectively. It seems that the data drops out somewhere while getDbPlpData is running and what is returned is ultimately empty, resulting in the warning messages, so we are having trouble figuring out why this is occurring.

-Eric

Rijnbeek · December 20, 2016, 7:51pm

Hi Eric,

I see. It seems to have a problem returning the covariates. I suppose it is not related to deleteCovariatesSmallCount = 100? Although you do not have a very big dataset, I would expect some covariates to be returned.

Just to be sure, what happens if you set this to a much lower number?

Peter

schuemie · December 21, 2016, 8:40am

You specified a washout period of 183 days, meaning anyone with fewer than 183 days between the observation_period_start_date and the cohort_start_date is removed. Maybe you lost your subjects there? Could you verify that any have at least 183 days of prior observation?
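
One way to run that check in R is sketched below. This is only an illustration on made-up data, not code from the package: the two data frames are hypothetical stand-ins for the cohort and OBSERVATION_PERIOD tables, and the column names follow the usual CDM conventions.

```r
# Hypothetical toy data standing in for the cohort and OBSERVATION_PERIOD tables.
cohort <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2015-06-01", "2015-02-01", "2015-01-01"))
)
observation_period <- data.frame(
  person_id = c(1, 2, 3),
  observation_period_start_date = as.Date(c("2014-01-01", "2014-12-01", "2015-01-01")),
  observation_period_end_date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01"))
)

# Join each cohort entry to that person's observation periods.
merged <- merge(cohort, observation_period,
                by.x = "subject_id", by.y = "person_id")

# Keep only the observation period that contains the cohort start date.
inPeriod <- merged$cohort_start_date >= merged$observation_period_start_date &
  merged$cohort_start_date <= merged$observation_period_end_date
merged <- merged[inPeriod, ]

# Days of observation before cohort entry.
merged$prior_days <- as.integer(merged$cohort_start_date -
                                  merged$observation_period_start_date)

# Count how many entries would survive a 183-day washout.
washoutWindow <- 183
n_kept <- sum(merged$prior_days >= washoutWindow)
print(merged[, c("subject_id", "prior_days")])
print(n_kept)
```

If the count is 0 on the real tables, the washout alone would explain an empty result.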

Eric_Chou · December 22, 2016, 6:12pm

Hello,

I’ve tried setting “deleteCovariatesSmallCount” to 10 and 0; both return the same set of 5 warnings.

-Eric

Eric_Chou · December 22, 2016, 8:18pm

Hi Martijn,

Thank you for the response. I looked through the subject_id’s included in our cohorts and crosschecked the cohort start & end dates for these subject_id’s with the corresponding person_id’s in our observation_period table. I can verify that there were patients who had their cohort_start_date start at least 183 days after their observation_period_start_date.

-Eric

schuemie · December 23, 2016, 2:07pm

The error messages indicate that no outcomes were found, and that no covariates were found. Could you check if at least some people at risk were found?

nrow(plpData$cohorts)

A count of 0 outcomes could be because only outcomes in the at-risk cohort are extracted. Maybe no one in the cohort has the outcome after the cohort start date.

A count of 0 covariates could be because no data is present for the cohort start date?
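
A rough way to check the outcome side is sketched below on made-up data. The at-risk window used here (cohort start through cohort end, matching useCohortEndDate = TRUE and windowPersistence = 0) is an assumption about how the package applies those settings, and the toy rows are hypothetical; only the two cohort IDs come from this thread.

```r
# Hypothetical toy cohort table holding both the at-risk cohort (246866)
# and the outcome cohort (246865).
cohort_table <- data.frame(
  cohort_definition_id = c(246866, 246866, 246865, 246865),
  subject_id = c(1, 2, 1, 2),
  cohort_start_date = as.Date(c("2015-01-01", "2015-03-01",
                                "2015-02-15", "2014-12-01")),
  cohort_end_date = as.Date(c("2015-06-30", "2015-09-30",
                              "2015-02-15", "2014-12-01"))
)

at_risk <- cohort_table[cohort_table$cohort_definition_id == 246866, ]
outcomes <- cohort_table[cohort_table$cohort_definition_id == 246865,
                         c("subject_id", "cohort_start_date")]
names(outcomes)[2] <- "outcome_date"

# Pair every at-risk entry with that person's outcome dates.
joined <- merge(at_risk, outcomes, by = "subject_id")

# Assumed at-risk window: from cohort start through cohort end.
in_window <- joined$outcome_date >= joined$cohort_start_date &
  joined$outcome_date <= joined$cohort_end_date
n_outcomes_in_window <- sum(in_window)
print(n_outcomes_in_window)
```

In this toy data one person has the outcome inside the window and the other has it before cohort entry, so only the first would count.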

Eric_Chou · January 6, 2017, 9:06pm

I checked nrow(plpData$cohorts) and nrow(plpData$covariates) and indeed got 0’s for both.

In the data itself, with my cohort of cohort_definition_id = 246866 and outcomes of cohort_definition_id = 246865 (the outcome being falls), I went back to check the patients that appear in both the cohort and the outcome sets. I queried these patients and found that many did have outcome start dates that came after the cohort start date. Is this what you mean by “the cohort has the outcome after the cohort start date”?

Eric_Chou · January 12, 2017, 9:29pm

Hello,

This issue with the getPlpData has been fixed with the latest commit (commit hash d248b8c) and we have been able to successfully acquire PLP Data with good face value for our desired cohort and outcome. Thanks for the help!

Going through the new vignette for Building Predictive Models that came with this new update, though, we did have a small issue with washoutPeriod when we moved onto implementing the createStudyPopulation command under step 4, “Applying additional inclusion criteria”. When we set washoutPeriod = 0 this command ran successfully, but at greater values, we got the error below:

Error in `$<-.data.frame`(`*tmp*`, "riskStart", value = 1) :
  replacement has 1 row, data has 0

Does anyone have any tips on what we should look for in our data to figure out how to use washoutPeriod?

Thanks!
-Eric

Chris_Knoll · January 12, 2017, 10:44pm

Washout period (I believe) just means the number of days required between the start of the observation period that contains a person’s cohort start date and the cohort start date itself. Specifically, in SQL:

-- Days of observation before each cohort entry, using the observation
-- period that contains the cohort start date.
with ctePriorDays (person_id, daysToIndex) as 
(
  select c.person_id, datediff(d, op.observation_period_start_date, c.cohort_start_date) as daysToIndex
  from {cohortTable} c
  join {cdmSchema}.observation_period op on c.person_id = op.person_id 
    and c.cohort_start_date >= op.observation_period_start_date 
    and c.cohort_start_date <= op.observation_period_end_date
)
-- One summary row for the whole cohort (no group by).
select MIN(pd.daysToIndex) as minDays, AVG(pd.daysToIndex) as avgDays, MAX(pd.daysToIndex) as maxDays
from ctePriorDays pd

This assumes the following things:
You replace the {cohortTable} placeholder in the SQL with the name of your cohort table.
The columns in your cohort table are person_id, cohort_start_date, cohort_end_date.
You replace {cdmSchema} with the schema of your CDM. It’s assumed that your cohort table is joinable to your CDM tables (as in: they exist in the same db).

The output will be the min, avg, and max washout days in your cohort. I’m guessing that if 0 is the only thing that worked, then your output will show something like 0’s for min/max/avg.
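
For what it’s worth, the error message itself is what base R raises when a single value is assigned to a column of a data frame with zero rows, which would fit a population that has been filtered down to nothing. Below is a minimal sketch on a hypothetical data frame; the column names and the filter are made up for illustration and are not taken from the createStudyPopulation source.

```r
# Hypothetical population in which nobody has any prior observation time.
population <- data.frame(subjectId = c(1, 2, 3), priorDays = c(0, 0, 0))

# With a washout of 0 days everyone is kept and the assignment works.
kept <- population[population$priorDays >= 0, ]
kept$riskStart <- 1

# With a washout of 183 days nobody is kept, leaving a zero-row data frame.
washoutPeriod <- 183
filtered <- population[population$priorDays >= washoutPeriod, ]

# Assigning a scalar to a column of a zero-row data frame raises an error.
msg <- tryCatch({
  filtered$riskStart <- 1
  "no error"
}, error = function(e) conditionMessage(e))
print(nrow(filtered))
print(msg)
```

So if the query above returns 0’s, the error at any washoutPeriod above 0 would be explained by an empty population rather than by the riskStart logic.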

-Chris

Rijnbeek · January 13, 2017, 8:23pm

Thanks Chris for helping out.

The interpretation of washout is correct. I personally never really liked the name of this variable; I think it is a bit confusing. For me, washout is more related to drugs washing out, and ‘run-in period’ is maybe more appropriate.

Peter

Mark_Danese · January 13, 2017, 9:08pm

Some suggestions on the terminology based on what we do. Feel free to take or leave as appropriate.

Lookback period is the time before the qualifying event date (or calendar date for prevalent studies).

An exclusion criterion applied during the lookback period effectively works as a washout period. But it is more precise to call it an exclusion criterion assessed during a specific period of time. It could be for drugs, but it could be for other exposures as well.