
Empty cohort issue only occurred on KNN, Naive Bayes, Neural Network, DeepNN model

I tested the PatientLevelPrediction package.

The empty cohort issue occurred only with the KNN, naive Bayes, neural network, and deep NN models.
The decision tree, lasso logistic regression, AdaBoost, gradient boosting machine, and random forest models worked normally.

All models used the default options.

The plplog of the KNN model is shown below.

|2019-12-17 10:53:45|[Main thread]|INFO|PatientLevelPrediction||Patient-Level Prediction Package version 3.0.8|
|2019-12-17 10:53:45|[Main thread]|INFO|PatientLevelPrediction|runPlpAnalyses|No plpData - probably empty cohort issue|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Patient-Level Prediction Package version 3.0.8|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||AnalysisID: Analysis_13|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||CohortID: 19|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||OutcomeID: 501|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Cohort size: 3080|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Covariates: 1621|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Population size: 3080|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Cases: 204|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||testSplit: subject|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||outcomeCount: 204|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||plpData class: plpData|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||testfraction: 0.25|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||nfold class: numeric|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||nfold: 10|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||splitSeed: 1935504|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction|subjectSplitter|Creating a 25% test and 75% train (into 10 folds) stratified split by subject|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction|subjectSplitter|Data split into 768 test cases and 2312 train cases (232, 232, 232, 232, 232, 232, 232, 230, 230, 228)|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Training KNN model|

The covariates used are listed below.

Demographics Gender
Demographics Prior Observation Time
Demographics Post Observation Time
Demographics Time In Cohort
Drug Era Long Term
Procedure Occurrence Long Term
Measurement Long Term
Measurement Value Long Term
Observation Long Term
Charlson Index
Dcsi
Chads2Vasc
Hfrs
Distinct Condition Count Long Term
Distinct Ingredient Count Long Term
Distinct Procedure Count Long Term
Distinct Measurement Count Long Term
Distinct Observation Count Long Term
Visit Count Long Term
Visit Concept Count Long Term

How can I solve this problem?

Thank you.

Could you repeat this with a bigger target cohort? It seems you used 10-fold cross-validation, which means the training cohort was divided into 10 folds.
Or could you repeat this with fewer cross-validation folds?
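
To make the arithmetic behind that suggestion concrete, here is a minimal Python sketch using the numbers from the log above (cohort size 3080, 204 outcomes, test fraction 0.25, 10 folds). It is not part of PatientLevelPrediction; the function name `fold_summary` is made up for illustration, and the rounding is only an approximation of what the package's stratified splitter does (the log reports 768 test and 2312 train rows). Whether the small number of outcomes per fold is the actual cause of the empty results is not established here.

```python
def fold_summary(population, outcomes, test_fraction, nfold):
    """Approximate sizes of a stratified train/test split with nfold CV folds.

    Hypothetical helper for illustration only; it does not reproduce the
    exact rounding used by the package's subject splitter.
    """
    test_rows = round(population * test_fraction)
    train_rows = population - test_rows
    test_outcomes = round(outcomes * test_fraction)
    train_outcomes = outcomes - test_outcomes
    return {
        "test_rows": test_rows,
        "train_rows": train_rows,
        "train_outcomes": train_outcomes,
        "rows_per_fold": train_rows / nfold,
        "outcomes_per_fold": train_outcomes / nfold,
    }


# Numbers taken from the plplog in the first post.
summary = fold_summary(population=3080, outcomes=204, test_fraction=0.25, nfold=10)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs each fold holds roughly 230 people and about 15 outcomes, so reducing `nfold` or enlarging the target cohort raises the number of outcomes available in each fold.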

I repeated the analysis with fewer than 10 cross-validation folds, but the KNN, naive Bayes, neural network, and DNN models still did not work.
I should get a bigger cohort.

Thank you for your help.

Hello!
We have a similar issue when trying to run a patient-level prediction: only the LASSO and Gradient Boosting Machine algorithms produce results, while the other algorithms result in empty models, just as CHGALE’s table shows here. In our case the target cohort is even smaller than the one presented here (~400), but we don’t have access to more data for this problem in that database.

Also, I can’t find any warning or error in the logs; the analysis seems to run and finish properly, except that there is no result.
So my question is: are there requirements on certain factors (e.g. cohort size, number of data points in the lookback period…) that need to be met for some of these methods? And if so, shouldn’t the logs mention that the requirements were not met?

I ask this as a new user of ATLAS and its features, so I’m not sure whether the absence of results for some but not all algorithms in my analysis is due to a wrong definition of the analysis problem and parameters in ATLAS, to an issue inherent to the data (like the cohort size suggested here), or whether it is an actual result meaning that no significant covariate was found (for instance, in another analysis I’ve seen something like “No non-zero coefficient found” in the log of a LASSO analysis).

Thank you for the help!

Hello again, just wondering if anyone has insight on this topic? Is it normal that, for the algorithms that end up without a result, plplog.txt ends with the line “Training XXX model”?
Sorry for the inconvenience, and thanks for the help!
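
In case it helps others check the same thing, here is a minimal Python sketch that parses the pipe-delimited plplog format shown in the first post and reports whether the log stops at a "Training ... model" line without any entry above INFO level. The field names, the helper names `parse_plplog` and `stopped_at_training`, and the assumption that anything other than TRACE, DEBUG, or INFO counts as a problem are mine, not something documented by the package. The sample lines are copied from the log above.

```python
def parse_plplog(text):
    """Parse lines shaped like |timestamp|thread|level|package|function|message|."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip anything that is not a log record
        fields = line[1:-1].split("|")
        if len(fields) < 6:
            continue
        timestamp, thread, level, package, function = fields[:5]
        # Re-join in case the message itself contains a pipe character.
        message = "|".join(fields[5:])
        entries.append({
            "timestamp": timestamp,
            "thread": thread,
            "level": level,
            "package": package,
            "function": function,
            "message": message,
        })
    return entries


def stopped_at_training(entries):
    """True if the last record is a 'Training ... model' message."""
    return bool(entries) and entries[-1]["message"].startswith("Training ")


SAMPLE_LOG = """\
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Cohort size: 3080|
|2019-12-17 13:05:06|[Main thread]|DEBUG|PatientLevelPrediction||nfold: 10|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction|subjectSplitter|Creating a 25% test and 75% train (into 10 folds) stratified split by subject|
|2019-12-17 13:05:06|[Main thread]|INFO|PatientLevelPrediction||Training KNN model|
"""

entries = parse_plplog(SAMPLE_LOG)
# Assumption: levels other than TRACE/DEBUG/INFO indicate a warning or error.
problems = [e for e in entries if e["level"] not in ("TRACE", "DEBUG", "INFO")]

print("Last message:", entries[-1]["message"])
print("Stopped at training step:", stopped_at_training(entries))
print("Warnings or errors logged:", len(problems))
```

To run it on a real log, read the file into a string (for example with `open("plplog.txt").read()`) and pass that to `parse_plplog` instead of `SAMPLE_LOG`.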
