
Adding a new cohort from Synthea generated csv files to an existing CDM


(Pallavi) #1

We have a CDM with target and outcome cohorts from a training dataset. How can we add a new cohort from another set of csv files to this CDM so that we can do prediction on the new cohort?


(Vojtech Huser) #2

Your question is confusing. Imagine that the CDM data is stored in a database, and that cohorts are sets you define on top of this data.

So to add new patients, you modify the CDM database.

You must clarify your question.


(Pallavi) #3

I made a CDM using csv files generated from Synthea (https://github.com/synthetichealth/synthea/wiki/Basic-Setup-and-Running), following the commands on the OHDSI GitHub page (https://github.com/OHDSI/ETL-Synthea). I built a target and an outcome cohort on this CDM, and used the PatientLevelPrediction package to obtain a PlpModel.

I aim to use this PlpModel to make predictions on another dataset. For this I tried the following:

I made another CDM from a different set of Synthea csv files, on which I wanted to predict outcomes (predictPlp function) using the previously built PlpModel. I made one target cohort, defined covariates, and obtained plpData, but got stuck at the createStudyPopulation function: it does not run without an outcome cohort ID. Since I am trying to make predictions, I only have a target cohort, not an outcome cohort.

Since I got stuck with that attempt, I need help with this next one:

I would like to add another set of Synthea csv files to the preexisting CDM, make a cohort from that data, and predict outcomes in this cohort using the previously built model (predictPlp function). How can I add this new cohort from csv files to an already existing CDM while keeping a clear distinction between the old data and the new data?
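
One way to keep old and new data distinct is to remember which source patient IDs came from the new csv batch and, once both batches are loaded, turn those into rows of the CDM cohort table under a new cohort_definition_id. The Python below is only a rough sketch of that idea, not a tested procedure. It assumes that the ETL copies the Synthea patient Id into person.person_source_value (worth checking in your own CDM); the sample IDs, dates, and cohort ID 3 are made up; and using the observation period as cohort start and end is just an illustrative choice.

```python
import csv
import io

# Hypothetical stand-ins for the two Synthea exports.
# Synthea's patients.csv identifies each patient in an "Id" column.
OLD_PATIENTS_CSV = "Id,BIRTHDATE\nold-1,1980-01-01\nold-2,1975-05-05\n"
NEW_PATIENTS_CSV = "Id,BIRTHDATE\nnew-1,1990-02-02\nnew-2,1985-03-03\n"

# Hypothetical extract of the CDM after both batches were loaded:
# (person_id, person_source_value, observation start, observation end).
# ASSUMPTION: person_source_value holds the Synthea patient Id.
CDM_PERSONS = [
    (1, "old-1", "2010-01-01", "2020-01-01"),
    (2, "old-2", "2011-06-01", "2020-01-01"),
    (3, "new-1", "2012-03-01", "2021-01-01"),
    (4, "new-2", "2013-09-01", "2021-01-01"),
]

NEW_COHORT_ID = 3  # hypothetical; pick an ID not used by existing cohorts


def read_source_ids(csv_text):
    """Return the set of patient Ids found in a patients.csv payload."""
    return {row["Id"] for row in csv.DictReader(io.StringIO(csv_text))}


def build_cohort_rows(source_ids, persons, cohort_definition_id):
    """Build cohort-table rows for every person whose source Id is in source_ids."""
    rows = []
    for person_id, source_value, start, end in persons:
        if source_value in source_ids:
            rows.append({
                "cohort_definition_id": cohort_definition_id,
                "subject_id": person_id,
                "cohort_start_date": start,
                "cohort_end_date": end,
            })
    return rows


old_ids = read_source_ids(OLD_PATIENTS_CSV)
new_ids = read_source_ids(NEW_PATIENTS_CSV)

# Refuse to continue if the batches share an Id; the distinction would be lost.
overlap = old_ids & new_ids
if overlap:
    raise ValueError(f"Source Ids present in both batches: {sorted(overlap)}")

cohort_rows = build_cohort_rows(new_ids, CDM_PERSONS, NEW_COHORT_ID)
for row in cohort_rows:
    print(row)
```

The resulting rows use the standard cohort table columns (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date), so the new patients can be selected by cohort ID without touching the cohorts built on the training data.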

