createPs() on a local dataset

katewu · June 15, 2024, 7:18pm

Instead of using getDbCohortMethodData() to generate cohortMethodData, how do I load my own dataset and apply createPs() to it?

Thanks!

schuemie · June 20, 2024, 4:58am

That is actually very hard, because CohortMethodData is a very specific type of data object.

Could you help me understand your specific use case? What prevents you from using getDbCohortMethodData()?

katewu · June 26, 2024, 4:46pm

Thanks for replying, @schuemie. My exposure is not a single drug. The exposure types I am looking at consist of thousands of concept IDs, and I am trying to assign each person to exactly one exposure category.

This is a hypothetical example. Imagine I want to study the effect of three drug categories (antibiotics, antineoplastic drugs, and antidiabetic drugs) on renal function. If a person receives all three categories of drugs during one visit, I would categorize them under antineoplastic drugs, because I consider their effect to be more significant than that of antibiotics and antidiabetic drugs. Since my exposure groups are not straightforward, I wonder whether I can supply data in the same format that getDbCohortMethodData() produces, so that I can apply createPs().

It seems the columns from getDbCohortMethodData() are: cohort_definition_id (including IDs for the exposure groups and the outcome), subject_id, cohort_start_date (exposure date), and cohort_end_date.

Thank you!

schuemie · June 27, 2024, 4:46am

CohortMethodData is an S4 class derived from the Andromeda class. It has 4 Andromeda tables (outcomes, cohorts, covariates, covariateRef) and a set of metadata attributes. As I said: it is non-trivial to construct yourself.

I would highly recommend creating your exposure cohorts in either ATLAS or Capr. These tools will be able to implement the logic you’re looking for. Once you have your exposure cohorts in place, you can run CohortMethod with just a few R calls, as described in the vignette.

Alternatively, if you do not wish to learn how to use ATLAS or Capr, you could create a cohort table on your server yourself. A cohort table has 4 columns: cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date. (The subject_id is the person_id).
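As a rough illustration of that last option, the sketch below builds a data frame with the four cohort-table columns from exposure records, keeping only the highest-priority category per person and start date. Everything in it is hypothetical: the `exposures` data, the `priority` ranking (the thread only says antineoplastic drugs take precedence; the order of the other two is an assumption), and the `cohortIds` values. Uploading the result to the database server, for instance with DatabaseConnector's insertTable(), is not shown.

```r
# Hypothetical exposure records: one row per person, exposure period, and drug category.
exposures <- data.frame(
  person_id = c(1, 1, 1, 2, 3, 3),
  exposure_start_date = as.Date(c("2020-01-05", "2020-01-05", "2020-01-05",
                                  "2020-02-10", "2020-03-01", "2020-03-01")),
  exposure_end_date = as.Date(c("2020-01-12", "2020-02-04", "2020-01-20",
                                "2020-02-17", "2020-03-31", "2020-03-15")),
  category = c("antibiotic", "antineoplastic", "antidiabetic",
               "antibiotic", "antidiabetic", "antibiotic"),
  stringsAsFactors = FALSE
)

# Assumed priority ranking: a lower rank wins when categories coincide.
priority <- c(antineoplastic = 1, antidiabetic = 2, antibiotic = 3)

# Hypothetical cohort_definition_id value for each exposure category.
cohortIds <- c(antineoplastic = 101, antidiabetic = 102, antibiotic = 103)

# Sort so the highest-priority category comes first within person and start date,
# then keep only that first row.
exposures$rank <- unname(priority[exposures$category])
exposures <- exposures[order(exposures$person_id,
                             exposures$exposure_start_date,
                             exposures$rank), ]
keep <- !duplicated(exposures[, c("person_id", "exposure_start_date")])
selected <- exposures[keep, ]

# Assemble the four cohort-table columns.
cohort <- data.frame(
  cohort_definition_id = unname(cohortIds[selected$category]),
  subject_id = selected$person_id,
  cohort_start_date = selected$exposure_start_date,
  cohort_end_date = selected$exposure_end_date
)
print(cohort)
```

Here the exclusivity rule is applied per person and start date; to assign each person to a single category overall, the same idea would apply with `duplicated()` on `person_id` alone.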

katewu · June 27, 2024, 4:13pm

@schuemie Thanks for taking the time to reply.
Follow-up questions regarding cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date (where subject_id is the same as person_id):

Does cohort_definition_id cover all three IDs: one for each of the 2 exposures and one for the outcome?
For exposures, is cohort_start_date the same as exposure_start_date and cohort_end_date the same as exposure_end_date?
For outcomes, is cohort_start_date the same as outcome_start_date, and does cohort_end_date not matter for outcomes?

Thank you!

schuemie · July 1, 2024, 6:44am

Yes, I think all of those statements are correct.

When calling getDbCohortMethodData(), the targetId, comparatorId, and outcomeIds arguments will correspond to the cohort_definition_id field in the cohort table.

For exposures, we almost always indeed set cohort start and end to correspond to exposure start and end. Importantly, we typically combine subsequent prescriptions into a single exposure cohort entry (usually allowing for a gap between prescriptions). Note that, with the createStudyPopulation() function, you can set the time at risk based on the cohort start and end using the riskWindowStart, startAnchor, riskWindowEnd, and endAnchor arguments. By default, the time at risk start and end are identical to the cohort start and end.
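To make the role of the four time-at-risk arguments concrete, here is a simplified base R sketch. The helper `computeRiskWindow()` is hypothetical and is not part of CohortMethod; it only mimics how an offset in days is added to the chosen anchor date. The real createStudyPopulation() does more than this (for example, it also takes the end of observation into account), so treat this as an illustration of the arithmetic only.

```r
# Hypothetical helper, not part of CohortMethod: derive a time-at-risk window
# from cohort dates, day offsets, and anchors.
computeRiskWindow <- function(cohortStart, cohortEnd,
                              riskWindowStart = 0, startAnchor = "cohort start",
                              riskWindowEnd = 0, endAnchor = "cohort end") {
  # Map an anchor name to the corresponding cohort date.
  anchorDate <- function(anchor) {
    switch(anchor,
           "cohort start" = cohortStart,
           "cohort end" = cohortEnd,
           stop("Unknown anchor: ", anchor))
  }
  data.frame(
    riskStart = anchorDate(startAnchor) + riskWindowStart,
    riskEnd = anchorDate(endAnchor) + riskWindowEnd
  )
}

cohortStart <- as.Date("2020-01-05")
cohortEnd <- as.Date("2020-02-04")

# Defaults: the time at risk coincides with the cohort start and end.
onTreatment <- computeRiskWindow(cohortStart, cohortEnd)

# Fixed one-year window starting the day after cohort start.
fixedWindow <- computeRiskWindow(cohortStart, cohortEnd,
                                 riskWindowStart = 1, startAnchor = "cohort start",
                                 riskWindowEnd = 365, endAnchor = "cohort start")
print(onTreatment)
print(fixedWindow)
```

Anchoring both ends on the cohort start gives a window of fixed length regardless of how long the exposure lasted, whereas anchoring the end on the cohort end makes the window follow the exposure.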

katewu · August 22, 2024, 11:43am

@schuemie For plotKaplanMeier(), is there a function like summary(survfit(Surv())) that lets us check the risk at each time point? Thank you!

schuemie · August 22, 2024, 1:57pm

No, sorry! You’ll have to hack plotKaplanMeier() for that, e.g. copy the function's source code and add some code of your own to export the data object.
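A possible workaround, sketched here under assumptions rather than taken from CohortMethod itself, is to fit the curves directly with the survival package and read the numbers from summary(). The `population` data frame below is a made-up stand-in; the column names `treatment`, `survivalTime`, and `outcomeCount` are assumed to mirror those of a study population object and should be checked against your own data. This plain fit does not account for matching or stratification, so it may not match the plotted curves exactly.

```r
library(survival)

# Hypothetical stand-in for a study population; column names are assumptions.
population <- data.frame(
  treatment = c(1, 1, 1, 1, 0, 0, 0, 0),
  survivalTime = c(10, 25, 40, 60, 5, 15, 30, 60),
  outcomeCount = c(1, 0, 1, 0, 1, 1, 0, 0)
)

# Event indicator: 1 if the person had at least one outcome, 0 if censored.
population$y <- as.integer(population$outcomeCount != 0)

# Unadjusted Kaplan-Meier fit per treatment group.
fit <- survfit(Surv(survivalTime, y) ~ treatment, data = population)

# Number at risk, events, and survival probability at selected time points.
riskTable <- summary(fit, times = c(0, 20, 40, 60))
print(data.frame(
  strata = riskTable$strata,
  time = riskTable$time,
  nRisk = riskTable$n.risk,
  nEvent = riskTable$n.event,
  survival = riskTable$surv
))
```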

katewu · September 4, 2024, 7:06pm

@schuemie Another follow-up question: can plotKaplanMeier() generate a p-value? Thanks!