Phenotype Phebruary - Capr style!

Adam_Black · February 9, 2022, 3:54pm

I’ve started working on creating the Phenotype Phebruary cohorts using the Capr R package that was recently introduced into HADES by @mdlavallee92, and I will post my cohort definitions here. If anyone else is interested in trying out Capr, wants to try creating the R code for a Phenotype Phebruary cohort, or has feedback on the code, please join the Capr discussion.

Here are the first two Type 2 diabetes cohorts.

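library(DatabaseConnector)
library(Capr)
library(magrittr)  # for the %>% pipe, in case it is not already attached

# Connection to a local PostgreSQL database holding the CDM
connectionDetails <- createConnectionDetails("postgresql", user = "postgres", password = "", server = "localhost/covid")
connection <- connect(connectionDetails)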
dbGetQuery(connection, "select * from cdm5.person limit 8")

vocabularyDatabaseSchema <- "cdm5"



##################################################################################################################################
##  [PhenotypePhebruary][T2DM] Persons with new type 2 diabetes mellitus at first diagnosis -------------------------------------

## Concept sets ------

nm0 <- "Type 2 diabetes mellitus (diabetes mellitus excluding T1DM and secondary)"
cid0 <- c(443238, 201820, 442793, 40484648, 20125, 435216, 201254, 195771, 4058243, 761051)
conceptMapping0 <- list(
  list(includeDescendants = TRUE, isExcluded = FALSE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = FALSE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = FALSE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE),
  list(includeDescendants = TRUE, isExcluded = TRUE, includeMapped = FALSE))

conceptSet0 <- getConceptIdDetails(conceptIds = cid0,
                                   connection = connection,
                                   vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                                   mapToStandard = FALSE) %>%
    createConceptSetExpressionCustom(Name = nm0, conceptMapping = conceptMapping0)

## Initial Event Cohort -----

# People having any of the following:
#   a condition occurrence of Type 2 diabetes mellitus (diabetes mellitus excluding T1DM and secondary)
#  with continuous observation of at least 0 days prior and 0 days after event index date, and limit initial events to: earliest event per person.

queryPC1 <- createConditionOccurrence(conceptSetExpression = conceptSet0)

PrimaryCriteria <- createPrimaryCriteria(Name = "cohortPrimaryCriteria",
                                         ComponentList = list(queryPC1),
                                         ObservationWindow = createObservationWindow(PriorDays = 0L, PostDays = 0L),
                                         Limit = "First")


AdditionalCriteria <- createAdditionalCriteria(Name = "cohortAdditionalCriteria", Contents = NULL, Limit = "First")


# Inclusion Rules -----

# Inclusion Criteria #(1): has 365d prior observation
# Having all of the following criteria:
#   at least 1 occurrences of: an observation period
# where event starts between All days Before and 365 days Before index start date and event ends between 0 days After and All days After index start date
# Limit qualifying cohort to: earliest event per person.


timelineInclusionRule1_1 <- createTimeline(
  StartWindow = createWindow(EventStarts = TRUE, StartDays = "All", StartCoeff = "Before", EndDays = 365L, EndCoeff = "Before"),
  EndWindow = createWindow(EventStarts = FALSE, StartDays = 0L, StartCoeff = "After", EndDays = "All", EndCoeff = "After"))

countInclusionRule1_1 <- createCount(Query = createObservationPeriod(), Logic = "at_least", Count = 1L, Timeline = timelineInclusionRule1_1)

InclusionRule1 <- createGroup(Name = "has 365d prior observation",
                              type = "ALL",
                              criteriaList = list(countInclusionRule1_1))

InclusionRules <- createInclusionRules(Name = "cohortInclusionRules",
    Contents = list(InclusionRule1), Limit = "First")

# Cohort Collapse Strategy -----
#   Collapse cohort by era with a gap size of 0 days
## Create the cohort definition -----

# No end date strategy selected. By default, the cohort end date will be the end of the observation period that contains the index event.

cohortT2DM_1 <- createCohortDefinition(Name = "[PhenotypePhebruary][T2DM] Persons with new type 2 diabetes mellitus at first diagnosis",
                                       cdmVersionRange = ">=5.0.0",
                                       PrimaryCriteria = PrimaryCriteria,
                                       AdditionalCriteria = AdditionalCriteria,
                                       InclusionRules = InclusionRules,
                                       EndStrategy = NULL,
                                       CensoringCriteria = NULL,
                                       CohortEra = createCohortEra(0L))




##################################################################################################################################
##  [PhenotypePhebruary][T2DM] Persons with new type 2 diabetes and no prior T1DM or secondary diabetes --------------------------

# Initial Event Cohort
# People having any of the following:
#   a condition occurrence of Type 2 diabetes mellitus (diabetes mellitus excluding T1DM and secondary)
# with continuous observation of at least 0 days prior and 0 days after event index date, and limit initial events to: earliest event per person.
#
# Inclusion Rules
# Inclusion Criteria #(1): has 365d prior observation
# Having all of the following criteria:
#   at least 1 occurrences of: an observation period
# where event starts between All days Before and 365 days Before index start date and event ends between 0 days After and All days After index start date
# Inclusion Criteria #(2): no Type 1 diabetes mellitus diagnosis on or prior to T2DM
# Having all of the following criteria:
#   exactly 0 occurrences of: a condition occurrence of Type 1 diabetes mellitus
# where event starts between All days Before and 0 days After index start date
# Inclusion Criteria #(3): no secondary diabetes diagnosis on or prior to T2DM
# Having all of the following criteria:
#   exactly 0 occurrences of: a condition occurrence of Secondary diabetes mellitus
# where event starts between All days Before and 0 days After index start date
# Limit qualifying cohort to: earliest event per person.
# End Date Strategy
# No end date strategy selected. By default, the cohort end date will be the end of the observation period that contains the index event.
# Cohort Collapse Strategy:
#   Collapse cohort by era with a gap size of 0 days


# This definition is very similar to the previous one. We simply need to add two additional inclusion criteria.

# Inclusion Criteria #(2): no Type 1 diabetes mellitus diagnosis on or prior to T2DM
# Having all of the following criteria:
#   exactly 0 occurrences of: a condition occurrence of Type 1 diabetes mellitus
# where event starts between All days Before and 0 days After index start date

conceptSet1 <- getConceptIdDetails(conceptIds = c(435216L, 201254L, 40484648L),
                                   connection = connection,
                                   vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                                   mapToStandard = FALSE) %>%
  createConceptSetExpressionCustom(Name = "Type 1 diabetes mellitus")

queryInclusionRule2_1 <- createConditionOccurrence(conceptSetExpression = conceptSet1)

timelineInclusionRule2_1 <- createTimeline(StartWindow = createWindow(EventStarts = TRUE, StartDays = "All", StartCoeff = "Before", EndDays = 0L, EndCoeff = "After"),
                                           EndWindow = NULL,
                                           IgnoreObservationPeriod = TRUE)

countInclusionRule2_1 <- createCount(Query = queryInclusionRule2_1, Logic = "exactly", Count = 0L, Timeline = timelineInclusionRule2_1)

InclusionRule2 <- createGroup(Name = "no Type 1 diabetes mellitus diagnosis on or prior to T2DM",
                              type = "ALL",
                              criteriaList = list(countInclusionRule2_1))


# Inclusion Criteria #(3): no secondary diabetes diagnosis on or prior to T2DM
# Having all of the following criteria:
#   exactly 0 occurrences of: a condition occurrence of Secondary diabetes mellitus
# where event starts between All days Before and 0 days After index start date

conceptSet2 <- getConceptIdDetails(conceptIds = 195771L,
                                   connection = connection,
                                   vocabularyDatabaseSchema = vocabularyDatabaseSchema,
                                   mapToStandard = FALSE) %>%
  createConceptSetExpressionCustom(Name = "Secondary diabetes mellitus")


queryInclusionRule3_1 <- createConditionOccurrence(conceptSetExpression = conceptSet2, attributeList = NULL)

timelineInclusionRule3_1 <- createTimeline(StartWindow = createWindow(StartDays = "All", StartCoeff = "Before", EndDays = 0L, EndCoeff = "After", EventStarts = TRUE, IndexStart = TRUE),
                                           EndWindow = NULL,
                                           IgnoreObservationPeriod = TRUE)

countInclusionRule3_1 <- createCount(Query = queryInclusionRule3_1,
                                     Logic = "exactly",
                                     Count = 0L,
                                     isDistinct = FALSE,
                                     Timeline = timelineInclusionRule3_1)

InclusionRule3 <- createGroup(Name = "no secondary diabetes diagnosis on or prior to T2DM",
                              type = "ALL",
                              count = NULL,
                              criteriaList = list(countInclusionRule3_1),
                              demographicCriteriaList = NULL,
                              Groups = NULL)

InclusionRules2 <- createInclusionRules(Name = "cohortInclusionRules",
                                        Contents = list(InclusionRule1, InclusionRule2, InclusionRule3),
                                        Limit = "First")

cohortT2DM_2 <- createCohortDefinition(Name = "[PhenotypePhebruary][T2DM] Persons with new type 2 diabetes and no prior T1DM or secondary diabetes",
                                       cdmVersionRange = ">=5.0.0",
                                       PrimaryCriteria = PrimaryCriteria,
                                       AdditionalCriteria = AdditionalCriteria,
                                       InclusionRules = InclusionRules2,
                                       EndStrategy = NULL,
                                       CensoringCriteria = NULL,
                                       CohortEra = createCohortEra(0L))
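
The ten-element conceptMapping0 list above repeats the same three fields for every concept. Here is a minimal base-R sketch of generating it instead, assuming (as in cid0) that the first three concepts are included and the remaining seven are excluded. The name isExcludedFlags is introduced here for illustration only.

```r
# Exclusion flag for each of the 10 concept ids in cid0, in order:
# the first 3 are included, the remaining 7 are excluded.
isExcludedFlags <- c(rep(FALSE, 3), rep(TRUE, 7))

# One mapping entry per concept; only isExcluded varies between entries.
conceptMapping0 <- lapply(isExcludedFlags, function(excluded) {
  list(includeDescendants = TRUE, isExcluded = excluded, includeMapped = FALSE)
})

# Quick sanity checks: number of entries and number of excluded concepts.
length(conceptMapping0)
sum(vapply(conceptMapping0, function(m) m$isExcluded, logical(1)))
```

The result has the same shape as the hand-written list, so it could be passed to createConceptSetExpressionCustom in the same way.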

aostropolets · February 9, 2022, 5:46pm

@Adam_Black sounds interesting! Would you mind describing in a couple of words the benefits of using Capr over creating a cohort in Atlas and running it as a SQL statement?

Adam_Black · February 9, 2022, 6:11pm

Yes!

I’m going to steal from Hadley Wickham’s provocatively titled talk “You can’t do data science in a GUI”.

Why prefer code?

Code is a language and languages provide expressive ability
Code is text which enables copy/paste, searchability, diffability (easily identify differences), reproducibility, and readability

These benefits are already captured in large part by the JSON format used in OHDSI, which I would describe as a language of cohort definition. Capr provides a native R interface to that language that allows for the reuse of component parts of a cohort, as demonstrated above. It also has the potential to be more readable and expressive than the JSON format. Some of the benefits of the code-based interface were described in Clark Evans’ Query Combinators poster a few years ago.
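
As a rough illustration of the reuse and diffability points, inclusion rules 2 and 3 above differ only in their name and concept set, so one helper can produce both. The sketch below uses plain R lists as a stand-in for Capr objects. The function noPriorDiagnosisRule and its fields are hypothetical and are not part of the Capr API.

```r
# Hypothetical stand-in for a Capr inclusion rule: a plain list describing
# "exactly 0 occurrences between all days before and 0 days after index".
noPriorDiagnosisRule <- function(name, conceptIds) {
  list(
    name = name,
    logic = "exactly",
    count = 0L,
    window = list(startDays = "All", startCoeff = "Before",
                  endDays = 0L, endCoeff = "After"),
    conceptIds = conceptIds
  )
}

# The same helper builds both rules; only the name and concept ids change.
rule2 <- noPriorDiagnosisRule(
  "no Type 1 diabetes mellitus diagnosis on or prior to T2DM",
  c(435216L, 201254L, 40484648L))
rule3 <- noPriorDiagnosisRule(
  "no secondary diabetes diagnosis on or prior to T2DM",
  195771L)

# Because the rules are ordinary data, differences are easy to find:
# list the fields whose values differ between the two rules.
differs <- !mapply(identical, rule2, rule3)
names(differs)[differs]
```

Only the name and conceptIds fields differ here, which is the kind of comparison that is hard to make by eye in a GUI.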

This is version 1, so if you give it a try and have suggestions on usability, readability, etc., please share them.

Adam_Black · February 10, 2022, 3:41am

Martin set up a repository for the Phenotype Phebruary Capr code. Here is day 1: https://github.com/mdlavallee92/CaprPhenotypePheburary/blob/main/results/day01/Type%202%20diabetes%20mellitus.R

Patrick_Ryan · February 10, 2022, 3:53am

Awesome, thank you @Adam_Black and @mdlavallee92, I was hoping to showcase Capr and this is a wonderful way of demonstrating the power of a programmatic approach to phenotypes that are otherwise implemented in ATLAS. I see a valuable and complementary role for these tools, and am eager to see both used more broadly in our community.

mdlavallee92 · February 10, 2022, 9:38pm

Good to hear @Patrick_Ryan. Excited to put Capr to the test using Phenotype Phebruary as a test case. Anyone interested in contributing, please check out the PhenotypePhebruaryCapr repository for details on how to contribute. Also check out the Capr repository for installation instructions and vignettes on how to get started. Any feedback on the software would be great! While I am more than happy to keep this repository under my GitHub user, would it be possible to transfer it to the OHDSI GitHub?

Update: Replaced the link with the PhenotypePhebruaryCapr repo, now on ohdsi-studies!

Patrick_Ryan · February 10, 2022, 8:35pm

Tremendous stuff! @mdlavallee92, I made you a repo out on ohdsi-studies, and made you Admin. Feel free to give whoever else you’d like write access: https://github.com/ohdsi-studies/PhenotypePhebruaryCapr

Thank you for this important contribution!

Gowtham_Rao · February 11, 2022, 2:06am

This is awesome! I have experimented with Capr and thank you @mdlavallee92 for helping me get started.

Capr has another big advantage: besides ATLAS, it is the only other way to generate the cohort specification JSON in OHDSI. Once we have that JSON, it can be converted to human-readable text or to SQL.

Another big advantage of Capr is that it lets you build and manipulate concept set expressions programmatically.
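
One small example of what handling concept sets in code makes possible: with the concept ids from the definitions earlier in this thread held as plain vectors, base R can confirm that every Type 1 and secondary diabetes concept used in the inclusion rules is also excluded from the T2DM concept set. This is only a sketch on the id vectors, not a Capr call, and the variable names are introduced here for illustration.

```r
# Concept ids and exclusion flags copied from cid0 / conceptMapping0 above:
# the first 3 concepts are included, the remaining 7 are excluded.
t2dmIds <- c(443238, 201820, 442793, 40484648, 20125,
             435216, 201254, 195771, 4058243, 761051)
t2dmExcluded <- c(rep(FALSE, 3), rep(TRUE, 7))

# Concept ids used for the T1DM and secondary diabetes inclusion rules.
t1dmIds <- c(435216, 201254, 40484648)
secondaryIds <- 195771

excludedIds <- t2dmIds[t2dmExcluded]

# TRUE when every T1DM / secondary concept is excluded from the T2DM set.
allCovered <- all(c(t1dmIds, secondaryIds) %in% excludedIds)
allCovered

# Excluded concepts that are not part of either inclusion-rule concept set.
otherExcluded <- setdiff(excludedIds, c(t1dmIds, secondaryIds))
otherExcluded
```

Checks like this are cheap to rerun whenever a concept set changes, which is harder to do when the sets only live in a GUI.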