Phenotype Phebruary 2023 P8 Parkinson's disease

Here is a summary of the “Phea approach” to the tiered consensus algorithm for Parkinson’s Disease (adapted from Szumski and Cheng 2009; see @allanwu’s message above).

What is the tiered PD phenotype, and what’s the matter with it?

The tiered consensus algorithm phenotype requires PD to be coded more frequently than competing diagnoses, within the past 3 years. The competing diagnoses are non-PD parkinsonism and secondary parkinsonism.

The problem is that Atlas can’t express such a “most frequent diagnosis among 2 groups in the past 3 years” calculation. In inclusion 4 above, @allanwu tried to approximate that logic in Atlas’s Cohort Definitions tool by scanning the prior 3 years one year at a time.

Goal

Create a phenotype that correctly applies the tiered (“most frequent among 2 groups”) logic, by injecting Phea-generated SQL into an OHDSI-compatible cohort definition (cohortDefinitionSet).

Whether a visit was a neurology visit or not was not considered. We ignored this aspect of the logic because we thought mapping “neurology visit” across sites in a network study would be too difficult and unreliable.

Logic to be computed

Tiered diagnosis criteria: Parkinson’s Disease is more frequent than the competing diagnoses (non-PD parkinsonism and secondary parkinsonism).

Query logic:

  • At every visit occurrence, look back 3 years and count the occurrences of the two groups of conditions, PD and non-PD.
  • Eliminate the visits where the non-PD count is greater than or equal to the PD count (the criterion requires PD to be strictly more frequent).
  • A patient meets the criteria if they have at least one of those “special” visits remaining.
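The bullet logic above can be sketched as plain SQL. This is a hand-written illustration of the idea, not the actual Phea output; the concept IDs are placeholders, and the schema name `cdm` and the 3-year look-back window are assumptions. PostgreSQL flavor.

```sql
-- For each visit, count PD vs. competing diagnoses in the prior 3 years,
-- keeping only the visits where PD is strictly more frequent.
WITH dx AS (
    SELECT person_id,
           condition_start_date,
           CASE WHEN condition_concept_id IN (381270)  -- PD (placeholder concept ID)
                THEN 1 ELSE 0 END AS is_pd
    FROM cdm.condition_occurrence
    WHERE condition_concept_id IN (381270, 1234567, 7654321)  -- PD + competing (placeholders)
)
SELECT v.person_id, v.visit_start_date
FROM cdm.visit_occurrence v
JOIN dx
  ON dx.person_id = v.person_id
 AND dx.condition_start_date BETWEEN v.visit_start_date - INTERVAL '3 years'
                                 AND v.visit_start_date
GROUP BY v.person_id, v.visit_start_date
HAVING SUM(dx.is_pd) > SUM(1 - dx.is_pd);  -- PD more frequent than non-PD
```

Each row of the result is one of the “special” visits; a patient qualifies if they have at least one.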

Alternative logic (not used): use the most recent diagnosis instead of the most frequent.

The hard part isn’t producing a SQL query that captures the above logic (Phea writes that query for you). The hard part is correctly plugging that “special SQL” into the OHDSI ecosystem, i.e. into an OHDSI study package.

Challenges

  1. Phea SQL needs to be compatible with the local SQL flavor. (not yet addressed; PostgreSQL was used)
    Possible solution A: have the study package generate Phea SQL locally.

  2. Phea SQL needs to be compatible with CohortGenerator / OHDSI-SQL.
    Possible solution A: replace a placeholder criterion with Phea SQL. (this was the approach taken)
    Possible solution B: have Phea insert rows into @target_database_schema.@target_cohort_table directly, without going through CohortGenerator.

  3. I can’t test the code post-Phea, because neither Synthea nor Eunomia contains any Parkinson’s disease diagnosis codes.
    Potential solution: For testing purposes, surrogate conditions could be used. (this was not done)
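For reference, possible solution B in challenge 2 could look something like the following rough sketch. The cohort table columns follow the standard OHDSI convention; `phea_special_visits` is a hypothetical intermediate table holding the result of the Phea query (this approach was not implemented).

```sql
-- Sketch of solution B (not implemented): insert cohort rows directly,
-- bypassing CohortGenerator. OHDSI-SQL parameters left as-is.
INSERT INTO @target_database_schema.@target_cohort_table
    (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date)
SELECT @target_cohort_id,
       person_id,
       MIN(visit_start_date),  -- earliest qualifying "special" visit
       MAX(visit_start_date)   -- placeholder choice of cohort end
FROM phea_special_visits       -- hypothetical result of the Phea query
GROUP BY person_id;
```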

How the “SQL replacement approach” works

  1. Manually copy the cohort definition created by the group (https://data.ohdsi.org/PhenotypePhebruary2023_P8_ParkinsonsDisease/) into a new one (ATLAS).

  2. Manually add an extra criterion in that copy: has visit where PD Dx is the most frequent.

(the subsequent steps are done by R code; see the attached TXT file)

  1. Download the cohort definition using ROhdsiWebApi::exportCohortDefinitionSet().

  2. Read the SQL file that was downloaded. Replace the extra criterion with the Phea SQL.
    a. Before this substitution, adapt Phea’s code to be compatible with OHDSI-SQL:
    i. Replace schema references with OHDSI aliases (e.g. @cdm_database_schema).
    ii. Retrieve code sets from the temporary table #Codesets.

  3. Generate a new study package using the modified SQL file. (did I do this correctly? I am not sure)
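Steps 1–2 above can be sketched in R as follows. The WebAPI URL, the file name, and the placeholder text are illustrative, not the actual ones used in the study package.

```r
# Sketch of the "SQL replacement" pipeline (steps 1-2).
library(ROhdsiWebApi)

# Step 1: download the (manually copied and edited) cohort definition.
cohortDefinitionSet <- exportCohortDefinitionSet(
  baseUrl   = "https://atlas.example.org/WebAPI",  # assumed WebAPI URL
  cohortIds = 1781786
)

# Step 2: read the generated SQL and swap the hand-made extra criterion
# for the Phea SQL (already adapted to OHDSI-SQL: schema aliases resolved
# to @cdm_database_schema, code sets read from #Codesets).
sql      <- cohortDefinitionSet$sql[[1]]
phea_sql <- paste(readLines("phea_criterion.sql"), collapse = "\n")

sql <- sub("/* PHEA PLACEHOLDER */", phea_sql, sql, fixed = TRUE)
cohortDefinitionSet$sql[[1]] <- sql
```

The modified `cohortDefinitionSet` is then what goes into the new study package.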

The modified study package is in MS Teams (file names listed below).

Comments and takeaways

  1. Local generation of Phea SQL (at the data holder’s computer) is complicated by:
    a. Phea needs to read the column names to build the SQL query, but unlike SqlRender it doesn’t know the details of the local CDM. Therefore, OHDSI-SQL aliases need to be resolved into real schema and table names and provided to Phea.

  2. The cohort definition probably can’t be edited after the SQL replacement: Phea’s SQL code will likely be lost when the cohort generation SQL is re-generated.

  3. Alternative approach: Use Phea to directly insert rows into @target_database_schema.@target_cohort_table, with or without going through CohortGenerator.

  4. Long term best solution: Develop Phea all the way into a HADES-integrated R package.
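For takeaway 1a, the alias resolution could be as simple as rendering the alias with SqlRender before handing the table name to Phea. A minimal sketch, assuming the data holder’s schema is called `cdm_synthea`:

```r
# Sketch: resolve an OHDSI-SQL alias into the real local table name,
# so that Phea can read its column names.
library(SqlRender)

condition_table <- render(
  "@cdm_database_schema.condition_occurrence",
  cdm_database_schema = "cdm_synthea"  # the data holder's actual schema
)
# condition_table is now "cdm_synthea.condition_occurrence"
```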

Files in MS Teams’ Phenotyping Development Workgroup team

  1. Edited cohort definition: cohort 1781786 Phea.zip
  2. Edited study package: PhenotypeParkinsonsDisease-phea.zip

R code that generates the Phea SQL and performs the “SQL replacement” pipeline

parkinsons.txt (8.2 KB)

Final words

If anyone trusts me enough to download PhenotypeParkinsonsDisease-phea.zip from MS Teams and run the study on their local CDM instance, that would be nice! Take a look at file 310.sql if you want to see the edits that were applied (search for the keyword “phea” in the file). But I admit the chance that it will work is very small: while I could test the query logic per se on my local PostgreSQL server, I don’t have a fully-featured local OHDSI environment for running the study package. Moreover, the code was generated for PostgreSQL – it may happen to work in other SQL flavors as well, or it may not.
