New R package: Phea: Phenotyping with formulas and without SQL joins

fabkury · November 25, 2022, 5:49pm

Dear community, I am happy to share the R package I created for electronic phenotyping, called Phea. The name loosely stands for “phenotyping algebra.” It provides a framework for phenotyping that:

Is based on formulas, for example bmi = weight / (height * height), or platelet_drop = platelet_current - platelet_previous.
Does not use SQL joins to combine the data elements needed to calculate the formula.
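To make the "no joins" idea concrete, here is a minimal sketch in plain dplyr/tidyr on in-memory data. It is not Phea's code or API, and it is only one possible join-free approach: the records are stacked (the equivalent of a UNION ALL) and each patient's latest value is carried forward in time. The tables, column names, and values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical records: each data element comes from its own source.
weights <- tibble(pid = c(1, 1),
                  ts = as.Date(c("2022-01-10", "2022-06-01")),
                  weight = c(80, 72))          # kilograms
heights <- tibble(pid = 1,
                  ts = as.Date("2022-01-05"),
                  height = 1.80)               # meters

# Stack the sources instead of joining them; missing columns become NA.
board <- bind_rows(weights, heights) |>
  arrange(pid, ts) |>
  group_by(pid) |>
  # Carry each patient's most recent weight and height forward in time.
  fill(weight, height, .direction = "down") |>
  ungroup() |>
  # The formula is evaluated on every row, i.e. at every point in time.
  mutate(bmi = weight / (height * height))

print(board)
```

The first row has no BMI, because no weight had been recorded yet at that date. Every later row uses the most recent weight and height available at that time.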

Phea is on GitHub at https://github.com/fabkury/phea/. To install and load it:
devtools::install_github('fabkury/phea')
library(phea)

Under the hood, Phea is just a SQL query builder. All computation is done inside the SQL server, or you can ask for just the final SQL code of the query. For those in the know, Phea leverages the dbplyr/dplyr lazy table infrastructure in the R language, and inherits the principles of tidy data that underpin it.
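For readers who have not used lazy tables, the sketch below shows the general dbplyr mechanism using dbplyr alone, with no Phea functions. The table name and columns are hypothetical, and `simulate_postgres()` stands in for a real connection, so nothing is executed anywhere; only the SQL text is generated.

```r
library(dplyr)
library(dbplyr)

# A lazy table with no real database behind it (hypothetical columns).
measurement <- lazy_frame(
  person_id = 1L, weight = 80, height = 1.8,
  con = simulate_postgres(), .name = "measurement"
)

# This only describes the computation; no query has been run.
bmi_query <- measurement |>
  mutate(bmi = weight / (height * height))

# Ask for the SQL code instead of the results.
sql_text <- as.character(sql_render(bmi_query))
cat(sql_text, "\n")
```

With a real DBI connection, the same pipeline would run on the server only when the results are requested, for example with `collect()`.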

Phea shines for efficiently computing clinical scores such as a disseminated intravascular coagulation score, or the ASCVD Risk Estimator+ (to name just two), for any number of patients at all points in time. The results for one patient look like this:

Other use cases are also possible. In fact, formulas may contain any SQL expression that is valid inside a SELECT statement, such as CASE WHEN ... constructs or any function call supported by the server. Phea was built with OMOP CDM in mind, but is not restricted to it.
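As an illustration of a raw SQL expression inside a computed column, here is a small dbplyr-only sketch. It does not use Phea's functions. The `labs` table, the `platelet_count` column, and the cut-offs are assumptions chosen for the example, not a validated scoring rule.

```r
library(dplyr)
library(dbplyr)

# Hypothetical lab table; simulate_postgres() means nothing is executed.
labs <- lazy_frame(
  person_id = 1L, platelet_count = 90,
  con = simulate_postgres(), .name = "labs"
)

# sql() passes the expression through to the server untouched.
scored <- labs |>
  mutate(platelet_points = sql(
    "CASE WHEN platelet_count < 50 THEN 2
          WHEN platelet_count < 100 THEN 1
          ELSE 0 END"
  ))

sql_text <- as.character(sql_render(scored))
cat(sql_text, "\n")
```

Because the expression is passed through verbatim, anything the target server accepts in a SELECT list can appear there.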

For a very brief look at how Phea can be used, please see the vignette computing body mass index.
For an explanation of the intuition behind Phea, and a look at its features, see getting started with Phea.

While Phea is a standalone tool, if you need to use its results inside Atlas, one approach is to use it to produce novel records, then ETL those back into the dataset using custom concept IDs.

Besides that, there are a few other perspectives on Phea that I look forward to presenting to its potential users. I will be happy to post here in the forum, later on, about:

Using the result of prior formulas inside other formulas.
Stress-testing Phea with formulas containing hundreds of components, each coming from a different SQL query; as well as hundreds of consecutive layers of formula->result->another formula.
Computing formulas with multiple different records from the same SQL query, e.g. a patient’s second instance of a diagnosis, or the hemoglobin A1c level from 60 days ago alongside its value today.
Computing formulas with Boolean components, as opposed to numeric ones.
Phenotyping entities other than patients, such as providers or care sites.
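On the topic of using several records from the same query, the sketch below shows one generic way to put a patient's previous value next to the current one, using a SQL window function generated by dbplyr. It is not taken from Phea. The `platelets` table and its columns are hypothetical, and the formula mirrors the platelet_drop example from the first post.

```r
library(dplyr)
library(dbplyr)

# Hypothetical table of platelet counts; no real database is involved.
platelets <- lazy_frame(
  person_id = 1L,
  measurement_date = as.Date("2022-01-01"),
  platelet_count = 90,
  con = simulate_postgres(), .name = "platelets"
)

drops <- platelets |>
  group_by(person_id) |>               # one window per patient
  window_order(measurement_date) |>    # ordered in time
  # Current value minus the previous record's value, with no self-join.
  mutate(platelet_drop = platelet_count - lag(platelet_count)) |>
  ungroup()

sql_text <- as.character(sql_render(drops))
cat(sql_text, "\n")
```

The generated query uses LAG(...) OVER (PARTITION BY ... ORDER BY ...), so the previous record is reached without joining the table to itself.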

I am more than happy to receive any comments, here or via email.

Kindest regards, and happy Black Friday for those here in the US.

jpegilbert · November 29, 2022, 4:10pm

This looks like an interesting package. You may be interested to know that DatabaseConnector 6.0.0 intends to support DBI connections, so it could provide better integration with other OHDSI packages if you look into this. Martijn has created a testing branch for this.

fabkury · November 29, 2022, 4:39pm

Thanks, Jamie. If you use DatabaseConnector, it’s possible to use DatabaseConnector::dbConnect() and provide that connection to Phea.

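# `cred` is assumed to be a list holding your own database credentials.
dbcon <- DatabaseConnector::dbConnect(RPostgres::Postgres(),
  host = 'localhost', port = 7654, dbname = 'fort',
  user = cred$pg$user, password = cred$pg$pass)

# Hand the connection to Phea, along with the name of the schema to use.
setup_phea(dbcon, 'cdm_new_york3')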

fabkury · December 2, 2022, 6:38pm

This post is part of a series showcasing my new R package Phea for electronic phenotyping.

How complex can a formula be in Phea? And how does query time grow as you increase the number of components (variables in the formula)?

Check out the vignette Stress-testing Phea, test A, to see the framework being strained with 1 to 150 components, each coming from a different SQL query:

Phea exhibits sub-exponential growth in query time, as you increase the number of components or number of input rows.

This is in contrast to traditional (JOIN-based) phenotyping approaches. If you need to do one SQL JOIN to bring in each component of your phenotype, you could be looking at exponential growth.

Phea is just a SQL query builder. You can build a query and take it elsewhere if you want.

Kind regards!

fabkury · January 24, 2023, 7:30pm

This Friday (Jan. 27th) at the Phenotype Development Workgroup meeting (9 am) I will present Phea and show how to use it. Come check it out!

Link to the Teams meeting on Jan. 27th at 9 am:

fabkury · January 27, 2023, 3:50pm

Here is a recording of my presentation to the Phenotype Development workgroup on Jan. 27th: https://youtu.be/10GFtQREC0A

And here are the slides and a complete transcript of everything I said during the presentation: Phea OHDSI Jan-27 - slides and notes.pdf (3.4 MB)

Some slides may look a bit weird, because you can’t see the PowerPoint animations.

I am always available to talk about Phea with anyone. Just message me here, or use my email fab at kury.dev.