Confidence intervals for Precision-Recall curve

Julwa · August 26, 2021, 11:58am

Hi,
We are developing a set of Patient-Level-Prediction models.
During this process it was suggested to describe the CI for the AUCPR.
I have not been able to identify a good way to do this and was wondering if someone could help?

I checked out the source code for both plotPrecisionRecall() and evaluatePLP() (pr.curve), but I was not able to find a good way to add the computation of the CI.

Thank you

Best Julie

@cssdenmark

egillax · August 27, 2021, 1:06pm

It seems to me the AUPRC is calculated on line 77 of evaluatePlp:

pr <- PRROC::pr.curve(scores.class0 = positive, scores.class1 = negative)

I don’t think the PRROC package supports computing CIs. Did you have a method in mind for that? If not, this chapter might help:

https://link.springer.com/chapter/10.1007/978-3-642-40994-3_29

They suggest using either binomial or logit intervals for the CIs and show with simulations that these have sufficient coverage. Bootstrapping also works, but it approaches sufficient coverage from below as the sample size increases, and it is more computationally expensive.
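As I read the chapter, the two intervals look like this. Treat it as a sketch and check it against the chapter before relying on it. The symbols are my notation: $\hat\theta$ is the AUPRC estimate, $n$ is the number of positive samples, and $z_{1-\alpha/2}$ is the standard normal quantile.

```latex
% Binomial interval
\hat\theta \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat\theta\,(1-\hat\theta)}{n}}

% Logit interval: build the interval on the logit scale, then transform back
\hat\eta = \log\frac{\hat\theta}{1-\hat\theta},
\qquad
\hat\tau = \bigl(n\,\hat\theta\,(1-\hat\theta)\bigr)^{-1/2}

\left[
  \frac{e^{\hat\eta - z_{1-\alpha/2}\hat\tau}}{1 + e^{\hat\eta - z_{1-\alpha/2}\hat\tau}},
  \;
  \frac{e^{\hat\eta + z_{1-\alpha/2}\hat\tau}}{1 + e^{\hat\eta + z_{1-\alpha/2}\hat\tau}}
\right]
```

The logit interval always stays inside (0, 1), while the binomial one can spill outside that range when the estimate is close to 0 or 1.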

They have R code with those methods on github:

https://github.com/kboyd/raucpr/blob/master/precision_recall.r

### Copyright (c) 2013 Kendrick Boyd. This is free software. See
### LICENSE for details.

### Precision-Recall Analysis Stuff for R
### 


### Precision-Recall curves
###
### positive (cases) scores (outputs, probabilities, etc.) are random
### variable Y negative (controls) scores are random variable X
###
### pi (prevalence) =  # positives / (# positives + # negatives)


######################################################################
### Binormal
### Assume scores are normally distributed.

### parameters:


For example, the function on line 257 uses the logit method. You only need three inputs: the AUPRC estimate, which is already calculated in evaluatePLP (auprc); the number of positive samples; and your alpha. The number of positive samples is probably already available as well, as length(positive) or something similar.
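A minimal base-R sketch of both intervals, in case it is useful. The function name auprcCI() and its arguments are made up for illustration; it is not part of PatientLevelPrediction, PRROC, or the raucpr code. Clamping the binomial interval to [0, 1] is my own addition.

```r
# Hypothetical helper: CI for an AUPRC estimate from the estimate itself,
# the number of positive samples, and alpha.
auprcCI <- function(auprc, nPositive, alpha = 0.05,
                    method = c("logit", "binomial")) {
  method <- match.arg(method)
  stopifnot(auprc > 0, auprc < 1, nPositive > 0, alpha > 0, alpha < 1)

  z <- qnorm(1 - alpha / 2)  # standard normal quantile

  if (method == "binomial") {
    # Normal approximation on the original scale
    se <- sqrt(auprc * (1 - auprc) / nPositive)
    ci <- c(auprc - z * se, auprc + z * se)
    ci <- pmin(pmax(ci, 0), 1)  # assumption: clamp to the valid range
  } else {
    # Interval on the logit scale, transformed back with the inverse logit
    eta <- qlogis(auprc)
    tau <- 1 / sqrt(nPositive * auprc * (1 - auprc))
    ci <- plogis(c(eta - z * tau, eta + z * tau))
  }

  c(lower = ci[1], upper = ci[2])
}

# Example with made-up numbers: AUPRC of 0.35 from 200 positive samples
auprcCI(auprc = 0.35, nPositive = 200)
auprcCI(auprc = 0.35, nPositive = 200, method = "binomial")
```

Inside evaluatePLP you would presumably pass the existing auprc value and something like length(positive), but I have not checked the exact variable names there.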

Hope this helps,
Egill