Question

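I have a dataset of 100 samples, each with 195 mutations. Every mutation has a known clinical significance ("RealClass") and a predicted value from some prediction tool ("PredictionValues").

For demonstration, here is a random dataset with the same structure as mine: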

predictions_100_samples<-as.data.frame(matrix(nrow=19500,ncol=3))
colnames(predictions_100_samples)<-c("Sample","PredictionValues","RealClass")
predictions_100_samples$Sample<-rep(c(1:100), each = 195)
predictions_100_samples$PredictionValues<-sample(seq(0,1,length.out=19500))
predictions_100_samples$RealClass<-rep(c("pathogenic","benign"),each=10)
colours_for_ROC_curves<-rainbow(n=100)

I plot all 100 samples as ROC curves with the pROC package:

library("pROC")
# Draw the ROC curve of sample 1, which also sets up the plot
roc_both <- plot(roc(predictor = predictions_100_samples[1:195, 2], response = predictions_100_samples[1:195, 3]), col = colours_for_ROC_curves[1], main = "100 samples ROC curves", legacy.axes = TRUE, lwd = 1)
# Add the ROC curves of samples 2 to 100 to the same plot
for (i in 2:100) {
    rows <- (((i - 1) * 195) + 1):(i * 195)  # the 195 rows that belong to sample i
    roc_both <- plot(roc(predictor = predictions_100_samples[rows, 2], response = predictions_100_samples[rows, 3]), col = colours_for_ROC_curves[i], add = TRUE, lwd = 1)
}

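This is what the final plot looks like:

Now I want to add the mean ROC curve of all 100 plotted ROC curves to the same plot. I tried using the sensitivities and specificities that the "roc" function computes at each threshold inside my loop (available as roc_both$sensitivities, roc_both$specificities and roc_both$thresholds).

The main problem is that the thresholds chosen along the 100 ROC curves are arbitrary and differ from curve to curve, so I cannot compute the mean ROC curve by hand.

Is there another package that can produce the mean ROC curve of multiple ROC curves? Or a package that lets me set the thresholds at which sensitivity and specificity are computed, so that I can compute the mean ROC curve afterwards? Or do you have another solution to my problem?

Thanks!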

Answer 1

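You can use cutpointr to specify the thresholds manually via the oc_manual function. I changed the data generation a bit so that the ROC curves look a little nicer.

We apply the same sequence of thresholds to all samples and take the mean of the sensitivity and specificity at each threshold to obtain the "mean ROC curve".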

predictions_100_samples <- data.frame(
    Sample = rep(c(1:100), times = 195),
    PredictionValues = c(rnorm(n = 9750), rnorm(n = 9750, mean = 1)),
    RealClass = c(rep("benign", times = 9750), rep("pathogenic", times = 9750))
)

library(cutpointr)
library(tidyverse)
mean_roc <- function(data, cutoffs = seq(from = -5, to = 5, by = 0.5)) {
    map_df(cutoffs, function(cp) {
        out <- cutpointr(data = data, x = PredictionValues, class = RealClass,
                         subgroup = Sample, method = oc_manual, cutpoint = cp,
                         pos_class = "pathogenic", direction = ">=")
        data.frame(cutoff = cp, 
                   sensitivity = mean(out$sensitivity),
                   specificity = mean(out$specificity))
    })
}

mr <- mean_roc(predictions_100_samples)
ggplot(mr, aes(x = 1 - specificity, y = sensitivity)) + 
    geom_step() + geom_point() +
    theme(aspect.ratio = 1)
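
To make the averaging step itself explicit, here is a minimal base R sketch of the same idea without cutpointr. The helper names `sens_spec_at` and `mean_roc_base` and the small `toy` dataset are made up for illustration, and the sketch assumes that values at or above a cutoff are predicted as "pathogenic" and that every sample contains both classes.

```r
# Toy data (an assumption for illustration): 10 samples, 20 benign and
# 20 pathogenic values each, with the same column names as above.
set.seed(1)
toy <- data.frame(
    Sample = rep(1:10, times = 40),
    PredictionValues = c(rnorm(200), rnorm(200, mean = 1)),
    RealClass = rep(c("benign", "pathogenic"), each = 200)
)

# Hypothetical helper: sensitivity and specificity of one sample at fixed
# cutoffs, predicting "pathogenic" when the value is >= the cutoff.
sens_spec_at <- function(values, classes, cutoffs, pos = "pathogenic") {
    is_pos <- classes == pos
    data.frame(
        cutoff = cutoffs,
        sensitivity = sapply(cutoffs, function(cp) mean(values[is_pos] >= cp)),
        specificity = sapply(cutoffs, function(cp) mean(values[!is_pos] < cp))
    )
}

# Hypothetical helper: evaluate every sample at the same cutoffs, then
# average sensitivity and specificity per cutoff across samples.
mean_roc_base <- function(data, cutoffs) {
    per_sample <- lapply(split(data, data$Sample), function(d) {
        sens_spec_at(d$PredictionValues, d$RealClass, cutoffs)
    })
    all_samples <- do.call(rbind, per_sample)
    aggregate(cbind(sensitivity, specificity) ~ cutoff,
              data = all_samples, FUN = mean)
}

cutoffs <- seq(from = -5, to = 5, by = 0.5)
mr_base <- mean_roc_base(toy, cutoffs)
head(mr_base)
```

Because every sample is evaluated at the same cutoffs, the per-cutoff means are well defined, which is exactly what the differing thresholds in the question prevented.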

You can plot the individual ROC curves together with the added mean ROC curve using cutpointr as follows:

cutpointr(data = predictions_100_samples, 
          x = PredictionValues, class = RealClass, subgroup = Sample,
          pos_class = "pathogenic", direction = ">=") %>% 
    plot_roc(display_cutpoint = FALSE) + theme(legend.position="none") +
    geom_line(data = mr, mapping = aes(x = 1 - specificity, y = sensitivity), 
              color = "black")
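
If you would rather stay with pROC, as in the question, a different technique is to average vertically: read each curve's sensitivity at a common grid of specificities and take the mean per grid point. This is not the threshold averaging shown above. The sketch below assumes a pROC version in which `coords()` accepts `transpose = FALSE` and returns a data frame (1.15 or later, as far as I know); `toy`, `spec_grid` and `mean_sens` are made-up names for illustration.

```r
library(pROC)

# Toy data (an assumption for illustration): 10 samples, 20 benign and
# 20 pathogenic values each.
set.seed(1)
toy <- data.frame(
    Sample = rep(1:10, times = 40),
    PredictionValues = c(rnorm(200), rnorm(200, mean = 1)),
    RealClass = rep(c("benign", "pathogenic"), each = 200)
)

# Common grid of specificities shared by all curves
spec_grid <- seq(0, 1, by = 0.01)

# One column per sample: sensitivity of that sample's ROC curve at each
# grid specificity, read off the curve by coords()
sens_matrix <- sapply(split(toy, toy$Sample), function(d) {
    r <- roc(response = d$RealClass, predictor = d$PredictionValues,
             levels = c("benign", "pathogenic"), direction = "<",
             quiet = TRUE)
    coords(r, x = spec_grid, input = "specificity",
           ret = "sensitivity", transpose = FALSE)$sensitivity
})

# Vertical average: mean sensitivity across samples at each specificity
mean_sens <- rowMeans(sens_matrix)

plot(1 - spec_grid, mean_sens, type = "l",
     xlab = "1 - Specificity", ylab = "Mean sensitivity")
```

Threshold averaging and vertical averaging generally give different curves, so it is worth stating which one a plot shows.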

Alternatively, you may want to look into the theory of summary ROC curves (SROC), which fit a parametric model that combines multiple ROC curves.

Plotting the mean ROC curve of multiple ROC curves in R
