Question

I am using the rpart classifier in R. I want to evaluate the trained classifier on test data. That part is fine: I can use the predict.rpart function.

But I also want to compute precision, recall and the F1 score.

My question is: do I have to write the functions for this myself, or is there a function for it in R or in any CRAN library?

Answer 1

Using the caret package:

library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)

A generic function that works for both binary and multi-class classification, without using any package, is:

f1_score <- function(predicted, expected, positive.class="1") {
    # Give both factors the same, identically ordered levels so that the
    # diagonal of the confusion matrix lines up with the classes
    expected  <- as.factor(expected)
    predicted <- factor(as.character(predicted), levels=levels(expected))
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assuming that F1 is zero when it's not possible to compute it
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) unname(f1[positive.class]) else mean(f1)
}

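Some comments about the function:

F1 is assumed to be zero when it is NA (that is, when it cannot be computed)
positive.class is used only for the binary F1
For multi-class problems, the macro-averaged F1 is computed
If predicted and expected have different levels, predicted is given the levels of expected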
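
To make the macro-averaging step concrete, here is a minimal base R sketch on a small made-up data set. The vectors expected and predicted are hypothetical illustration data, not taken from the question, and undefined F1 values are set to zero, following the same convention as the function above.

```r
# Hypothetical toy data: three classes "a", "b", "c"
expected  <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"))
predicted <- factor(c("a", "a", "b", "b", "c", "c", "c", "a"),
                    levels = levels(expected))

# Rows are the true classes, columns the predicted classes
cm <- table(expected, predicted)

# Per-class metrics, read off the confusion matrix
precision <- diag(cm) / colSums(cm)  # TP / (TP + FP)
recall    <- diag(cm) / rowSums(cm)  # TP / (TP + FN)
f1        <- 2 * precision * recall / (precision + recall)

# Assumption: a class whose F1 cannot be computed counts as zero
f1[is.na(f1)] <- 0

# The macro average is the unweighted mean over the classes
macro_f1 <- mean(f1)
print(round(cbind(precision, recall, f1), 3))
print(macro_f1)
```

Because the macro average weights every class equally, a rare class with a poor F1 pulls the overall score down as much as a frequent one would.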

Answer 2

The ROCR library computes all of these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):

library(ROCR)
...

y <- ... # logical array of positive / negative cases
predictions <- ... # array of predictions

pred <- prediction(predictions, y)

# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# ROC area under the curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)

...
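
The snippet above plots curves but does not return an F1 value directly. As a hedged sketch: ROCR's performance() also accepts the measure "f" (the precision-recall F measure, balanced by default), evaluated at every cutoff. The scores and labels below are made-up illustration data, and picking the cutoff with the highest F is an assumption about what you want, not something the answer prescribes.

```r
library(ROCR)

# Hypothetical classifier scores and true 0/1 labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)

pred <- prediction(scores, labels)

# F measure at every cutoff (balanced, i.e. F1, by default)
f.perf  <- performance(pred, "f")
cutoffs <- f.perf@x.values[[1]]
f1      <- f.perf@y.values[[1]]

# which.max() skips the NaN that appears where precision is undefined
best <- which.max(f1)
print(c(cutoff = cutoffs[best], F1 = f1[best]))
```

For a classifier that outputs hard class labels rather than scores, the caret functions from the other answers are the more direct route.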

Answer 3

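I noticed the comment about the F1 score being needed for binary classes. I suspect that it usually is. But a while ago I wrote the function below, for a case where I was classifying into several groups denoted by numbers. It may be of use to you...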

calcF1Scores <- function(act, prd) {
  # Treats the vectors like classes
  # act and prd must be whole numbers
  scores <- list()
  for (i in seq(min(act), max(act))) {
    tp <- sum(prd == i & act == i)  # true positives for class i
    fp <- sum(prd == i & act != i)  # false positives for class i
    fn <- sum(prd != i & act == i)  # false negatives for class i
    # Keyed by class label, so classes that do not start at 1 also work
    scores[[as.character(i)]] <- (2 * tp) / (2 * tp + fp + fn)
  }
  print(scores)
  return(scores)
}

print(mean(unlist(calcF1Scores(c(1,1,3,4,5),c(1,2,3,4,5)))))
print(mean(unlist(calcF1Scores(c(1,2,3,4,5),c(1,2,3,4,5)))))

Answer 4

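confusionMatrix() from the caret package can be used with the optional argument positive, which specifies the factor level that should be treated as the positive class.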

confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")

This call also reports other values, such as the F1 score, accuracy, etc.

Answer 5

The confusionMatrix function in caret can compute all of this for you automatically.

cm <- confusionMatrix(prediction, reference = test_set$label)

# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems

You can also substitute any of the following for "F1" above to extract the corresponding value:

"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"

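I think this behaves slightly differently when you are only dealing with a binary classification problem, but in both cases the values are available when you look under $byClass.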
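
The difference in the binary case can be sketched as follows, assuming caret is installed. With three or more classes, byClass is a matrix with one row per class; with two classes it is a plain named vector, so the two-index form shown above does not apply. All data below are hypothetical.

```r
library(caret)

# Multi-class case: byClass is a matrix, one row per class
truth3 <- factor(c("a", "a", "b", "b", "c", "c", "a", "b"),
                 levels = c("a", "b", "c"))
pred3  <- factor(c("a", "b", "b", "b", "c", "a", "a", "c"),
                 levels = c("a", "b", "c"))
cm_multi <- confusionMatrix(pred3, reference = truth3)
f1_multi <- cm_multi$byClass[, "F1"]

# Binary case: byClass is a named vector, so index by name only
truth2 <- factor(c("1", "1", "0", "0", "1", "0"), levels = c("0", "1"))
pred2  <- factor(c("1", "0", "0", "0", "1", "1"), levels = c("0", "1"))
cm_bin <- confusionMatrix(pred2, reference = truth2, positive = "1")
f1_bin <- cm_bin$byClass["F1"]

print(f1_multi)
print(f1_bin)
```

Indexing by name rather than by position keeps the code readable and does not depend on the order in which caret reports the statistics.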

Answer 6

We can simply get the F1 value from caret's confusionMatrix function:

result <- confusionMatrix(Prediction, Lable)

# View confusion matrix overall
result 

# F1 value (two-class case; "F1" is the 7th entry of byClass)
result$byClass["F1"]

Answer 7

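You can also use confusionMatrix() provided by the caret package. Its output includes Sensitivity (also known as recall) and Pos Pred Value (also known as precision). F1 can then easily be computed, as stated above: F1 <- (2 * precision * recall) / (precision + recall)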

An easy way to compute precision, recall and F1 score in R
