I have the following code for a multi-class classification problem:
data$Class = as.factor(data$Class)
levels(data$Class) <- make.names(levels(factor(data$Class)))
trainIndex <- createDataPartition(data$Class, p = 0.6, list = FALSE, times=1)
trainingSet <- data[ trainIndex,]
testingSet <- data[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Class
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Class
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)
oneRM_pred <- predict(oneRM, testing_x)
oneRM_pred
eval_model(oneRM_pred, testing_y)
AUC_oneRM_pred <- auc(roc(oneRM_pred,testing_y))
cat ("AUC=", oneRM_pred)
# Recall-Precision curve
oneRM_prediction <- prediction(oneRM_pred, testing_y)
RP.perf <- performance(oneRM_prediction, "tpr", "fpr")
plot (RP.perf)
plot(roc(oneRM_pred,testing_y))
But the code stops working at this line:
oneRM_prediction <- prediction(oneRM_pred, testing_y)
I get this error:
Error in prediction(oneRM_pred, testing_y) : Format of predictions is invalid.
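Also, I don't know how to get the F1 measure easily.
One last question: does it make sense to compute the AUC in a multi-class classification problem?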
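Answer 0 (score: 0)
Let's start with F1.
Assuming you are using the iris dataset, we first need to load everything, train the model, and make the predictions, just as you did.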
library(datasets)
library(caret)
library(OneR)
library(pROC)
trainIndex <- createDataPartition(iris$Species, p = 0.6, list = FALSE, times=1)
trainingSet <- iris[ trainIndex,]
testingSet <- iris[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Species
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Species
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)
Then compute precision, recall, and F1 for each class.
cm <- as.matrix(confusionMatrix(oneRM_pred, testing_y))
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
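rowsums = apply(cm, 1, sum) # number of predictions per class (caret puts predictions in rows)
colsums = apply(cm, 2, sum) # number of actual instances per class (reference in columns)
diag = diag(cm) # number of correctly classified instances per class
precision = diag / rowsums # correct predictions / all predictions of that class
recall = diag / colsums # correct predictions / all actual instances of that class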
f1 = 2 * precision * recall / (precision + recall)
print(" ************ Confusion Matrix ************")
print(cm)
print(" ************ Diag ************")
print(diag)
print(" ************ Precision/Recall/F1 ************")
print(data.frame(precision, recall, f1))
After that, you can compute the macro-averaged F1.
macroPrecision = mean(precision)
macroRecall = mean(recall)
macroF1 = mean(f1)
print(" ************ Macro Precision/Recall/F1 ************")
print(data.frame(macroPrecision, macroRecall, macroF1))
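The steps above can also be wrapped in a small base R helper. This is only a sketch: the name macro_f1 is made up for illustration, it builds the confusion matrix with table() rather than caret, and it assumes that a class with no correct predictions should count as F1 = 0 instead of NaN.

```r
# Hypothetical helper: per-class and macro-averaged F1 from two label vectors.
macro_f1 <- function(actual, predicted) {
  lv <- levels(actual)
  # Re-encode predictions by value so both factors share the same levels.
  predicted <- factor(as.character(predicted), levels = lv)
  # Rows are predictions, columns are actual classes.
  cm <- table(Predicted = predicted, Actual = actual)
  tp <- diag(cm)
  precision <- tp / rowSums(cm)
  recall <- tp / colSums(cm)
  f1 <- 2 * precision * recall / (precision + recall)
  f1[is.nan(f1)] <- 0  # assumption: undefined F1 is treated as 0
  list(per_class = f1, macro = mean(f1))
}

# Toy data, not taken from the question.
actual <- factor(c("a", "a", "a", "b", "b", "c"))
predicted <- factor(c("a", "a", "b", "b", "b", "c"))
result <- macro_f1(actual, predicted)
print(result$per_class)
print(result$macro)
```

On the toy data, classes a and b each get F1 = 0.8 and class c gets 1, so the macro F1 is their plain average.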
For the ROC (more precisely, the AUC), it is best to use the pROC library.
print(" ************ AUC ************")
roc.multi <- multiclass.roc(testing_y, as.numeric(oneRM_pred))
print(auc(roc.multi))
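To give a feeling for what a multi-class AUC averages over, here is a simplified base R sketch of pairwise averaging: compute a two-class AUC for every pair of classes and take the mean. It is written in the spirit of the Hand and Till approach that the pROC documentation cites for multiclass.roc, but it is not pROC's implementation and may not reproduce its numbers. The names pair_auc and pairwise_auc and the toy scores are made up, and choosing the better direction per pair is an assumption. Note that as.numeric(oneRM_pred) turns the predicted classes into their integer level codes, so a score built that way depends on the order of the factor levels.

```r
# Two-class AUC via the rank (Mann-Whitney) formulation: the probability that
# a random "pos" score exceeds a random "neg" score, ties counted as one half.
pair_auc <- function(pos, neg) {
  r <- rank(c(pos, neg))  # average ranks handle ties
  n_pos <- length(pos)
  n_neg <- length(neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: mean of the pairwise AUCs over all class pairs.
pairwise_auc <- function(actual, score) {
  pairs <- combn(levels(actual), 2, simplify = FALSE)
  aucs <- vapply(pairs, function(p) {
    a <- pair_auc(score[actual == p[2]], score[actual == p[1]])
    max(a, 1 - a)  # assumption: use the better direction for each pair
  }, numeric(1))
  mean(aucs)
}

# Toy data, not taken from the question.
actual <- factor(c("a", "a", "b", "b", "c", "c"))
score <- c(1, 2, 2, 3, 3, 3)
multi_auc <- pairwise_auc(actual, score)
print(multi_auc)
```

With hard class predictions as the only score, each pairwise curve has very few distinct thresholds, so the result is a coarse summary at best.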
Hope this helps.
Details on F1 and AUC can be found at the link.
Answer 1 (score: 0)
If I use levels(oneRM_pred) <- levels(testing_y) in this way:
...
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)
oneRM_pred <- predict(oneRM, testing_x)
levels(oneRM_pred) <- levels(testing_y)
...
the accuracy is much lower than before. So I am not sure that forcing the same levels is a good solution.
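One possible explanation, assuming the predicted factor has its levels in a different order (or a different set of levels) than testing_y: assigning with levels<- renames the levels by position, so the stored labels themselves change. Re-encoding by value with factor(as.character(...), levels = ...) keeps every label and only changes the level set. The toy vectors below are made up to show the difference; whether this is what happens with the OneR predictions in the question is not confirmed.

```r
# Reference labels and predictions that agree perfectly by value,
# but whose factor levels are stored in a different order.
truth <- factor(c("a", "b", "c", "a", "b", "c"), levels = c("a", "b", "c"))
pred <- factor(c("a", "b", "c", "a", "b", "c"), levels = c("c", "b", "a"))

# Renaming by position: level 1 ("c") becomes "a" and level 3 ("a") becomes "c".
relabelled <- pred
levels(relabelled) <- levels(truth)

# Re-encoding by value: labels are unchanged, only the level order is aligned.
realigned <- factor(as.character(pred), levels = levels(truth))

acc_relabelled <- mean(as.character(relabelled) == as.character(truth))
acc_realigned <- mean(as.character(realigned) == as.character(truth))
print(c(relabelled = acc_relabelled, realigned = acc_realigned))
```

Here the position-based renaming drops the accuracy from 1 to 1/3 even though the original predictions were all correct.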