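After classifying the Indian Liver Patient Dataset from the UCI repository, I tried to plot a ROC curve and got an error. Below is the R code, followed by the error, followed by the dput() output of the head of the dataset.
Code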
library(ROCR)    # prediction()
library(ggplot2)
library(caret)   # createDataPartition()
library(e1071)   # svm()
Data<-read.csv("C:/Users/Dell/Desktop/Codes and Datasets/ilpd.csv")
nrow(Data)
set.seed(9850)
gp<-runif(nrow(Data))
Data<-Data[order(gp),]
idx <- createDataPartition(y = Data$Class, p = 0.7, list = FALSE)
train<-Data[idx,]
test<-Data[-idx,]
ncol(train)
ncol(test)
#svm here
svmmodel<-svm(Class~.,train,
kernel="sigmoid")
prob<-predict(svmmodel,test,type="response")
#plot(prob)
pred <- prediction(prob, test$Class) #Getting error at this line
Error
Error in prediction(prob, test$Class) : Format of predictions is invalid.
Dataset
structure(list(age = c(55L, 48L, 14L, 17L, 40L, 37L), gender = c(0L,
0L, 0L, 0L, 1L, 0L), TB = c(0.9, 2.4, 0.9, 0.9, 0.9, 0.7), DB = c(0.2,
1.1, 0.3, 0.2, 0.3, 0.2), Alkphos = c(116L, 554L, 310L, 224L,
293L, 235L), SGPT = c(36L, 141L, 21L, 36L, 232L, 96L), sgot = c(16L,
73L, 16L, 45L, 245L, 54L), TP = c(6.2, 7.5, 8.1, 6.9, 6.8, 9.5
), ALB = c(3.2, 3.6, 4.2, 4.2, 3.1, 4.9), AG = c(1, 0.9, 1, 1.55,
0.8, 1), Class = structure(c(2L, 1L, 2L, 1L, 1L, 1L), .Label = c("One",
"Two"), class = "factor")), .Names = c("age", "gender", "TB",
"DB", "Alkphos", "SGPT", "sgot", "TP", "ALB", "AG", "Class"), row.names = c(216L,
405L, 316L, 103L, 20L, 268L), class = "data.frame")
Answer 0 (score: 2)
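The problem is that prob is a factor, while prediction() expects a numeric vector.
An SVM outputs class predictions by default, but you can override that and use: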
#svm here, note the probability=TRUE
svmmodel<-svm(Class~.,train,
kernel="sigmoid", probability = TRUE)
## Per ?predict.svm, probability = TRUE is needed to output probabilities;
## type = "response" and type = "prob" would do nothing.
pred.output <-predict(svmmodel,test,probability = TRUE)
## The probabilities are returned as an attribute, so you need to go in and grab them
prob <- attr(pred.output, "probabilities")[,2]
pred <- prediction(prob, test$Class) #Now this works
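Once prediction() succeeds, the ROC curve itself comes from ROCR's performance(). Here is a minimal sketch of that last step. It runs on simulated two-class data (the toy data frame and its x1/x2 columns are hypothetical stand-ins, since the ilpd.csv file is not available here), so the resulting curve and AUC say nothing about the real dataset. It also selects the probability column by class name rather than by position, because the columns of the "probabilities" attribute are not guaranteed to follow the factor level order.

```r
library(e1071)  # svm()
library(ROCR)   # prediction(), performance()

set.seed(1)

# Simulated two-class data standing in for the real train/test split (assumption)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
Class <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 0, "Two", "One"))
toy <- data.frame(x1, x2, Class)
train <- toy[1:140, ]
test <- toy[141:200, ]

# Fit with probability = TRUE so that predict() can return class probabilities
svmmodel <- svm(Class ~ ., data = train, kernel = "sigmoid", probability = TRUE)
pred.output <- predict(svmmodel, test, probability = TRUE)

# Pick the column by class name instead of by index
probs <- attr(pred.output, "probabilities")
prob <- probs[, "Two"]

# With factor labels, ROCR treats the second level ("Two") as the positive class
pred <- prediction(prob, test$Class)

# ROC curve: true positive rate against false positive rate
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)
abline(0, 1, lty = 2)  # diagonal reference line for a random classifier

# Area under the ROC curve
auc <- performance(pred, measure = "auc")@y.values[[1]]
print(auc)
```

With the real data, the same calls should apply to the train and test objects from the question; check colnames(attr(pred.output, "probabilities")) first to confirm which column corresponds to the class you want to treat as positive.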