I am working through a tutorial on decision trees in R using the iris dataset. Here is the code from my basic tutorial.
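library(rpart)
# install.packages('rpart.plot')  # run once if rpart.plot is not installed yet
library(rpart.plot)
s = sample(150,100) # 100 random row indices for training
iris_train = iris[s,]
iris_test = iris[-s,] # the remaining 50 rows for testing
dtm = rpart(Species~.,iris_train, method="class") # classification tree on all four measurements
rpart.plot(dtm, type=4, extra=101)
p = predict(dtm,iris_test,type="class")
table(iris_test[,5],p) # rows: actual Species, columns: predicted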
The resulting table is:
             setosa versicolor virginica
  setosa         12          0         0
  versicolor      0         18         0
  virginica       0          3        17
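If I am only interested in the predictions for virginica, what should I do? Is it possible to merge the other classes so that I get a binary classification of virginica vs. versicolor + setosa?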
Answer 0 (score: 0)
You can do what you want by relabelling the species before fitting the tree:
library(rpart)
# install.packages('rpart.plot')  # run once if rpart.plot is not installed yet
library(rpart.plot)
s = sample(150,100)
####################################
# Relabel: versicolor and setosa rows become "vers_seto", the rest stay "virginica"
other <- which(iris$Species %in% c("versicolor","setosa"))
new_species = rep("virginica",nrow(iris))
new_species[other] <- "vers_seto"
iris$new_species <- factor(new_species)
####################################
iris_train = iris[s,-5] # -5 drops the old Species column (column number 5)
iris_test = iris[-s,-5]
dtm = rpart(new_species~.,iris_train, method="class")
rpart.plot(dtm, type=4, extra=101)
p = predict(dtm,iris_test,type="class")
table(iris_test$new_species,p) # new_species is now column 5, so iris_test[,5] gives the same result
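If you would rather keep the original three-class tree and only report virginica vs. the rest, one possible alternative is to collapse the labels after predicting instead of before fitting. This is a minimal sketch, not the approach above: the helper `to_binary`, the seed value, and the `sensitivity` variable are names I introduced for illustration.

```r
library(rpart)

set.seed(1)  # assumed seed, only so that the split is repeatable
s <- sample(150, 100)
iris_train <- iris[s, ]
iris_test  <- iris[-s, ]

# Fit the original three-class tree, exactly as in the question
dtm <- rpart(Species ~ ., iris_train, method = "class")
p <- predict(dtm, iris_test, type = "class")

# Hypothetical helper: map any species vector to virginica vs. everything else
to_binary <- function(x) {
  factor(ifelse(x == "virginica", "virginica", "vers_seto"),
         levels = c("vers_seto", "virginica"))
}

# Collapse both the true labels and the predictions, then cross-tabulate
cm <- table(actual = to_binary(iris_test$Species), predicted = to_binary(p))
print(cm)

# Share of true virginica flowers that the tree labelled virginica
sensitivity <- cm["virginica", "virginica"] / sum(cm["virginica", ])
```

The difference is in what the tree optimises: relabelling before fitting lets rpart choose splits for the two-class problem, while collapsing afterwards only changes how the three-class predictions are summarised. On iris the two will often give similar tables, but that is not guaranteed in general.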