Question

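I am trying to implement a simple random forest in R, mainly to learn how R and random forests work, and to measure the model's accuracy on a test set.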

My sample data (the first five of 561 rows) is:

bulbasaur[1:5,]
   Appt_date count no_of_reps PerReCount
1 2016-01-01     2          1   2.000000
2 2016-01-04   174         58   3.000000
3 2016-01-05   206         59   3.491525
4 2016-01-06   203         61   3.327869
5 2016-01-07   236         64   3.687500

The code I wrote is:

install.packages("caret")
library(caret)

leaf <- bulbasaur
ctrl = trainControl(method="repeatedcv", number=100, repeats=50, selectionFunction = "oneSE")
in_train = createDataPartition(leaf$PerReCount, p=.75, list=FALSE)

#random forest
trf = train(PerReCount ~ ., data=leaf, method="rf", metric="RMSE",trControl=ctrl, subset = in_train)


#boosting
tgbm = train(PerReCount ~ ., data=leaf, method="gbm", metric="RMSE",
             trControl=ctrl, subset = in_train, verbose=FALSE)

resampls = resamples(list(RF = trf, GBM = tgbm))
difValues = diff(resampls)
summary(difValues)



######Using it on test matrix
test = leaf[-in_train,]
test$pred.leaf.rf = predict(trf, test, "raw")
confusionMatrix(test$pred.leaf.rf, test$PerReCount)

However, I get the following error:

Error in confusionMatrix.default(test$pred.leaf.rf, test$PerReCount) : 
  the data cannot have more levels than the reference
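The message appears because confusionMatrix is a classification tool: the caret version that raised this error seems to turn both vectors into factors and then compare their levels. Continuous predictions almost never repeat or match the observed values exactly, so the predictions end up with more "classes" than the reference. The base-R sketch below uses made-up numbers (not the question's data) to show that level mismatch; it illustrates the likely cause and does not reproduce caret's internal check.

```r
# Made-up observed ratios (note the repeated values) and continuous predictions.
obs  <- c(3, 3, 3.5, 3.5, 4)
pred <- c(2.9, 3.1, 3.4, 3.6, 4.2)

# Treat every distinct number as a class, as a confusion matrix would.
obs_f  <- factor(obs)
pred_f <- factor(pred)

nlevels(obs_f)   # 3 distinct observed values
nlevels(pred_f)  # 5 distinct predicted values

# Predicted "classes" that also occur in the reference: none at all.
shared_levels <- intersect(levels(pred_f), levels(obs_f))
length(shared_levels)  # 0
```

Converting to factors only hides this mismatch, which is consistent with the poor "accuracy" described below: an exact-match rate is not a meaningful score for a continuous target.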

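I tried a few changes, such as running leaf$PerReCount <- as.factor(leaf$PerReCount) and then adding type = "class", but the resulting accuracy was very poor, and I don't want to switch from regression to classification anyway. How can I fix this without converting to factors, or solve the problem some other way, or get an accuracy figure without using a confusion matrix? Thanks.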

Answer 1

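@Damiano was right: a regression model does not produce a confusion matrix, because its predictions are continuous values rather than yes/no classes. I solved it by computing the RMSE on the test set instead: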

piko.chu = predict(trf, test)
RMSE.forest <- sqrt(mean((piko.chu-test$PerReCount)^2))
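The same idea extends to a few other common regression scores. The base-R sketch below computes RMSE (same formula as RMSE.forest above), mean absolute error, R² as 1 - SSE/SST, and, as a rough stand-in for "accuracy", the share of predictions within a tolerance of the observed value. The function name regression_metrics and the tolerance argument are hypothetical, and the tolerance value is an assumption you would choose for your own data. caret's postResample(pred, obs) also reports RMSE, Rsquared and MAE, but its Rsquared is the squared correlation, so it can differ from the R² computed here.

```r
# Hypothetical helper: regression metrics for a numeric hold-out set (base R only).
# `pred` and `obs` are numeric vectors of equal length; `tolerance` is an assumed
# threshold for counting a prediction as "close enough" to the observed value.
regression_metrics <- function(pred, obs, tolerance = 0.1) {
  stopifnot(is.numeric(pred), is.numeric(obs), length(pred) == length(obs))
  err <- pred - obs
  sse <- sum(err^2)                  # sum of squared errors
  sst <- sum((obs - mean(obs))^2)    # total sum of squares around the mean
  c(
    RMSE       = sqrt(mean(err^2)),           # root mean squared error
    MAE        = mean(abs(err)),              # mean absolute error
    R2         = 1 - sse / sst,               # share of variance explained
    within_tol = mean(abs(err) <= tolerance)  # fraction of "close" predictions
  )
}

# Usage with the objects from the question (assumes trf and test exist):
# regression_metrics(predict(trf, test), test$PerReCount, tolerance = 0.25)

# Small made-up example:
m <- regression_metrics(pred = c(2.5, 3, 3.5), obs = c(2, 3, 4), tolerance = 0.25)
print(m)
```

RMSE and MAE are in the same units as PerReCount, which makes them easier to interpret than a classification-style accuracy for this target.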

R - Random Forest - Error when applying a confusion matrix to test data
