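I originally had a data frame with N rows and 12 columns. The last column is my class (0 or 1). I had to convert the entire data frame to numeric using
training <- sapply(training.temp,as.numeric)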
But then I figured I needed the class column to be a factor column in order to use the randomForest() tool as a classifier, so I did
training[,"Class"] <- factor(training[,ncol(training)])
I then went on to create the tree using
training_rf <- randomForest(Class ~., data = trainData, importance = TRUE, do.trace = 100)
But I am getting two errors:
1: In Ops.factor(training[, "Status"], factor(training[, ncol(training)])) :
<= this is not relevant for factors (roughly translated)
2: In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
I would appreciate it if someone could point out the formatting mistake I am making.
Thanks!
Answer 0: (score: 8)
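So the problem was actually quite simple. It turns out my training data was an atomic vector rather than a data frame, so it first had to be converted to a data frame. I needed to add the following line: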
training <- as.data.frame(training)
Problem solved!
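For anyone hitting the same thing, here is a minimal base-R sketch of why the conversion matters. The toy `training.temp` below is made up for illustration (the real one has 12 columns), and `broken` and `training_df` are hypothetical names. The point is that `sapply()` simplifies its result to a numeric matrix, and a matrix column cannot hold a factor.

```r
# Toy stand-in for the original data (assumption: columns were read as text).
training.temp <- data.frame(
  a     = c("1.5", "2.5", "3.5", "4.5"),
  b     = c("10", "20", "30", "40"),
  Class = c("0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# sapply() simplifies the per-column results into a numeric matrix.
training <- sapply(training.temp, as.numeric)
print(is.matrix(training))
print(is.data.frame(training))

# Assigning a factor into a matrix column does not keep the factor class.
broken <- training
broken[, "Class"] <- factor(broken[, "Class"])
print(is.factor(broken[, "Class"]))

# Converting to a data frame first lets the column really become a factor.
training_df <- as.data.frame(training)
training_df$Class <- factor(training_df$Class)
print(is.factor(training_df$Class))
print(levels(training_df$Class))
```

With a numeric response, randomForest() assumes regression, which would be consistent with the second warning in the question.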
Answer 1: (score: 6)
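First, your coercion to a factor is not working because of syntax errors. Second, you should always use indexing when specifying an RF model. Here are the changes to your code that should make it work.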
training <- sapply(training.temp,as.numeric)
training[,"Class"] <- as.factor(training[,"Class"])
training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=training[,"Class"],
importance=TRUE, do.trace=100)
# You can also coerce to a factor directly in the model statement
training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=as.factor(training[,"Class"]),
importance=TRUE, do.trace=100)
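Putting the two answers together, a rough end-to-end sketch might look like the following. It assumes the randomForest package is installed (the fit is skipped otherwise) and uses made-up data with two hypothetical predictors, `x1` and `x2`, in place of the real columns. Because `sapply()` returns a matrix, the result is wrapped in `as.data.frame()` before the factor conversion, so the formula interface from the question can be used as well.

```r
set.seed(42)
n <- 100

# Made-up stand-in for training.temp (assumption: values arrive as text).
training.temp <- data.frame(
  x1    = as.character(rnorm(n)),
  x2    = as.character(runif(n)),
  Class = as.character(rbinom(n, 1, 0.5)),
  stringsAsFactors = FALSE
)

# Convert every column to numeric, then back to a data frame.
training <- as.data.frame(sapply(training.temp, as.numeric))

# Now the response can actually be stored as a factor.
training$Class <- factor(training$Class)

# Fit only if the package is available; a factor response means classification.
if (requireNamespace("randomForest", quietly = TRUE)) {
  training_rf <- randomForest::randomForest(
    Class ~ ., data = training,
    importance = TRUE, do.trace = 100
  )
  print(training_rf$type)
}
```

If `training_rf$type` reports regression instead of classification, the response is probably still numeric at the point of the call.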