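I am trying to use randomForest with a formula constructed via the paste() function. However, randomForest refuses to accept such a formula, whereas rpart does accept it. Does anyone know how I can make this work?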
library(rpart)
library(randomForest)
# Construct a formula by pasting stuff together.
columnName <- "Species"
modelFormula <- paste(columnName, " ~ .")
print(modelFormula)
## [1] "Species ~ ."
# Call rpart() and randomForest() with the constructed model.
model <- rpart(modelFormula, data=iris)
model <- randomForest(modelFormula, data=iris)
## Error in if (n == 0) stop("data (x) has 0 rows") :
## argument is of length zero
# This works if I directly include the formula.
model <- randomForest(Species ~ ., data=iris)
Answer 0 (score: 5)
You need to coerce the string to a formula object (using as.formula()) before passing it to randomForest():
R> model <- randomForest(as.formula(modelFormula), data=iris)
R> model
Call:
randomForest(formula = as.formula(modelFormula), data = iris)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
There is a subtle difference between the string and the formula object:
R> modelFormula
[1] "Species ~ ."
R> as.formula(modelFormula)
Species ~ .
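The printed output hides most of the difference (only the quotes and the [1] go away), so it can help to inspect the class of each object. A minimal sketch using base R only; the name formulaObject is just an illustrative choice, not something from the question:

```r
# Build the formula text as in the question.
columnName <- "Species"
modelFormula <- paste(columnName, "~ .")

# The pasted result is an ordinary character vector of length one.
class(modelFormula)

# as.formula() parses the text into an object of class "formula".
formulaObject <- as.formula(modelFormula)   # hypothetical variable name
class(formulaObject)

# inherits() is the usual way to test for a class.
inherits(modelFormula, "formula")
inherits(formulaObject, "formula")
```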
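This matters because randomForest() has a formula method that is used if you supply a formula object as the first argument. If you don't, you get the default method, which doesn't know what to do with a character string as its argument x. You can see the method dispatch below: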
R> methods(randomForest)
[1] randomForest.default* randomForest.formula*
Non-visible functions are asterisked
R> debugonce(randomForest:::randomForest.formula)
R> model <- randomForest(modelFormula, data=iris) ## 1
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero
R> model <- randomForest(as.formula(modelFormula), data=iris)
debugging in: randomForest.formula(as.formula(modelFormula), data = iris)
debug: {
.... truncated
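I flagged the formula method for debugging, but it is not called until a formula object is passed as the first argument; hence the error in the first call (## 1 above). With the formula object, we see the randomForest.formula method being called, in the debugger.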
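The same dispatch behaviour can be reproduced without randomForest at all. The sketch below defines a toy S3 generic with a formula method and a default method; fit, fit.formula and fit.default are hypothetical names made up for illustration and are not randomForest's actual code:

```r
# Toy generic that dispatches on the class of its first argument.
fit <- function(x, ...) UseMethod("fit")

# Chosen when x has class "formula".
fit.formula <- function(x, ...) "formula method"

# Fallback for everything else, including character strings.
fit.default <- function(x, ...) "default method"

modelFormula <- paste("Species", "~ .")

viaString  <- fit(modelFormula)              # character string, so the default method runs
viaFormula <- fit(as.formula(modelFormula))  # formula object, so the formula method runs

print(viaString)
print(viaFormula)
```

A string only ever reaches the default method here, which mirrors why the unconverted modelFormula ends up in randomForest.default.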
Answer 1 (score: 1)
Run:
model <- randomForest(as.formula(modelFormula), data=iris)
Result:
> model
Call:
randomForest(formula = as.formula(modelFormula), data = iris)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
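
As a possible alternative to paste() plus as.formula(), not mentioned in either answer, base R's reformulate() (in the stats package) builds a formula object directly from character pieces. A minimal sketch, assuming the same columnName as in the question; builtFormula is an illustrative name:

```r
# Response column name held in a variable, as in the question.
columnName <- "Species"

# reformulate() takes the right-hand-side term labels and an optional response
# and returns a formula object, so no separate as.formula() call is needed.
builtFormula <- reformulate(".", response = columnName)   # hypothetical variable name

print(builtFormula)
class(builtFormula)
```

The result could then be passed as the first argument in place of as.formula(modelFormula).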