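In caret, can you determine which predictors were used to train the model when the algorithm optimizes over many of them?
I have delegated the preprocessing to caret, since I know I cannot split the data up myself. As I understand random forests, the predictors are a varying subset at each branch of a decision tree.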
Given that mtry is "the number of variables available for splitting at each tree node", and given this summary:
Resampling results across tuning parameters:
  mtry  Accuracy   Kappa      Accuracy SD   Kappa SD
   2    0.9944614  0.9929903  0.0010947590  0.001386114
  28    0.9979948  0.9974629  0.0009365892  0.001183031
  55    0.9957888  0.9946703  0.0019214403  0.002432008
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 28.
I would like to know which features were eliminated and which were useful (in particular, the two that produced 99.4% accuracy).
model <- train(classe ~ ., method="rf", data=trainPre,
prox=TRUE,allowParallel=TRUE)
> summary(result$model)
Length Class Mode
call 5 -none- call
type 1 -none- character
predicted 15699 factor numeric
err.rate 3000 -none- numeric
confusion 30 -none- numeric
votes 78495 matrix numeric
oob.times 15699 -none- numeric
classes 5 -none- character
importance 58 -none- numeric
importanceSD 0 -none- NULL
localImportance 0 -none- NULL
proximity 246458601 -none- numeric
ntree 1 -none- numeric
mtry 1 -none- numeric
forest 14 -none- list
y 15699 factor numeric
test 0 -none- NULL
inbag 0 -none- NULL
xNames 58 -none- character
problemType 1 -none- character
tuneValue 1 data.frame list
obsLevels 5 -none- character
> result3$model
Are these predictors captured somewhere in the model object?
Answer 0 (score: 1)
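caret has a function called predictors for this purpose.
However, there are a few things to note:
- There is a bug in randomForest that prevents this from working when the model is fit with the formula method. I filed a bug request with Andy in February, so I will send him a reminder.
- If mtry randomly exposes the splitting routine to the non-informative predictors, they will end up in the list as well.
An example: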
> library(caret)
>
> set.seed(135)
> tr_dat <- twoClassSim(100)
>
> set.seed(417)
> mod <- train(x = tr_dat[, -ncol(tr_dat)], y = tr_dat$Class, method = "rf")
>
> predictors(mod)
[1] "TwoFactor1" "TwoFactor2" "Linear01" "Linear02" "Linear03" "Linear04"
[7] "Linear05" "Linear06" "Linear07" "Linear08" "Linear09" "Linear10"
[13] "Nonlinear1" "Nonlinear2" "Nonlinear3"
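Because predictors() lists every variable that appears in at least one split, it does not say how heavily each one was used. A rough sketch of one way to look further, assuming the same simulated data as above and that the fitted forest is stored in mod$finalModel: count the splits per predictor with randomForest's varUsed() and compare with caret's varImp(). The names rf and split_counts are only illustrative.

```r
library(caret)
library(randomForest)

# Same simulated data and seeds as in the example above
set.seed(135)
tr_dat <- twoClassSim(100)

set.seed(417)
mod <- train(x = tr_dat[, -ncol(tr_dat)], y = tr_dat$Class, method = "rf")

# The randomForest object that train() refit with the selected mtry
rf <- mod$finalModel

# Number of times each predictor was chosen for a split, summed over all trees
split_counts <- varUsed(rf, by.tree = FALSE, count = TRUE)

# Label the counts with the predictor names, taken from the importance matrix
names(split_counts) <- rownames(importance(rf))

# Predictors ordered from most to least frequently used in splits
print(sort(split_counts, decreasing = TRUE))

# caret's scaled importance scores for the same model
print(varImp(mod))
```

A predictor with a split count of zero was never used by any tree. A low but non-zero count usually just means the variable was picked at a node where nothing better was among the mtry candidates.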
Max