I am using caret with a gbm model. When I call trainedGBM$finalModel$fit I see sensible output. But when I call predict(trainedGBM$finalModel, origData, type="response") I get very different results, and even with origData attached, predict(trainedGBM$finalModel, type="response") produces yet another set of results. To my mind, these calls should all produce the same output. Can someone help me identify the problem?
library(caret)
library(gbm)
attach(origData)

gbmGrid <- expand.grid(.n.trees = c(2000),
                       .interaction.depth = c(14:20),
                       .shrinkage = c(0.005))
trainedGBM <- train(y ~ ., method = "gbm", distribution = "gaussian",
                    data = origData, tuneGrid = gbmGrid,
                    trControl = trainControl(method = "repeatedcv", number = 10,
                                             repeats = 3, verboseIter = FALSE,
                                             returnResamp = "all"))
ntrees <- gbm.perf(trainedGBM$finalModel, method = "OOB")
data.frame(y,
           finalModelFit = trainedGBM$finalModel$fit,
           predictDataSpec = predict(trainedGBM$finalModel, origData,
                                     type = "response", n.trees = ntrees),
           predictNoDataSpec = predict(trainedGBM$finalModel,
                                       type = "response", n.trees = ntrees))
The code above produces the following partial results:
y finalModelFit predictDataSpec predictNoDataSpec
9000 6138.8920 2387.182 2645.993
5000 3850.8817 2767.990 2467.157
3000 3533.1183 2753.551 2044.578
2500 1362.9802 2672.484 1972.361
1500 5080.2112 2449.185 2000.568
750 2284.8188 2728.829 2063.829
1500 2672.0146 2359.566 2344.451
5000 3340.5828 2435.137 2093.939
0 1303.9898 2377.770 2041.871
500 879.9798 2691.886 2034.307
3000 2928.4573 2327.627 1908.876
Answer (score: 7)
Based on your gbmGrid, only the interaction depth varies, from 14 to 20; the shrinkage and the number of trees are fixed at 0.005 and 2000 respectively. As designed, trainedGBM therefore only finds the best interaction depth. The ntrees computed by gbm.perf then answers a different question: given that the best interaction depth lies between 14 and 20, what is the optimal number of trees according to the OOB criterion? Since predictions depend on the number of trees in the model, the fitted values of the trained GBM are based on n.trees = 2000, whereas predictions using the gbm.perf result use the optimal ntrees estimated by that function. This explains the difference between your trainedGBM$finalModel$fit and predict(trainedGBM$finalModel, type="response", n.trees=ntrees).
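To see this directly, here is a minimal sketch (assuming trainedGBM and origData from the question are still in scope): predicting with n.trees = 2000, the number of trees the final model was actually fit with, should reproduce the stored fitted values, unlike the OOB-based ntrees:

# Sketch: predictions at the full tree count should match the stored fit.
# Assumes trainedGBM and origData from the question above.
head(data.frame(
  finalModelFit   = trainedGBM$finalModel$fit,
  predictAllTrees = predict(trainedGBM$finalModel, origData,
                            type = "response", n.trees = 2000)
))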
An example based on the iris dataset, using gbm as a classification rather than a regression model:
library(caret)
library(gbm)
set.seed(42)

gbmGrid <- expand.grid(.n.trees = 100,
                       .interaction.depth = 1:4,
                       .shrinkage = 0.05)
trainedGBM <- train(Species ~ ., method = "gbm", distribution = "multinomial",
                    data = iris, tuneGrid = gbmGrid,
                    trControl = trainControl(method = "repeatedcv", number = 10,
                                             repeats = 3, verboseIter = FALSE,
                                             returnResamp = "all"))
print(trainedGBM)
gives:
# Resampling results across tuning parameters:
# interaction.depth Accuracy Kappa Accuracy SD Kappa SD
# 1 0.947 0.92 0.0407 0.061
# 2 0.947 0.92 0.0407 0.061
# 3 0.944 0.917 0.0432 0.0648
# 4 0.944 0.917 0.0395 0.0592
# Tuning parameter 'n.trees' was held constant at a value of 100
# Tuning parameter 'shrinkage' was held constant at a value of 0.05
# Accuracy was used to select the optimal model using the largest value.
# The final values used for the model were interaction.depth = 1, n.trees = 100
# and shrinkage = 0.05.
Finding the optimal number of trees given the best interaction depth:
ntrees <- gbm.perf(trainedGBM$finalModel, method="OOB")
# Giving ntrees = 50
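As a sketch of how this estimate would then be used for prediction: for a multinomial gbm, predict returns class probabilities as a three-dimensional array, so the final subscript below just drops the n.trees dimension:

# Sketch: class probabilities at the OOB-estimated tree count.
probs <- predict(trainedGBM$finalModel, iris, n.trees = ntrees, type = "response")
head(probs[, , 1])  # rows = observations, columns = the three Species classes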
If we instead train the model varying both the number of trees and the interaction depth:
gbmGrid2 <- expand.grid(.n.trees = 1:100,
                        .interaction.depth = 1:4,
                        .shrinkage = 0.05)
trainedGBM2 <- train(Species ~ ., method = "gbm",
                     data = iris, tuneGrid = gbmGrid2,
                     trControl = trainControl(method = "repeatedcv", number = 10,
                                              repeats = 3, verboseIter = FALSE,
                                              returnResamp = "all"))
print(trainedGBM2)
# Tuning parameter 'shrinkage' was held constant at a value of 0.05
# Accuracy was used to select the optimal model using the largest value.
# The final values used for the model were interaction.depth = 2, n.trees = 39
# and shrinkage = 0.05.
Note that when we vary both the number of trees and the interaction depth, the optimal number of trees chosen by cross-validation is quite close to the one computed by gbm.perf.
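As a quick check (assuming both fitted objects above are still in scope), the cross-validated choice can be read straight off the caret object and set beside the OOB estimate:

# Compare caret's cross-validated choice with gbm.perf's OOB estimate.
trainedGBM2$bestTune   # interaction.depth = 2, n.trees = 39, shrinkage = 0.05
ntrees                 # 50, from gbm.perf(trainedGBM$finalModel, method = "OOB")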