Question

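I am using gbm to build a predictive regression model. I have training and test sets (predefined and NOT randomly selected). Below is an outline of the code.

I have about 600 rows in the training data and 150 rows in the test data. I know that is not much, but still.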

train <- ....
test <- ....

set.seed(123)
model <- gbm(target ~., data = train,
                distribution = "gaussian",
                n.trees = 4000,
                interaction.depth = 2,
                n.minobsinnode = 5,
                shrinkage = 0.01,
                bag.fraction = 1,
                train.fraction = .95,
                verbose = TRUE
            )

best_iter <- gbm.perf(model)

set.seed(123)
predictions <- predict(model, newdata = test, n.trees = best_iter)

set.seed(123)
predictions <- predict(model, newdata = train, n.trees = best_iter)

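Somehow, when I run the gbm model again and again with exactly the same parameters, I cannot reproduce the predictions on the test set. At the same time, I can always reproduce the predictions on the training set. I set the seed both before building the model and before making predictions. Can someone help me figure out what is going on? Note that the training and test data always stay the same; I do not change them between runs.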
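One way to narrow this down is to fit the model twice with the same seed and compare both the selected iteration and the predictions directly. The sketch below is only an illustration: the data are synthetic stand-ins (the real train and test sets are not shown), n.trees is reduced to 500 to keep it quick, and fit_and_predict is a hypothetical helper, not part of gbm.

```r
library(gbm)

# Synthetic stand-in data (assumption): 600 training rows, 150 test rows.
set.seed(1)
n <- 750
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n))
dat$target <- 2 * dat$x1 - dat$x2^2 + rnorm(n, sd = 0.5)
train <- dat[1:600, ]
test  <- dat[601:750, ]

# Hypothetical helper: fit once with a given seed, return the chosen
# iteration plus predictions on both sets in separate objects.
fit_and_predict <- function(seed) {
  set.seed(seed)
  model <- gbm(target ~ ., data = train,
               distribution = "gaussian",
               n.trees = 500,            # reduced from 4000 for speed
               interaction.depth = 2,
               n.minobsinnode = 5,
               shrinkage = 0.01,
               bag.fraction = 1,
               train.fraction = 0.95,
               verbose = FALSE)
  # With train.fraction < 1, the held-out rows allow method = "test".
  best_iter <- gbm.perf(model, method = "test", plot.it = FALSE)
  list(best_iter  = best_iter,
       pred_train = predict(model, newdata = train, n.trees = best_iter),
       pred_test  = predict(model, newdata = test,  n.trees = best_iter))
}

run1 <- fit_and_predict(123)
run2 <- fit_and_predict(123)

# Compare the two runs piece by piece.
same_iter  <- identical(run1$best_iter, run2$best_iter)
same_train <- isTRUE(all.equal(run1$pred_train, run2$pred_train))
same_test  <- isTRUE(all.equal(run1$pred_test,  run2$pred_test))
```

If same_iter were FALSE, that would suggest the difference comes from the iteration picked by gbm.perf rather than from predict itself; if only same_test were FALSE, the test data or how it is passed to predict would be the next thing to inspect.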

Answer 1

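Did you ever figure this out? I am using exactly the same modeling approach as you, and the only difference I can see in the code is in your predictions. You could try removing the dependent variable from the newdata for both train and test. Also, set n.trees directly; I am not sure what your current approach is doing. And save the predictions into two separate objects.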

PredEst <- predict(model, newdata = train[-which(names(train) %in% as.character("target"))], n.trees = 4000)

PredVal <- predict(model, newdata = test[-which(names(test) %in% as.character("target"))], n.trees = 4000)
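One caveat about the `-which(...)` idiom used above: it only behaves as intended when the column is actually present. The small base R sketch below, using a made-up toy data frame (df and its columns are assumptions for illustration), compares it with `setdiff` on the column names, which gives the same result when the column exists and is harmless when it does not.

```r
# Hypothetical toy data frame standing in for the real train set.
df <- data.frame(x1 = 1:3, x2 = c(2.5, 3.5, 4.5), target = c(10, 20, 30))

# The -which() idiom: fine when "target" exists.
drop_which <- df[-which(names(df) %in% "target")]

# Alternative: keep every column whose name is not "target".
drop_setdiff <- df[setdiff(names(df), "target")]

# When "target" is absent, which() returns integer(0), so the negative
# index is empty and no columns are selected at all.
no_target <- df[c("x1", "x2")]
bad  <- no_target[-which(names(no_target) %in% "target")]
good <- no_target[setdiff(names(no_target), "target")]

ncol(bad)   # 0 columns remain
ncol(good)  # both predictors are kept
```

This matters if the same prediction code is ever run on new data that has no target column yet.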

Unable to reproduce test predictions with gbm
