Question

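I am trying to build a random forest on roughly 100,000 inputs. To do this, I am using train from the caret package with method = "parRF". Unfortunately, my machine with 128 GB of RAM still runs out of memory, so I need to reduce the amount of memory used.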

Currently, the training call I am running is:

> trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = TRUE)
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                     trControl = trControl)

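However, because every forest is retained, the system runs out of memory quickly. If my understanding of train and randomForest is correct, each random forest stores at least about 500 * 100,000 doubles. I would therefore like to discard the random forests I no longer need. I tried passing keep.forest = FALSE to randomForest: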

> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                       trControl = trControl, keep.forest = FALSE)
Error in train.default(x = data_preds, y = data_resp, method = "parRF",  : 
  final tuning parameters could not be determined

In addition, this warning is thrown repeatedly:

In eval(expr, envir, enclos) :
  predictions failed for Resample01: mtry=2 Error in predict.randomForest(modelFit, newdata) : 
  No forest component in the object

It seems that, for some reason, caret needs to keep the forest in order to compare models. Is there any way to use caret with less memory?
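For scale, the estimate of about 500 * 100,000 doubles per forest can be turned into bytes. This is only a back-of-the-envelope sketch that takes that estimate at face value and assumes 8 bytes per double; the variable names are illustrative and not part of caret or randomForest.

```r
# Rough size of one retained forest, using the estimate from the question.
n_trees          <- 500    # trees per forest, as in the estimate above
n_inputs         <- 1e5    # roughly 100,000 inputs
bytes_per_double <- 8      # assumption: one R double occupies 8 bytes

forest_bytes <- n_trees * n_inputs * bytes_per_double
forest_mb    <- forest_bytes / 1e6   # decimal megabytes

print(forest_mb)
```

By this estimate a single forest is on the order of hundreds of megabytes, so every additional forest kept in memory adds noticeably to the total.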

Answer 1

Note that if you use M cores, the dataset needs to be stored M+1 times. Try using fewer workers.
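One way to use fewer workers is to register a smaller parallel backend before calling train. The following is a minimal sketch, assuming the doParallel package is the foreach backend in use; the worker count of 2, the reduced number of resamples, the fixed mtry, and the small synthetic data standing in for data_preds and data_resp are illustrative assumptions, not values from the question.

```r
library(caret)       # train(), trainControl()
library(doParallel)  # registerDoParallel(); also attaches foreach and parallel

# Small synthetic stand-ins for the question's data_preds / data_resp (assumption).
set.seed(1)
data_preds <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
data_resp  <- factor(sample(c("a", "b"), 200, replace = TRUE))

# Hypothetical worker count, deliberately smaller than the number of cores,
# so that fewer copies of the data are held in memory at once.
n_workers <- 2
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Same resampling scheme as in the question, with fewer repeats to keep the demo short.
trControl <- trainControl(method = "LGOCV", p = 0.9, number = 3)
model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                     trControl = trControl, tuneGrid = data.frame(mtry = 2))

workers_used <- getDoParWorkers()  # number of workers foreach dispatches to

stopCluster(cl)
registerDoSEQ()  # return to sequential execution once the cluster is stopped
```

With M workers the data is held roughly M+1 times, so lowering n_workers trades training speed for a smaller memory footprint.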

R - Reducing memory usage when training a random forest with caret
