Why does caret's "parRF" lead to tuning and missing-value errors that don't occur with "rf"?

Time: 2015-03-13 16:14:28

Tags: r random-forest r-caret hpc

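I have a tidy dataset with no missing values and only numeric columns.

The dataset is large and contains sensitive information, so unfortunately I can't provide a copy of it here.

I split this data into training and testing sets with caret's createDataPartition: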

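library(caret)

# Stratified 60/40 split on the response
idx      <- createDataPartition(y = model_final$y, p = 0.6, list = FALSE)
training <- model_final[idx, ]
testing  <- model_final[-idx, ]
# Predictors: every column except the last one (assumes y is the last column)
x        <- training[-ncol(training)]
y        <- training$y
x1       <- testing[-ncol(testing)]
y1       <- testing$y

# Reset row names; on the plain vectors y and y1 these calls have no effect
row.names(training) <- NULL
row.names(testing)  <- NULL
row.names(x)        <- NULL
row.names(y)        <- NULL
row.names(x1)       <- NULL
row.names(y1)       <- NULL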
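A side note on the split: `training[-ncol(training)]` removes the response only if `y` happens to be the last column of `model_final`. A small base-R sketch of the difference, using a hypothetical stand-in data frame in which `y` is deliberately the first column, and a plain random split via `sample()` (not the stratified split that createDataPartition performs):

```r
# Hypothetical stand-in data: y is deliberately NOT the last column
set.seed(42)
model_final <- data.frame(y = rnorm(10), a = rnorm(10), b = rnorm(10))

# Plain random 60% split (createDataPartition would stratify on y instead)
idx      <- sample(nrow(model_final), size = 6)
training <- model_final[idx, ]

# Dropping by position removes column b and silently keeps y as a predictor
x_by_position <- training[-ncol(training)]
# Dropping by name removes exactly the response, wherever it sits
x_by_name     <- training[setdiff(names(training), "y")]

print(names(x_by_position))
print(names(x_by_name))
```

Selecting predictors by name keeps the code correct if the column order of the data ever changes.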

I have regularly been fitting and re-fitting random forest models to this data with randomForest, like so:

  rf <- randomForest(x = x, y = y, mtry = ncol(x), ntree = 1000,
                     corr.bias = TRUE, do.trace = TRUE, nPerm = 3)

I decided to see whether I could get better or faster results with train, and the following model ran fine, but took about 2 hours:

rf_train <- train(y = y, x = x,
                  method = 'rf', tuneLength = 3,
                  trControl = trainControl(method = 'cv', number = 10,
                                           classProbs = TRUE))

I need an HPC approach to make this logistically feasible, so I tried:

library(caret)
library(doParallel)
registerDoParallel(cores = 8)
rf_train <- train(y = y, x = x,
                  method = 'parRF', tuneGrid = data.frame(mtry = 3), na.action = na.omit,
                  trControl = trainControl(method = 'cv', number = 10,
                                           classProbs = TRUE, allowParallel = TRUE))
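One thing worth keeping in mind with this setup: when the doParallel backend uses socket (PSOCK) workers, each worker starts as a fresh R session, so packages attached on the master are not attached on the workers. As far as I know, that is the case when a cluster created with makePSOCKcluster() is registered, and also what registerDoParallel(cores = ...) sets up on Windows; on Unix-like systems the cores form forks the master instead. A minimal base-R sketch of the socket behaviour, using the tools package purely as an example of a package that ships with R but is not attached by default:

```r
library(parallel)
library(tools)    # attached on the master session only

cl <- makePSOCKcluster(2)

# Each worker reports whether tools is on its search path
before <- unlist(clusterEvalQ(cl, "package:tools" %in% search()))

# Attach the package explicitly on every worker
invisible(clusterEvalQ(cl, library(tools)))
after <- unlist(clusterEvalQ(cl, "package:tools" %in% search()))

stopCluster(cl)

print(before)   # one value per worker, before loading
print(after)    # one value per worker, after loading
```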

But whether I use tuneLength or tuneGrid, I get strange errors about missing values and tuning parameters:

Error in train.default(y = y, x = x, method = "parRF", tuneGrid = data.frame(mtry = 3),  : 
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(y = y, x = x, method = "parRF", tuneGrid = data.frame(mtry = 3),  :
  missing values found in aggregated results

I say this is strange because I get no errors with method = "rf", and because I triple-checked that there are no missing values.
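Note that the warnings refer to missing values in the *resampled performance measures*, which caret reports when the model fit or prediction fails on a resample; this can happen even when the input data are complete. For the input side, a base-R sketch of the kind of check meant here, run on a small hypothetical stand-in for `model_final` (the real data are not available):

```r
# Hypothetical stand-in for the private data: numeric predictors plus response y
set.seed(1)
model_final <- data.frame(a = rnorm(20), b = runif(20), y = rnorm(20))

# Three checks that should agree when the data are complete
n_na_total   <- sum(is.na(model_final))              # number of NA/NaN cells
has_na       <- anyNA(as.matrix(model_final))        # TRUE if any cell is NA/NaN
all_complete <- all(complete.cases(model_final))     # TRUE if every row is complete

# Inf/-Inf are not NA, but can still break model fitting;
# is.finite() is FALSE for NA, NaN and +/-Inf, so this counts all of them per column
n_bad_cols   <- sum(!vapply(model_final, function(col) all(is.finite(col)), logical(1)))

cat("NA cells:", n_na_total, "| any NA:", has_na,
    "| all rows complete:", all_complete, "| columns with non-finite values:", n_bad_cols, "\n")
```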

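I get the same errors even when I omit the tuning options entirely. I also tried toggling the na.action option on and off, and changing "cv" to "repeatedcv".

I even get the same errors with this ultra-simplified version: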

rf_train <- train(y=y, x=x, method='parRF')

1 answer:

Answer 0 (score: 2)

This seems to be caused by a bug in caret. See the answer to:

parRF on caret not working for more than one core

I was just dealing with the same problem, and manually loading foreach on each new cluster seems to work.
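A minimal sketch of that workaround, assuming caret, randomForest, foreach and doParallel are installed. The cluster size (2), the built-in mtcars data as a regression stand-in, ntree = 100 and 3-fold CV are arbitrary choices to keep the run short, not values from the question; depending on the caret version, the foreach step may no longer be necessary.

```r
library(caret)
library(parallel)
library(doParallel)   # also attaches foreach

# Create the cluster explicitly so its workers can be prepared before use
cl <- makePSOCKcluster(2)
# The workaround: attach foreach on every worker of the new cluster
invisible(clusterEvalQ(cl, library(foreach)))
registerDoParallel(cl)

# Built-in regression data as a stand-in for the private data
x <- mtcars[, setdiff(names(mtcars), "mpg")]
y <- mtcars$mpg

set.seed(1)
rf_train <- train(x = x, y = y, method = "parRF",
                  tuneGrid = data.frame(mtry = 3),
                  ntree = 100,   # passed through to the underlying randomForest fits
                  trControl = trainControl(method = "cv", number = 3,
                                           allowParallel = TRUE))

stopCluster(cl)
registerDoSEQ()   # return foreach to sequential execution
print(rf_train)
```

Creating the cluster yourself, rather than calling registerDoParallel(cores = 8), is what makes it possible to run clusterEvalQ() on its workers before handing the backend to train().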