Error when tuning an xgboost model in parallel with caret's train function

Asked: 2018-08-24 20:45:06

Tags: r r-caret xgboost doparallel

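I am trying to run cross-validated tuning for an xgboost model in caret. My tuning grid is large, so I want to run it in parallel. I convert the data to a sparse matrix, define the tuning grid, set up parallel processing, and then call train, but every time I get a connection error. If I disable the parallel backend, it runs fine. The problem is not specific to my data: this sample data and my real data both fail in the same way. What is causing this? I am also curious why I need to pass the y labels to the train function when the xgb.DMatrix dtrain already holds them.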

The sample data is the apartments dataset from the DALEX package.

library(caret)
library(xgboost)
library(Matrix)
library(DALEX) # get access to the sample data called "apartments"
library(doParallel)

x.train <- sparse.model.matrix(m2.price ~. -1 , data = apartments)
dtrain <- xgb.DMatrix(x.train, label = apartments$m2.price)

grid = expand.grid(
  nrounds = 500,
  eta = seq(.002,0.004,by = .002),
  max_depth = seq(2, 4, by = 2),
  gamma = 0, 
  colsample_bytree = 1,
  min_child_weight = seq(8, 10, by = 2),
  subsample = 0.5
)

# set cross validation
fitControl = trainControl(
  method = "cv",
  number = 5
)

# set up parallel processing
cl <- makeCluster(detectCores())
registerDoParallel(cl)
getDoParWorkers()


Tune = train(x = dtrain, y = apartments$m2.price,
             trControl = fitControl,
             tuneGrid = grid,
             method = "xgbTree",
             na.action = na.pass
)

Error in serialize(data, node$con) : error writing to connection
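
One possible cause, not confirmed anywhere in this thread: an xgb.DMatrix wraps an external pointer to memory owned by the master R process, so it may not survive being serialized to the worker processes, whereas a sparse matrix is an ordinary R object. The sketch below is a hedged variant of the code above that passes x.train (the sparse matrix) to train instead of dtrain. It assumes that the installed caret version accepts a sparse matrix for method = "xgbTree" and forwards nthread to xgboost; the worker count of 2 and the name y.train are illustrative choices, not part of the original post.

```r
library(caret)
library(xgboost)
library(Matrix)
library(DALEX)       # provides the "apartments" sample data
library(doParallel)

# Keep the predictors as a sparse matrix (a plain R object) rather than
# an xgb.DMatrix, so they can be copied to the parallel workers.
x.train <- sparse.model.matrix(m2.price ~ . - 1, data = apartments)
y.train <- apartments$m2.price   # hypothetical name for the response vector

# Same tuning grid as in the question.
grid <- expand.grid(
  nrounds = 500,
  eta = seq(.002, 0.004, by = .002),
  max_depth = seq(2, 4, by = 2),
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = seq(8, 10, by = 2),
  subsample = 0.5
)

fitControl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

# Assumption: 2 workers; adjust to the machine at hand.
cl <- makeCluster(2)
registerDoParallel(cl)

Tune <- train(x = x.train, y = y.train,
              trControl = fitControl,
              tuneGrid = grid,
              method = "xgbTree",
              nthread = 1)   # assumption: one xgboost thread per worker

# Shut the workers down and return to sequential execution.
stopCluster(cl)
registerDoSEQ()
```

Setting nthread = 1 is meant to keep xgboost's own multithreading from competing with the foreach workers for cores. As for the y argument, train needs the outcome vector itself to build the resampling folds and compute performance metrics, which may be why it is required even when the DMatrix already carries a label.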

0 Answers:

No answers yet