How can I get the following code to work? It is meant to speed up a randomForest regression by growing the trees in parallel across multiple cores. Alternative code would be great too.
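# Parallelized Random Forest Model
library(randomForest)
library(doParallel)  # also attaches foreach and parallel

# Whole numbers of workers and of trees per worker
RFcores <- floor(detectCores() / 3) + 4
RFcores
RFtrees <- ceiling(1000 / RFcores)
RFtrees

cl <- makeCluster(RFcores)
registerDoParallel(cl)
timer <- proc.time()

# a and b are strings defined elsewhere: the response and the predictor terms
form <- as.formula(paste(a, "~", b))

# Each worker grows RFtrees trees; .combine is the argument named in the error below
fit <- foreach(ntree = rep(RFtrees, RFcores), .combine = gtable_combine,
               .packages = 'randomForest') %dopar% {
  randomForest(form, data = maindf, mtry = 4,
               keep.forest = FALSE, nodesize = 10000, do.trace = TRUE, maxnodes = 5,
               improve = 0.01, doBest = TRUE, importance = TRUE, ntree = ntree)
}

proc.time() - timer
stopCluster(cl)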
I keep getting the following error related to the .combine argument in the foreach function:
error calling combine function:
<simpleError in align_2(x, y, along = along, join = join): Both gtables must have names along dimension to be aligned>
I look forward to any thoughts on this issue.
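
For reference, here is a minimal sketch of what I think the call is supposed to look like, assuming the intended combiner is `randomForest::combine` (which merges randomForest objects) rather than `gtable_combine` (which, as far as I can tell, is meant for gtable objects and so cannot merge forests). The data frame, formula, worker count and tree count below are hypothetical stand-ins chosen so the example runs on its own. They are not my real `maindf`, `a` or `b`.

```r
library(randomForest)
library(foreach)
library(doParallel)

# Hypothetical stand-in data; replace with maindf and the real formula.
set.seed(1)
maindf <- data.frame(y = rnorm(200), x1 = rnorm(200),
                     x2 = rnorm(200), x3 = rnorm(200))
form <- y ~ x1 + x2 + x3

n_workers <- 2          # assumption: small fixed worker count for the sketch
trees_per_worker <- 50  # whole number of trees grown by each worker

cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Each worker grows its own sub-forest; randomForest::combine merges them.
fit <- foreach(ntree = rep(trees_per_worker, n_workers),
               .combine = randomForest::combine,
               .packages = "randomForest") %dopar% {
  randomForest(form, data = maindf, ntree = ntree, importance = TRUE)
}

stopCluster(cl)

fit$ntree  # total number of trees in the merged forest
```

Writing the combiner with its namespace (`randomForest::combine`) avoids picking up a function of the same name from another attached package. I left out `keep.forest = FALSE` here because I am not sure how `combine` treats fits without a stored forest, and I dropped `improve` and `doBest` because they appear to be `tuneRF` arguments rather than `randomForest` ones.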