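I am using the doPar package to try to parallelize the training of my machine learning algorithms, because they seem to take a long time.
My plan is to train several neural networks, SVMs, and decision trees (currently 10 of each, named neuralnet1 ... neuralnet10, svm1 ... svm10, and so on). The data frame all_classifiers contains the names I want to give the classifiers, together with their training start/stop times: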
> head(all_classifiers,3)
  classifiers train_start train_stop
1  neuralnet1        7833       8074
2  neuralnet2       45590      45682
3  neuralnet3       64341      64574
> tail(all_classifiers,3)
   classifiers train_start train_stop
28         dt8      235639     235737
29         dt9      256497     257198
30        dt10      257814     258034
My loop currently looks like this:
for (i in 1:trainloop) {
  # Select training data + remove NA
  train_start <- all_classifiers[["train_start"]][i]
  train_stop <- all_classifiers[["train_stop"]][i]
  train_data <- na.omit(data[train_start:train_stop, ])
  print(paste("Using data from", train_start, "to", train_stop))
  train_scaled <- as.data.frame(train_data)

  # Train appropriate model, chosen by the first letter of the classifier name
  firstLetter <- strtrim(all_classifiers[["classifiers"]][i], 1)
  if (firstLetter == "n") {
    print("Training neural net")
    trained_classifier <- neuralnet(f, data = train_scaled, hidden = c(3),
                                    act.fct = 'logistic', linear.output = FALSE,
                                    stepmax = 1e6, rep = 1, learningrate = 0.30)
  } else if (firstLetter == "s") {
    print("Training SVM")
    trained_classifier <- svm(upmove ~ ., data = train_scaled,
                              kernel = "polynomial", coef0 = 2.0)
  } else if (firstLetter == "d") {
    print("Training DT")
    # C5.0 needs a factor outcome
    train_scaled$upmove <- as.factor(train_scaled$upmove)
    trained_classifier <- C5.0(f, data = train_scaled)
  }

  flog.info(paste("Training", all_classifiers[["classifiers"]][i]))
  # Store the model in a variable named after the classifier (e.g. neuralnet1)
  assign(toString(all_classifiers[["classifiers"]][i]), trained_classifier)
}
I would like to parallelize this loop using
foreach(i = 1:trainloop,
        # C5.0() is provided by the C50 package
        .packages = c('neuralnet', 'e1071', 'C50', 'futile.logger')) %dopar% {
  # loop body here
}
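but it seems that every worker starts with the iterator i = 1, and my variable assignment
assign(toString(all_classifiers[["classifiers"]][i]), trained_classifier)
depends on the value of the iterator being used. How can I solve this? In the end, I want every name in the first column of all_classifiers to be a trained classifier, fitted on the associated start and stop training times.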
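The assignment symptom can be reproduced in miniature. The sketch below assumes a two-worker cluster registered through doParallel (an assumption; the question only names "doPar"), and model_names is a hypothetical stand-in for the classifiers column. It suggests that assign() inside a %dopar% body runs in the worker processes, so no variable appears in the session that launched the loop.

```r
library(foreach)
library(doParallel)

# Assumption: a local cluster with two worker processes
cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical stand-in for all_classifiers[["classifiers"]]
model_names <- c("neuralnet1", "svm1", "dt1")

invisible(foreach(i = seq_along(model_names)) %dopar% {
  # This creates the variable inside a worker process only
  assign(model_names[i], i^2)
  NULL
})

stopCluster(cl)

# Check whether the variables exist in the calling session
created <- vapply(
  model_names,
  function(nm) exists(nm, envir = globalenv(), inherits = FALSE),
  logical(1)
)
print(created)
```

If the check reports FALSE for every name, the iterator itself is probably fine and the missing variables are a side effect of where the assignment happens.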
Answer 0 (score: 0)
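I don't want to go into the details of your code, but here is a small example that will hopefully help you see how to convert a basic R loop into foreach: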
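library(foreach)
# Base R loop: fill a preallocated vector by index
x1 <- numeric(10)
for (i in 1:10) {
  x1[i] <- i^2
}

# foreach version: each iteration returns a value and .combine collects them
# (rbind stacks the results into a 10 x 1 matrix)
x2 <- foreach(i = 1:10, .combine = rbind) %do% {
  i^2
}

# Element-wise comparison; every entry is TRUE
x1 == x2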
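Applied to the question, the same idea might look like the sketch below: instead of calling assign() inside the loop, each iteration returns its trained model, and the names are attached afterwards. Everything here is a stand-in under stated assumptions: toy_data and the three-row all_classifiers are hypothetical, the cluster is a two-worker doParallel cluster, and a glm() logistic regression takes the place of neuralnet(), svm(), and C5.0() so that the example runs without those packages.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-ins for the question's objects (not the real data)
set.seed(1)
toy_data <- data.frame(x = rnorm(300), upmove = rbinom(300, 1, 0.5))
all_classifiers <- data.frame(
  classifiers = c("neuralnet1", "svm1", "dt1"),
  train_start = c(1, 101, 201),
  train_stop  = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Assumption: a local cluster with two worker processes
cl <- makeCluster(2)
registerDoParallel(cl)

# The value of the last expression in the body is what foreach collects.
# With no .combine, the results come back as a list in iteration order.
models <- foreach(i = seq_len(nrow(all_classifiers))) %dopar% {
  rows <- all_classifiers$train_start[i]:all_classifiers$train_stop[i]
  train_data <- na.omit(toy_data[rows, ])
  # Stand-in trainer; the real loop would dispatch on the classifier name
  glm(upmove ~ x, data = train_data, family = binomial)
}

stopCluster(cl)

# Attach the classifier names on the master, after the loop has finished
names(models) <- all_classifiers$classifiers
print(names(models))
```

A model can then be retrieved with models[["svm1"]]. If separate variables are really needed, list2env(models, envir = globalenv()) is one way to create them from the named list, although keeping the list is usually easier to work with.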