Question

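I am fairly new to parallel processing in R. I was playing around with some code when I ran into a small problem: the code inside the foreach loop does not seem to modify certain variables/data frames, even though it does produce predictions (the goal of the program). My code is as follows: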

library(parallel)
library(doParallel)
library(foreach)
library(iterators)

# select people who visited and who did not visit separately
vis <- b[which(b$visit==1),]
no <- not <- b[which(b$visit!=1),]

# create parallel processing environment for 2 (TWO) processors
cl<-makeCluster(2)
registerDoParallel(cl)

iterations <- 3
#k=f=1
predictions <- foreach(icount(iterations), .combine=cbind) %dopar% {

   # randomly select training & testing set from Visited customers
   pos <- sample(nrow(vis),size=floor(nrow(vis)/10*7),replace=FALSE)
   train<- vis[pos,]
   test<- vis[-pos,]

   # create distinct and non-repeatable bags for Non-visited customers
   sel <- sample(nrow(not), size=9246, replace=FALSE)
   #train1 <-1:nrow(not) %in% sel
   no <- not[sel,]
   not <- not[-sel,]

   # randomly select training & testing set from Non-Visited customers
   pos1 <- sample(nrow(no),size=floor((nrow(no)/10)*7),replace=FALSE)
   trainNo <- no[pos1,]
   testNo <- no[-pos1,]

   # combine the train & test Bags of both Visit & Non-Visit customers
   trainSet <- rbind(train,trainNo)
   testSet <- rbind(test,testNo)

   fit <- glm(visit~., data=trainSet, family=binomial(logit))

   #pr <- 
   print(length(not))
   predict(fit,testSet[,-10])
   #pr <- rbind(pr,predict(fit,testSet[,-10]))
 }
pred <- rowMeans(predictions)
stopCluster(cl)

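The problems I am facing are:

The 'not' data frame stays the same size even after the foreach loop (it should shrink with each iteration as the selected records are removed).
None of the variables created inside the foreach loop seem to exist after it has run. Why does this happen?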
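
Both symptoms can be reproduced in isolation with a minimal sketch. Here `x` is a hypothetical stand-in for the `not` data frame, not the real data. As far as I understand it, with %dopar% the loop body is evaluated in separate worker processes on exported copies of the variables, and only the value of the last expression in each iteration is sent back, so assignments made inside the body never reach the main session.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

x <- 1:10   # hypothetical stand-in for the `not` data frame

# Each iteration shrinks a worker-side copy of x and returns its length.
# The individual values depend on how tasks are spread over the workers,
# so they should not be relied on.
sizes <- foreach(i = 1:3, .combine = c) %dopar% {
  x <- x[-1]          # changes the worker's copy only
  tmp <- length(x)    # `tmp` exists on the worker only
  tmp                 # the last value is what foreach returns
}

stopCluster(cl)

length(x)        # still 10 in the main session
exists("tmp")    # FALSE, unless `tmp` was already defined before the loop
```

Anything needed after the loop therefore has to be part of the value returned by each iteration, for example by returning a list and combining the pieces afterwards.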

I cannot seem to understand where I am going wrong. If someone could point out my mistake, I would really appreciate it.

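P.S. Some background on the problem at hand: I am trying to create equally distributed bags (a roughly equal number of visit [1] and non-visit [0] records) for classification purposes.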
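
One way to get distinct, non-repeating bags without shrinking a shared data frame inside the loop might be to split the non-visited row indices into disjoint chunks before the loop starts. The sketch below uses base R only; `n_not`, `bag_size` and `n_bags` are hypothetical placeholders for `nrow(not)`, 9246 and `iterations`, and the small numbers are chosen just to keep the example quick to run.

```r
set.seed(42)        # only to make the sketch reproducible

n_not    <- 100     # hypothetical: nrow(not) in the real data
bag_size <- 30      # hypothetical: 9246 in the code above
n_bags   <- 3       # hypothetical: `iterations` in the code above

stopifnot(n_bags * bag_size <= n_not)   # enough rows for disjoint bags

# Shuffle all row indices once, then cut consecutive, non-overlapping slices.
shuffled <- sample(n_not)
bags <- lapply(seq_len(n_bags), function(k) {
  shuffled[((k - 1) * bag_size + 1):(k * bag_size)]
})

# No index appears in more than one bag.
anyDuplicated(unlist(bags)) == 0
```

Inside the loop, iteration `i` could then pick its own bag with `no <- not[bags[[i]], ]`, using `foreach(i = seq_len(n_bags), ...)` in place of `icount(iterations)`, so no iteration depends on what another one removed.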

Problem with foreach and parallel processing in R

0 answers: