Question

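I am using the parallel processing tools in R and Rmpi to run an analysis on my university's Unix cluster. When I run the R script shown below on my local (Windows) machine, using either the snow package or just the standard apply function, it is very slow but seems to work correctly. When I run it on the Unix cluster using Rmpi, I get the error "serialization is too large to store in a raw vector". This post seems somewhat related, but the vectors I'm passing back and forth (50K) shouldn't come anywhere close to triggering this error. Furthermore, I can get things to partially work by breaking the data into smaller pieces. I think the problem must be some kind of memory leak related to Rmpi, but I don't have the tools needed to debug something like this. I hope you can at least point me toward some debugging tools.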
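One place to start debugging is to measure how many bytes an object occupies once serialized, since that is roughly what has to fit in the raw vector being sent between nodes. A minimal sketch, assuming the relevant ceiling is R's 2^31 - 1 limit on the length of a standard (non-long) vector; `serialized_mb`, `piece` and `make_worker` are hypothetical names, and the data are random stand-ins, not the real records. It also shows that a function created inside another function drags that function's local variables along when it is serialized.

```r
# Size in megabytes of an object after serialize() turns it into a raw vector.
serialized_mb <- function(x) length(serialize(x, connection = NULL)) / 2^20

raw_limit_mb <- (2^31 - 1) / 2^20   # ~2048 MB: length cap of a non-long vector

# Random stand-in for one 50K x 40 batch of records.
piece <- as.data.frame(matrix(runif(50000 * 40), nrow = 50000))

# A closure created inside another function keeps that function's local
# variables alive, and serialize() writes them out along with the function.
make_worker <- function() {
  big <- runif(5e6)                 # ~38 MB that the worker never uses
  function(row) sum(row)
}
worker <- make_worker()

cat(sprintf("batch:  %7.1f MB\n", serialized_mb(piece)))
cat(sprintf("worker: %7.1f MB\n", serialized_mb(worker)))
cat(sprintf("limit:  %7.1f MB\n", raw_limit_mb))
```

Calling `serialized_mb()` on the real `test` function and on a batch just before it is handed to `mpi.parApply` would show whether either one is unexpectedly large.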

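The data I'm working with has about 2.5 million records, each with about 40 columns. I submit a 50K x 40 data frame to parApply, and it returns a numeric vector of length 50K.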
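Base `apply()` works on a matrix, so it first coerces a data frame with `as.matrix()`; if even one of the 40 columns is not numeric, every value becomes a character string, which serializes to considerably more bytes. The sketch below uses random stand-in data with one hypothetical `id` column to compare the serialized size of a batch before and after that coercion. Whether `mpi.parApply` does the same coercion before shipping rows to the slaves is an assumption worth checking in the Rmpi source.

```r
set.seed(1)
n_rows <- 50000

# Random stand-in batch: 39 numeric columns plus one character ID column.
piece <- as.data.frame(matrix(runif(n_rows * 39), nrow = n_rows))
piece$id <- sprintf("rec%07d", seq_len(n_rows))

# apply() coerces a data frame like this with as.matrix(); the mixed column
# types force a character matrix, so each number is stored as a string.
m <- as.matrix(piece)

bytes_df  <- length(serialize(piece, NULL))
bytes_mat <- length(serialize(m, NULL))

cat("storage mode after as.matrix():", storage.mode(m), "\n")
cat(sprintf("data frame: %.1f MB, character matrix: %.1f MB\n",
            bytes_df / 2^20, bytes_mat / 2^20))
```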

library(Rmpi)
# Assumes Rmpi slaves have already been spawned (e.g. with mpi.spawn.Rslaves()),
# and that `data` (the full data frame) and `test` (the per-row function) are
# defined earlier in the script.

# In an effort to avoid the serialization error, I break the data into smaller
# pieces to avoid problems with large vectors being transferred between the
# master and slave nodes
n <- nrow(data)
numBreaks <- floor(n / 50000)
base <- min(n, 50000)              # test data sets are < 50K and shouldn't
                                   # crash the function
piece.in <- data[1:base, ]         # this is the first batch of records I submit

status <- mpi.parApply(piece.in, 1, test)    # returns a vector of length 50K
# status <- apply(piece.in, 1, test)         # commented out; runs fine locally
#                                            # using the standard apply function
# cl <- makeSOCKcluster(c("localhost", "localhost", "localhost", "localhost", "localhost"))
# status <- parApply(cl, piece.in, 1, test)  # commented out; runs fine locally
#                                            # using snow's SOCK cluster and the
#                                            # identical parApply function
if (numBreaks > 1) {
  for (i in 2:numBreaks) {
    piece.in <- data[(base + 1):(base + 50000), ]
    piece.out <- mpi.parApply(piece.in, 1, test)  # returns a vector of length 50K
    # piece.out <- apply(piece.in, 1, test)
    # piece.out <- parApply(cl, piece.in, 1, test)
    status <- append(status, piece.out)           # add to existing vector
    print(paste("clusters assigned to records", base + 1, "through", base + 50000))
    base <- base + 50000                          # increment base
  }
}

# This piece cleans up the residual records
if (base < n) {
  piece.in <- data[(base + 1):n, ]
  # piece.out <- apply(piece.in, 1, test)
  piece.out <- mpi.parApply(piece.in, 1, test)
  # piece.out <- parApply(cl, piece.in, 1, test)
  status <- append(status, piece.out)
  print(paste("Final: status has length", length(status)))
}
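The three code paths above (first batch, full batches, remainder) can be collapsed into one loop over precomputed row ranges, which also keeps the logged range in step with the rows actually submitted. A sketch with hypothetical helper names `batch_ranges` and `run_in_batches`; it defaults to base `apply`, and passing `apply_fun = mpi.parApply` is intended to work because that function takes `X`, `MARGIN` and `FUN` in the same order, though this is untested on an MPI cluster.

```r
# Hypothetical helper: start/end rows of consecutive batches of at most `size`.
batch_ranges <- function(n, size = 50000) {
  if (n == 0) return(data.frame(start = numeric(0), end = numeric(0)))
  starts <- seq(1, n, by = size)
  data.frame(start = starts, end = pmin(starts + size - 1, n))
}

# Apply FUN to each row of `data`, one batch at a time, and concatenate results.
run_in_batches <- function(data, FUN, size = 50000, apply_fun = apply) {
  ranges <- batch_ranges(nrow(data), size)
  out <- vector("list", nrow(ranges))
  for (k in seq_len(nrow(ranges))) {
    rows <- ranges$start[k]:ranges$end[k]
    out[[k]] <- apply_fun(data[rows, , drop = FALSE], 1, FUN)
    message("processed records ", ranges$start[k], " through ", ranges$end[k])
  }
  unlist(out, use.names = FALSE)
}

# Small local check with base apply: 7 rows in batches of 3.
df <- data.frame(a = 1:7, b = 11:17)
res <- run_in_batches(df, sum, size = 3)
print(batch_ranges(7, 3))
print(res)
```

On the cluster, the call would look like `status <- run_in_batches(data, test, apply_fun = mpi.parApply)`.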

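This function works as expected on a test data set (~100K records). It works locally using snow or the standard apply function (although it takes a day). It was failing with the "serialization is too large..." error before I broke the data into smaller pieces, and when I made the pieces 100K records long it failed with the same error. With records submitted in batches of 50K, it runs successfully (as logged by print statements written to a file) up until the iteration covering records 1.3 million to 1.35 million, and then fails with the serialization error.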
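Since the failure only shows up after more than a million records have been processed, it may help to log, for each batch, its serialized size and the memory in use by the master R process (as reported by `gc()`), and to catch the error so the log records exactly which batch failed. If the master's memory climbs steadily from batch to batch, that would support the leak hypothesis; if it stays flat, the problem is more likely in what one particular batch sends. A sketch with hypothetical names `log_line` and `process_batch`, run here with base `apply` on toy data; note that `gc()` only sees the master process, not the slaves.

```r
# Append one timestamped line to a log file.
log_line <- function(logfile, ...) {
  cat(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), ..., "\n",
      file = logfile, append = TRUE)
}

# Run one batch, logging its serialized size and the master's memory use first,
# and logging the error message (then re-raising it) if the batch fails.
process_batch <- function(piece, FUN, first_row, logfile, apply_fun = apply) {
  batch_mb <- length(serialize(piece, NULL)) / 2^20
  used_mb  <- sum(gc()[, 2])        # column 2 of gc(): memory in use, in Mb
  log_line(logfile, "first row", first_row,
           "| batch MB", round(batch_mb, 1),
           "| master MB in use", round(used_mb, 1))
  tryCatch(apply_fun(piece, 1, FUN),
           error = function(e) {
             log_line(logfile, "FAILED at first row", first_row, ":",
                      conditionMessage(e))
             stop(e)
           })
}

# Toy run: one batch that works and one that fails on purpose.
logfile <- tempfile(fileext = ".log")
df <- data.frame(a = 1:4, b = 5:8)
ok <- process_batch(df, sum, first_row = 1, logfile = logfile)
bad <- try(process_batch(df, function(row) stop("boom"), first_row = 5,
                         logfile = logfile), silent = TRUE)
writeLines(readLines(logfile))
```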

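I know there are other ways to do parallel processing (and foreach is something I'd have liked to use here), but I'm constrained by knowing nothing about the computing environment in which the parallel nodes are created, so I would like to stick with Rmpi if I can. Any tips on how to debug this?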

Much obliged,

Chris

How to debug / work around the "serialization is too large to store in a raw vector" error when using Rmpi
