Ubuntu R foreach / doMC not using multiple cores

Time: 2012-08-18 23:10:40

Tags: r ubuntu foreach domc

I built a function in R (running on Ubuntu 12.04 LTS 64-bit, on a 4-core i7 server with hyperthreading and 6 GB RAM). I installed R using the standard packages:

sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc 

Note: I also installed foreach and doMC from within R (that did not help either), the same way I installed the deldir package:

install.packages(c("deldir"), dependencies = TRUE)

My function works fine, but it does not run in parallel (it uses only 1 of the 8 logical cores):

library(deldir)
library(foreach)
library(doMC)
registerDoMC(cores=8)

#getDoParWorkers()
#getDoParName()
#getDoParVersion()

# loop through files
indir <- "/home/geoadmin/data/objects/"
# pattern is a regular expression, so escape the dot and anchor the extension
inputfiles <- dir(path = indir, pattern = "\\.txt$")
for (inputfilenr in seq_along(inputfiles))
{
  # set file variables
  curinputfile <- paste(indir, inputfiles[[inputfilenr]], sep = "")
  print(curinputfile)
  curoutputfile <- paste(indir, substr(inputfiles[[inputfilenr]], start = 1, stop = 10), ".out", sep = "")
  # select the point x/y coordinates into a data frame...
  points <- read.csv(curinputfile, header = TRUE, sep = ",", dec = ".", fill = TRUE)
  # pad the bounding window by the full data range on each side
  xpad <- diff(range(points$x))
  ypad <- diff(range(points$y))
  # set calculation variables, precision on 3 digits only because of the RDW coordinate system
  voro <- deldir(points$x, points$y, digits = 3, dpl = list(ndx = 2, ndy = 2),
                 rw = c(min(points$x) - xpad, max(points$x) + xpad,
                        min(points$y) - ypad, max(points$y) + ypad))
  tiles <- tile.list(voro)
  # start loop
  poly <- foreach(i = seq_along(tiles), .combine = cbind) %dopar%
    {
    # load tile info
    tile <- tiles[[i]]
    # start with WKT notation
    curpoly <- "POLYGON(("
    # add list of coordinates by looping through the points in tile
    for (j in seq_along(tile$x)) { curpoly <- sprintf("%s %.6f %.6f,", curpoly, tile$x[[j]], tile$y[[j]]) }
    # then again the first point to close the polygon and end the WKT notation; this string is the value of the iteration
    sprintf("%s %.6f %.6f))", curpoly, tile$x[[1]], tile$y[[1]])
    }
  write.csv(t(poly), file = curoutputfile, row.names = FALSE)
}

So the results are fine, but there is no parallelism...

doMC does register correctly:

> getDoParWorkers()
[1] 8
> getDoParName()
[1] "doMC"
> getDoParVersion()
[1] "1.2.5"

If I look at the usage (with top):

top - 01:03:19 up 9 min,  3 users,  load average: 1.02, 0.86, 0.45
Tasks: 131 total,   2 running, 127 sleeping,   0 stopped,   2 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6104932k total,  1240512k used,  4864420k free,    16656k buffers
Swap:  6283260k total,        0k used,  6283260k free,   141996k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1553 zzzzzzzz  20   0  913m 850m 3716 R  100 14.3   8:22.03 R

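So it maxes out a single core (12.5% user CPU overall is one of eight logical cores fully busy). Does anyone know what might cause foreach / doMC not to use multiple cores?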

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doMC_1.2.5      multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
[5] deldir_0.0-19

loaded via a namespace (and not attached):
[1] codetools_0.2-8
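
One way to narrow this down is to time each stage of the script separately with system.time. The sketch below is only an illustration of the measuring pattern: slow_serial_step and cheap_loop_step are made-up stand-ins, not the real deldir call or the real tile loop.

```r
# Hypothetical stand-ins for the two stages of the script:
# one serial step that does most of the work, and a cheap per-tile loop.
slow_serial_step <- function(n) {
  x <- 0
  for (k in seq_len(n)) x <- x + sqrt(k)
  x
}
cheap_loop_step <- function(m) {
  vapply(seq_len(m), function(i) sprintf("tile %d", i), character(1))
}

# system.time() reports user, system and elapsed seconds for an expression.
t_serial <- system.time(serial_result <- slow_serial_step(2e6))
t_loop   <- system.time(loop_result <- cheap_loop_step(1000))

# Put the first three fields (user, system, elapsed) side by side.
timings <- rbind(serial = t_serial[1:3], loop = t_loop[1:3])
print(timings)
```

In the real script, the same wrapper would go around the deldir(...) call, the tile.list(...) call, and the foreach loop. If nearly all elapsed time falls in a stage that runs before %dopar% is reached, extra workers in the loop cannot help.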

1 Answer:

Answer 0: (score: 1)

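Adding a possible answer to the question: foreach / doMC works on the machine itself (with the standard examples), so the problem is specific to this code. The voro = deldir part is probably what takes the time, not the loop that follows it. That would mean, however, that the deldir package itself needs adjusting. Looking at the deldir source code, it seems I would need to adapt this snippet: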

# Call the master subroutine to do the work:
repeat {
    tmp <- .Fortran(
            'master',
            x=as.double(x),
            y=as.double(y),
            sort=as.logical(sort),
            rw=as.double(rw),
            npd=as.integer(npd),
            ntot=as.integer(ntot),
            nadj=integer(tadj),
            madj=as.integer(madj),
            ind=integer(npd),
            tx=double(npd),
            ty=double(npd),
            ilist=integer(npd),
            eps=as.double(eps),
            delsgs=double(tdel),
            ndel=as.integer(ndel),
            delsum=double(ntdel),
            dirsgs=double(tdir),
            ndir=as.integer(ndir),
            dirsum=double(ntdir),
            nerror=integer(1),
            PACKAGE='deldir'
        )

I'm not sure how to turn this into something that can be used with foreach...
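
A possible alternative that leaves deldir untouched, assuming the input files are independent of each other, is to parallelize one level up: run the whole per-file job (read, deldir, build polygons, write) in separate processes, so that several serial Fortran calls run side by side. The sketch below uses mclapply from the parallel package instead of foreach; process_file and the file names are hypothetical placeholders, and Sys.sleep stands in for the slow deldir call.

```r
library(parallel)  # ships with R since 2.14.0; mclapply forks, so Unix-alikes only

# Hypothetical stand-in for "read one file, run deldir, build the polygons".
# In the real script this body would hold the per-file work from the question.
process_file <- function(path) {
  Sys.sleep(0.2)                      # pretend this is the slow deldir() call
  list(path = path, pid = Sys.getpid())
}

# Made-up file names, for illustration only.
inputfiles <- sprintf("objects_%02d.txt", 1:4)

# Forked workers share the files between them; each runs the serial step
# for its own files, and results come back in input order.
results <- mclapply(inputfiles, process_file, mc.cores = 2)

# Distinct process IDs show the work was spread over several processes.
pids <- vapply(results, function(r) r$pid, integer(1))
print(unique(pids))
```

The same idea should carry over to foreach by moving %dopar% from the tile loop to the outer loop over files and using a plain loop inside. This only helps when there are at least as many files as cores, and memory use grows with the number of workers, since each one holds its own deldir result.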