Question

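I am trying to "map" a function over an array. However, with both simple and complex functions, the parallel version is always slower than the serial version. How can I improve the performance of parallel computation in R?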

A simple parallel example:

library(parallel)

# Number of elements
arrayLength = 100
# Create data
input = 1:arrayLength

# A simple computation
foo = function(x, y) x^y - x^(y-1)

# Add complexity
iterations = 5 * 1000 * 1000

# Perform complex computation on each element
compute = function (x) {
  y = x
  for (i in 1:iterations) {
    x = foo(x, y)
  }
  return(x)
}

# Parallelized compute
computeParallel = function(x) {
  # Create a cluster with 1 fewer cores than are available.
  cl <- makeCluster(detectCores() - 1) # 8-1 cores 
  # Send static vars & funcs to all cores
  clusterExport(cl, c('foo', 'iterations'))
  # Map
  out = parSapply(cl, x, compute)
  # Clean up
  stopCluster(cl)
  return(out)
}

system.time(out <- compute(input)) # 12 seconds using 25% of cpu
system.time(out <- computeParallel(input)) # 160 seconds using 100% of cpu

Answer 1

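The problem is that you traded all of the vectorization for parallelization, and that is a bad trade. You need to keep as much vectorization as possible to have any hope of getting an improvement from parallelization on this kind of problem.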
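To make the trade concrete, here is a minimal sketch using the question's foo and compute. The iteration count and input are shrunk (an assumption, made only so it runs instantly and stays finite). It shows that the vectorized call and the element-wise call return the same values; the difference is how many times the interpreted loop body runs.

```r
# Same definitions as in the question, but with far fewer iterations
# (an assumption so this sketch finishes instantly and stays finite).
foo <- function(x, y) x^y - x^(y - 1)
iterations <- 3

compute <- function(x) {
  y <- x
  for (i in seq_len(iterations)) {
    x <- foo(x, y)
  }
  x
}

input <- 1:5

# Vectorized: one call; the loop body runs `iterations` times in total,
# each time operating on the whole vector at once.
vectorized <- compute(input)

# Element-wise (what sapply and parSapply do): one call per element, so
# the loop body runs length(input) * iterations times on length-1 vectors.
elementwise <- sapply(input, compute)

# Same values either way; only the amount of interpreter work differs.
print(all.equal(vectorized, elementwise))
```

With the question's 100 elements, the element-wise form does roughly 100 times as many interpreted loop iterations, which is more than a handful of cores can win back.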

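The pvec function in the parallel package can handle this kind of problem well, but it is not supported on Windows. A more general solution that also works on Windows is to use foreach with the itertools package, which contains functions that are useful for iterating over various objects. Here is an example that uses the "isplitVector" function to create one sub-vector for each worker: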

library(doParallel)
library(itertools)
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
computeChunk <- function(x) {
  foreach(xc=isplitVector(x, chunks=getDoParWorkers()),
          .export=c('foo', 'iterations', 'compute'), 
          .combine='c') %dopar% {
    compute(xc)
  }
}

This still may not compare well with the purely vectorized version, but it should get better as "iterations" increases, unless "iterations" is very large.
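The same one-chunk-per-worker idea can also be sketched with only the base parallel package, for readers who do not want to install foreach or itertools. This is an illustrative variant, not the answer's method: computeChunked is a hypothetical helper name, the worker count of 2 is arbitrary, and iterations and the input are shrunk so the sketch runs quickly.

```r
library(parallel)

foo <- function(x, y) x^y - x^(y - 1)
iterations <- 3  # reduced from the question so the sketch runs quickly

compute <- function(x) {
  y <- x
  for (i in seq_len(iterations)) x <- foo(x, y)
  x
}

# Hypothetical helper: split x into one contiguous chunk per worker,
# run the vectorized compute() on each chunk, then concatenate.
computeChunked <- function(x, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))  # always shut the workers down
  # Export what compute() needs from the environment it was defined in.
  clusterExport(cl, c("foo", "iterations"), envir = environment(compute))
  # splitIndices() returns a list of index vectors, one per worker.
  chunks <- lapply(splitIndices(length(x), workers), function(i) x[i])
  unlist(parLapply(cl, chunks, compute))
}

input <- 1:6
chunked <- computeChunked(input)
print(all.equal(chunked, compute(input)))
```

Each worker receives a whole sub-vector, so the loop inside compute stays vectorized within every chunk.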

Answer 2

parSapply will run the function on each element of input separately, which means you give up the speed you gained by writing foo and compute in a vectorized way.

pvec will run a vectorized function on multiple cores by splitting the vector into chunks. Try this:

system.time(out <- pvec(input, compute, mc.cores=4))
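As a self-contained check, the sketch below runs pvec on a shrunken version of the problem and compares the result with the serial call. The small iteration count, the short input, and the choice of 2 cores are assumptions for illustration. Because pvec relies on forking, the sketch falls back to one core on Windows, where mc.cores > 1 is not supported.

```r
library(parallel)

foo <- function(x, y) x^y - x^(y - 1)
iterations <- 3  # reduced from the question so the sketch runs quickly

compute <- function(x) {
  y <- x
  for (i in seq_len(iterations)) x <- foo(x, y)
  x
}

input <- 1:6

# pvec forks the R process, so more than one core is unavailable on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec splits `input` into `cores` chunks, applies compute() to each
# chunk in a forked process, and concatenates the results in order.
out <- pvec(input, compute, mc.cores = cores)
print(all.equal(out, compute(input)))
```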

How can I make parallel operations faster than the serial version?
