When to use parallel programming in R to apply a function to each row

Date: 2015-06-15 14:56:40

Tags: r parallel-processing plyr doparallel

There are many ways to apply a function to each row.

Here are some methods that I know:

method 1

for (i in seq_len(nrow(data))) { my_function(data[i, ]) }

method 2

apply(data,1,my_function)

method 3

library(plyr)
adply(data,.margins=1, .fun=my_function)

method 4

library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)
registerDoParallel(cl)
clusterEvalQ(cl,source("my_fun.R"))
adply(data, .margins = 1, .parallel = TRUE, .fun = my_function)
stopCluster(cl)

Among the first three methods, I think the third is the fastest. But my question is: when is method 4 (the parallel one) faster than method 3? Is there a way to tell before running all the code?
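One way to get a rough idea ahead of time is to time both approaches on a small sample of rows. The sketch below is only an illustration under assumptions: it uses the base `parallel` package with `lapply`/`parLapply` instead of `plyr`, and `data`, `my_function`, the sample size, and the worker count of 2 are hypothetical stand-ins, not the real workload.

```r
library(parallel)  # ships with R; provides makeCluster, parLapply, stopCluster

# Hypothetical stand-ins for the question's `data` and `my_function`.
data <- data.frame(x = runif(2000), y = runif(2000))
my_function <- function(row) sqrt(row$x^2 + row$y^2)

# Assumption: the first 200 rows are representative of the full data set.
sample_rows <- seq_len(200)

# Serial baseline on the sample.
serial_time <- system.time(
  serial_res <- lapply(sample_rows, function(i) my_function(data[i, ]))
)["elapsed"]

# Parallel run on the same sample. Assumption: 2 workers.
cl <- makeCluster(2)
parallel_time <- system.time(
  # `d` and `f` are passed as arguments so they get shipped to the workers.
  parallel_res <- parLapply(
    cl, sample_rows,
    function(i, d, f) f(d[i, ]),
    d = data, f = my_function
  )
)["elapsed"]
stopCluster(cl)

cat("serial:", serial_time, "s  parallel:", parallel_time, "s\n")
```

If the per-row work is cheap, as in this toy function, the cost of sending data to the workers will likely dominate and the parallel run may well be slower. Parallel execution tends to pay off only when each call to the function is expensive relative to that overhead, so the sample timings are a hint rather than a guarantee for the full data.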

0 answers:

No answers