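If I call parLapply multiple times, can I call makeCluster and stopCluster just once, or should I call them before and after each parLapply call? How does this affect memory usage?
Here is a toy example: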

    library(parallel)

    my_g1 <- function(list_element) {
        return(sum(list_element))
    }

    my_g2 <- function(list_element, my_parameter) {
        return(max(list_element) + my_parameter)
    }

    my_fn <- function(large_list, max_iterations=10, my_parameter=123) {
        stopifnot(max_iterations >= 1)
        iteration <- 1
        while(TRUE) {
            message("iteration ", iteration)
            ## Use the large_list argument, not the global my_large_list
            list_of_sums <- lapply(large_list, my_g1)
            list_of_max_plus_parameter <- lapply(large_list, my_g2, my_parameter=my_parameter)
            stopifnot(list_of_max_plus_parameter[[1]] == max(large_list[[1]]) + my_parameter)
            ## Pretend there's work to do with list_of*: check for convergence; if converged, break
            iteration <- iteration + 1
            if(iteration > max_iterations) break  # Run at most max_iterations iterations
        }
        return(1)  # Pretend this has something to do with the work done in the loop
    }

    my_large_list <- list(seq(1, 10),
                          seq(99, 157),
                          seq(27, 54),
                          seq(1001, 1041))  # Pretend this takes up lots of memory, want to avoid copying
    unused <- my_fn(my_large_list)

Now suppose I rewrite my_fn to use a cluster:

    my_fn_parallelized <- function(large_list, max_iterations=10, my_parameter=123) {
        stopifnot(max_iterations >= 1)
        cluster <- makeCluster(2)  # Two worker processes
        iteration <- 1
        while(TRUE) {
            message("iteration ", iteration)
            ## Use the large_list argument, not the global my_large_list
            list_of_sums <- parLapply(cluster, large_list, my_g1)
            list_of_max_plus_parameter <- parLapply(cluster, large_list, my_g2,
                                                    my_parameter=my_parameter)
            stopifnot(list_of_max_plus_parameter[[1]] == max(large_list[[1]]) + my_parameter)
            ## Pretend there's work to do with list_of*: check for convergence; if converged, break
            iteration <- iteration + 1
            if(iteration > max_iterations) break  # Run at most max_iterations iterations
        }
        stopCluster(cluster)  # With stopCluster here, is my_large_list copied 2*max_iterations times?
        return(1)  # Pretend this has something to do with the work done in the loop
    }

    unused <- my_fn_parallelized(my_large_list)

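With stopCluster outside the loop, is my_large_list copied multiple times, with the memory not freed until stopCluster is called? In other words, is the memory used for my_large_list on the order of 2*max_iterations? Or does it stay constant with respect to max_iterations?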
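
One way I could try to check this empirically is sketched below. It is only a rough probe, not a definitive measurement: run_on_one_cluster is a hypothetical helper I made up for illustration, and n_iterations and n_workers are assumed parameter names. It creates the cluster once, uses on.exit so the workers are stopped even if an error occurs, and asks each worker to run gc() after the loop so the reported memory can be compared across different iteration counts.

```r
library(parallel)

# Hypothetical helper (not part of my real code): several parLapply calls
# share one cluster that is created once and stopped once.
run_on_one_cluster <- function(large_list, n_iterations = 3, n_workers = 2) {
    cluster <- makeCluster(n_workers)
    # Shut the workers down when the function exits, even on error
    on.exit(stopCluster(cluster), add = TRUE)

    for (iteration in seq_len(n_iterations)) {
        # Every call serializes the elements of large_list and sends them
        # to the workers again
        list_of_sums <- parLapply(cluster, large_list, sum)
    }

    # Each worker runs gc() and returns its memory usage matrix
    worker_memory <- clusterEvalQ(cluster, gc())
    list(sums = list_of_sums, worker_memory = worker_memory)
}

result <- run_on_one_cluster(list(seq(1, 10), seq(99, 157)))
```

Calling this with different values of n_iterations and comparing the "used" column of result$worker_memory should give a hint as to whether worker memory grows with the number of parLapply calls. My assumption is that the data a worker receives for one call becomes eligible for garbage collection once that call has returned, but I have not confirmed this.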