Question

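If I call parLapply multiple times, can I call makeCluster and stopCluster just once, or should I call them before and after each parLapply call? How does this affect memory usage?

Here is a toy example: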

library(parallel)

my_g1 <- function(list_element) {
    return(sum(list_element))
}

my_g2 <- function(list_element, my_parameter) {
    return(max(list_element) + my_parameter)
}

my_fn <- function(large_list, max_iterations=10, my_parameter=123) {
    stopifnot(max_iterations >= 1)
    iteration <- 1
    while(TRUE) {
        message("iteration ", iteration)
        ## Use the function argument large_list, not the global my_large_list
        list_of_sums <- lapply(large_list, my_g1)
        list_of_max_plus_parameter <- lapply(large_list, my_g2, my_parameter=my_parameter)
        stopifnot(list_of_max_plus_parameter[[1]] == max(large_list[[1]]) + my_parameter)
        ## Pretend there's work to do with list_of*: check for convergence; if converged, break
        iteration <- iteration + 1
        if(iteration > max_iterations) break  # Stop after exactly max_iterations passes
    }
    return(1)  # Pretend this has something to do with the work done in the loop
}

my_large_list <- list(seq(1, 10),
                      seq(99, 157),
                      seq(27, 54),
                      seq(1001, 1041))  # Pretend this takes up lots of memory, want to avoid copying

unused <- my_fn(my_large_list)

Now suppose I rewrite my_fn to use a cluster:

my_fn_parallelized <- function(large_list, max_iterations=10, my_parameter=123) {
    stopifnot(max_iterations >= 1)
    cluster <- makeCluster(2)  # Two cores
    iteration <- 1
    while(TRUE) {
        message("iteration ", iteration)
        ## Use the function argument large_list, not the global my_large_list
        list_of_sums <- parLapply(cluster, large_list, my_g1)
        list_of_max_plus_parameter <- parLapply(cluster, large_list, my_g2,
                                                my_parameter=my_parameter)
        stopifnot(list_of_max_plus_parameter[[1]] == max(large_list[[1]]) + my_parameter)
        ## Pretend there's work to do with list_of*: check for convergence; if converged, break
        iteration <- iteration + 1
        if(iteration > max_iterations) break  # Stop after exactly max_iterations passes
    }
    stopCluster(cluster)  # With stopCluster here, is my_large_list copied 2*max_iterations times?
    return(1)  # Pretend this has something to do with the work done in the loop
}

unused <- my_fn_parallelized(my_large_list)
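For reference, the "create the cluster once" pattern can be sketched as below. This is a minimal illustration, not a statement about memory behaviour: run_with_one_cluster, n_calls, and n_workers are hypothetical names, and on.exit is used only so that the workers are shut down even if one of the parLapply calls fails.

```r
library(parallel)

# Hypothetical helper: runs n_calls parLapply calls on a single cluster.
run_with_one_cluster <- function(large_list, n_calls = 3, n_workers = 2) {
    cluster <- makeCluster(n_workers)
    # Shut the workers down when the function exits, even on error
    on.exit(stopCluster(cluster), add = TRUE)
    sums <- NULL
    for (i in seq_len(n_calls)) {
        # The same worker processes are reused for every call
        sums <- parLapply(cluster, large_list, sum)
    }
    sums
}

result <- run_with_one_cluster(list(seq(1, 10), seq(99, 157)))
```

The worker processes started by makeCluster stay alive until stopCluster is called, so every parLapply call inside the loop talks to the same two workers.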

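With stopCluster outside the loop, is my_large_list copied multiple times, with the memory not released until stopCluster is called? In other words, is the memory usage for my_large_list on the order of 2*max_iterations? Or does it stay constant with respect to max_iterations?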
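One way to investigate this empirically is to ask each worker how much memory it has in use after every parLapply call. The sketch below assumes that the "used (Mb)" column reported by gc() on a worker is a reasonable proxy for that worker's memory footprint; worker_mem_mb, big, and n_iter are hypothetical names introduced for illustration only.

```r
library(parallel)

# Hypothetical helper: force a garbage collection on every worker and
# return the total memory in use (Mb, Ncells + Vcells) per worker.
worker_mem_mb <- function(cl) {
    unlist(clusterEvalQ(cl, sum(gc()[, 2])))
}

n_workers <- 2
n_iter <- 5
cluster <- makeCluster(n_workers)

# Stand-in for a large list: four numeric vectors of one million doubles each
big <- replicate(4, rnorm(1e6), simplify = FALSE)

# One row per iteration, one column per worker
mem <- matrix(NA_real_, nrow = n_iter, ncol = n_workers)
for (i in seq_len(n_iter)) {
    invisible(parLapply(cluster, big, sum))
    mem[i, ] <- worker_mem_mb(cluster)
}

stopCluster(cluster)
print(mem)
```

If the rows of mem stay roughly flat across iterations, the workers are not accumulating copies of the list between calls; a steady increase would suggest the opposite. The figures depend on the platform and R version, so this is a diagnostic rather than a proof.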

Cluster memory usage when calling parLapply multiple times
