Question

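I want to compare the performance of two methods on the same dataset. To get multiple comparisons between them I am using the bootstrap, so I thought parallel computing might be a good idea. Since the number of bootstrap rounds is 50, I allocated 50 cores to the job. The pseudocode is as follows: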

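library(parallel)  # mclapply() comes from the parallel package

num.round <- 50  # number of bootstrap rounds: 50 resamples of the original dataset, i.e. 50 comparisons between the two methods
rounds.btsp <- seq_len(num.round)

# METHOD1, METHOD2 and COMPARE stand for the actual methods and the comparison step
BootStrap <- function(round.btsp) {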
    result1 <- METHOD1(round.btsp)
    result2 <- METHOD2(round.btsp)
    return(list(result1 = result1, result2 = result2))
}

results.btsp <- mclapply(rounds.btsp, BootStrap, mc.cores = num.round)

for (round.btsp in rounds.btsp){
    result1 <- results.btsp[[round.btsp]]$result1
    result2 <- results.btsp[[round.btsp]]$result2
    COMPARE(result1, result2)  # do the comparison here, and this will be repeated 50 times
}

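An error occurs in the COMPARE step. When I looked into it, I found that for round.btsp = 10 there was nothing in result1 or result2. So I set round.btsp to 10 and ran the body of the BootStrap function by hand, and everything worked. Then I reran the whole script and the same error occurred again, but this time for round.btsp = 20 instead (10 and 20 are just examples).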
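Before calling COMPARE, it can help to list exactly which rounds came back empty. The helper below is a hypothetical sketch, not part of the original post. It relies on documented mclapply() behavior: a job whose code raised an error is returned as a "try-error" object, and a job whose forked worker was killed or delivered nothing is returned as NULL.

```r
# Hypothetical helper, not from the original post: indices of the
# bootstrap rounds whose result is unusable. mclapply() stores NULL for
# a job whose worker was killed and a "try-error" object for a job that
# raised an error; neither is a list, so both are flagged here, as is a
# list that lacks result1 or result2.
find.failed.rounds <- function(results) {
  failed <- vapply(results, function(r) {
    !is.list(r) || is.null(r$result1) || is.null(r$result2)
  }, logical(1))
  which(failed)
}

# Example: round 1 is fine, round 2 came back NULL, round 3 errored.
results.btsp <- list(list(result1 = 1, result2 = 2),
                     NULL,
                     structure("Error in METHOD1", class = "try-error"))
failed <- find.failed.rounds(results.btsp)
failed  # 2 3
```

In the question's script, calling find.failed.rounds(results.btsp) right after mclapply() shows whether the empty rounds already exist at that point, before the for loop runs.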

Our server has 80 cores in total, but other users occupy some of them from time to time.

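Given what I observed and the state of the cores, my guess is that when I request 50 cores but not all of them are available, some of the worker processes do not run properly, so I get nothing back from them.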
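If contention with other users is the concern, one option is to request fewer workers than one per bootstrap round. The sketch below is a minimal example under assumptions: `pick.cores` and its `share = 0.5` are hypothetical choices, not part of the original script, and `parallel::detectCores()` reports the cores installed on the machine, not how many are idle at the moment. On the 80-core server described above it would give 40 workers. As far as I know, busy cores on Linux make forked workers slower rather than lose their results; a NULL entry more typically means a worker was killed (for example for running out of memory), and fewer workers can help with that too.

```r
library(parallel)

# Hypothetical helper: how many worker processes to fork.
# detectCores() counts the cores installed on the machine, not the idle
# ones, so `share` (an assumed value) caps how much of the server we use.
pick.cores <- function(n.jobs, share = 0.5) {
  total <- detectCores()
  if (is.na(total)) total <- 1L  # detectCores() may return NA
  as.integer(max(1, min(n.jobs, floor(total * share))))
}

n.cores <- pick.cores(50)
# In the question's script this would replace mc.cores = num.round:
# results.btsp <- mclapply(rounds.btsp, BootStrap, mc.cores = n.cores)
# mclapply() then splits the 50 rounds among n.cores forked workers.
```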

Answer 1

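Problem solved. It turned out the problem was upstream of COMPARE: it was in the step that computes results.btsp. My current solution is to check the length of every element of results.btsp; if any element fails the check, results.btsp is recomputed, which runs the parallel computation again. Only once every element passes the check does the script continue to the for loop.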
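The check-and-recompute approach can be sketched as follows. This is an illustrative version, not the answerer's code: `run.with.retry`, `max.tries` and the stand-in `flaky` function are hypothetical, and unlike the description above it re-runs only the rounds that failed rather than all of results.btsp, and gives up after a fixed number of attempts instead of looping indefinitely. Each call is wrapped in try() because in some cases (for example mc.cores = 1) mclapply() runs the jobs with plain lapply() in the calling process, where an error would otherwise stop the script.

```r
library(parallel)

# Hypothetical sketch of "check the results, recompute if any are empty".
# Only the rounds that failed are re-run, at most `max.tries` times.
run.with.retry <- function(rounds, FUN, mc.cores, max.tries = 5) {
  # A round is bad if it is NULL (worker killed), a "try-error" (code
  # raised an error) or a list missing result1/result2.
  is.bad <- function(r) !is.list(r) || is.null(r$result1) || is.null(r$result2)
  results <- vector("list", length(rounds))
  todo <- seq_along(rounds)
  for (attempt in seq_len(max.tries)) {
    # try() turns an error inside FUN into a "try-error" value, whether the
    # job runs in a forked worker or in the calling process.
    out <- mclapply(rounds[todo], function(r) try(FUN(r), silent = TRUE),
                    mc.cores = mc.cores)
    results[todo] <- out
    todo <- todo[vapply(out, is.bad, logical(1))]
    if (length(todo) == 0) return(results)
  }
  stop("rounds still failing after ", max.tries, " attempts: ",
       paste(rounds[todo], collapse = ", "))
}

# Example with a stand-in for BootStrap() that fails once on round 2.
# A marker file is used because forked workers cannot change variables
# in the parent R session.
marker <- tempfile()
flaky <- function(i) {
  if (i == 2 && !file.exists(marker)) {
    file.create(marker)
    stop("simulated failure")
  }
  list(result1 = i, result2 = i * 10)
}
res <- run.with.retry(1:3, flaky, mc.cores = 2)
```

Re-running only the failed rounds keeps the results that already succeeded, and the attempt cap turns a persistent failure into a clear error message instead of an endless loop.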

Parallel computing in R with mclapply
