Question

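I am using ReliefF for feature selection (via the package "CORElearn"). It used to work well. Later, I wanted to speed up my code. Since my code contains a bootstrap (every loop iteration does exactly the same thing, including running ReliefF), I used the package "parallel" for parallel computing. But I found that whenever execution reaches the ReliefF part, the code just hangs there.

The relevant code is as follows: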

library(CORElearn)  # provides attrEval
library(parallel)   # provides mclapply

num.round <- 10  # number of rounds for bootstrap
rounds.btsp <- seq(1, num.round)  # sequence of numbers for bootstrap, used for parallel computing

boot.strap <- function(round.btsp) {

    ## some codes using other feature selection methods

    print('Finish feature selection using other methods')  # I can get this output

    # use ReliefF to rank the features
    data.ref <- data.frame(t(x.train.resample), y.train.resample, check.names = FALSE)  # add the param to avoid changing '-' to '.'
    print('Start using attrEval')  # I’ll get this output, but then I'll get stuck here
    estReliefF <- attrEval('y.train.resample', data.ref, estimator = 'ReliefFexpRank', ReliefIterations = 30)
    names(estReliefF) <- fea.name  # This command needs to be added because it's very annoying that 'attrEval' will change the '-' in the names to '.'
    print('Start using estReliefF')  # I’ll never get here
    fea.rank.ref <- estReliefF[order(abs(estReliefF), decreasing = TRUE)]
    fea.rank.ref <- data.frame(importance = fea.rank.ref)
    fea.rank.name.ref <- rownames(fea.rank.ref)  # the ranked feature list for this round

    return(fea.rank.name.ref)
}

results.btsp <- mclapply(rounds.btsp, boot.strap, mc.cores = num.round)

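My current guess is that the function "attrEval" itself uses multiple cores for parallel computation (I read this in the documentation: https://cran.r-project.org/web/packages/CORElearn/CORElearn.pdf), and that this somehow conflicts with the parallelism I set up. When I change "num.round" to 1, the code runs without problems (but it already fails when I set it to 2).

The server I am using has 80 cores.

Is there a way to solve this? I think turning off the parallel computation inside "attrEval" might be a solution, although I don't know how to do that.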

Answer 1

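Having multiple levels of parallelism can be tricky. Unfortunately, the CORElearn package does not allow you to directly control the number of threads it uses. Since it uses OpenMP for parallel execution, you can try setting the environment variable OMP_NUM_THREADS appropriately, e.g.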

Sys.setenv(OMP_NUM_THREADS = 8)
num.round <- 10

That way, there should be 10 groups of 8 cores, with each group handling one bootstrap round.
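A minimal sketch of how this setting reaches the forked workers, assuming a Unix-alike system (mclapply relies on forking). The helper report.threads is a hypothetical stand-in for boot.strap and only reports the value each worker sees; whether CORElearn's OpenMP runtime honours the variable is not checked here.

```r
library(parallel)

# Assumed split: 10 bootstrap workers x 8 OpenMP threads = 80 cores.
threads.per.worker <- 8
num.round <- 10

# Set the variable before forking so that every child process inherits it.
Sys.setenv(OMP_NUM_THREADS = threads.per.worker)

# Hypothetical stand-in for boot.strap(): report the value seen by the worker.
report.threads <- function(round.btsp) Sys.getenv("OMP_NUM_THREADS")

seen <- mclapply(seq_len(num.round), report.threads, mc.cores = num.round)
```

Each element of seen holds the value inherited by one worker, which makes it easy to confirm the setting was in place before the real bootstrap is started.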

Answer 2

I got a solution from Marko, a contributor to the package: disable multithreading in CORElearn by passing the parameter maxThreads = 1 to 'attrEval'.
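A minimal sketch of this fix, assuming a Unix-alike system and using the built-in iris data in place of the asker's data. The function rank.features and the choice of 4 rounds on 2 cores are hypothetical and only for illustration.

```r
library(CORElearn)
library(parallel)

# Hypothetical stand-in for boot.strap(): one bootstrap round on iris.
rank.features <- function(round.btsp) {
    # Resample the rows with replacement.
    idx <- sample(nrow(iris), replace = TRUE)
    # maxThreads = 1 keeps attrEval single-threaded inside each forked worker.
    est <- attrEval(Species ~ ., iris[idx, ], estimator = "ReliefFexpRank",
                    ReliefIterations = 30, maxThreads = 1)
    # Feature names ordered by absolute importance, largest first.
    names(est)[order(abs(est), decreasing = TRUE)]
}

results.btsp <- mclapply(seq_len(4), rank.features, mc.cores = 2)
```

With this setup, mclapply is the only layer of parallelism, so the number of busy cores is simply the value of mc.cores.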
