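Controlling seeds with mclapply

Time: 2015-05-26 10:53:15

Tags: r parallel-processing apply seeding random-seed

Imagine we are running a number of processes and I want to set one overall seed at the beginning of the program, e.g.: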

mylist <- list( as.list(rep(NA,3)), as.list(rep(NA,3)) )
# Replace every element of x with one random draw from 1:100.
foo <- function(x) {
  for (i in seq_along(x)) {
    x[[i]] <- sample(100, 1)
  }
  return(x)
}

# start block
set.seed(1)
l1 <- lapply(mylist, foo)
l2 <- lapply(mylist, foo)
# end

Of course, within one block l1 and l2 differ, but if I run the block above again, l1 will be the same as before and l2 will be the same as before.
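The reproducibility described above can be checked directly with identical(). This is a minimal sketch; run_block is a hypothetical helper name (not from the original post) that wraps the seeded block so it can be run twice.

```r
mylist <- list(as.list(rep(NA, 3)), as.list(rep(NA, 3)))

# Replace every element of x with one random draw from 1:100.
foo <- function(x) {
  for (i in seq_along(x)) x[[i]] <- sample(100, 1)
  x
}

# Hypothetical helper: runs the seeded block once and returns both results.
run_block <- function() {
  set.seed(1)
  list(l1 = lapply(mylist, foo), l2 = lapply(mylist, foo))
}

first  <- run_block()
second <- run_block()

# The seed fixes the whole sequence of draws, so a rerun matches exactly.
same_across_runs <- identical(first, second)
# Within one run, the second lapply continues the sequence, so it differs.
same_within_run <- identical(first$l1, first$l2)
```

Here same_across_runs should be TRUE and same_within_run should be FALSE, which is the behaviour the question wants to keep when moving to mclapply.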

Imagine foo is horribly time-consuming, so I want to use mclapply instead of lapply, so I do:

library(parallel)

# start block
set.seed(1)
mclapply(mylist , foo,  mc.cores = 3)
mclapply(mylist , foo,  mc.cores = 3)
# end

If I run this block again, I get different results the next time. How can I get the same behaviour as setting one overall seed with lapply, but using mclapply? I have looked through the mclapply documentation, but I am not sure, because using:

set.seed(1)
l1 <- mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)
l2 <- mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)

results in l1 and l2 being identical, which is not what I want...

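1 Answer:

Answer 0 (score: 6)

The parallel package includes special support for the "L'Ecuyer-CMRG" random number generator, which was introduced at the same time as parallel. You can read the documentation for that support using: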

library(parallel)
?mc.reset.stream

To use it, you first need to enable "L'Ecuyer-CMRG":

RNGkind("L'Ecuyer-CMRG")

After doing that, code such as:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
mclapply(mylist, foo, mc.cores=3)

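will be reproducible, but the two calls to mclapply will return identical results. This is because calling mclapply does not change the state of the random number generator in the master process.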
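One way to see this is to compare the master's .Random.seed before and after a call. This is a minimal sketch, assuming a Unix-alike platform where mclapply can fork (on Windows, mc.cores must be 1); the variable names are illustrative and not part of the original answer.

```r
library(parallel)

mylist <- list(as.list(rep(NA, 3)), as.list(rep(NA, 3)))
foo <- function(x) {
  for (i in seq_along(x)) x[[i]] <- sample(100, 1)
  x
}

RNGkind("L'Ecuyer-CMRG")
set.seed(1)

seed_before <- .Random.seed                 # master state before the call
r1 <- mclapply(mylist, foo, mc.cores = 3)
seed_after <- .Random.seed                  # master state after the call
r2 <- mclapply(mylist, foo, mc.cores = 3)

# The workers draw from their own streams; the master's seed is untouched,
# so the second call hands the workers the same streams again.
master_unchanged <- identical(seed_before, seed_after)
calls_identical  <- identical(r1, r2)
```

Both master_unchanged and calls_identical should come out TRUE, which is why an extra step is needed to move the master's seed forward between calls.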

I've used the following function to skip over the random number streams used by the mclapply workers:

skip.streams <- function(n) {
  x <- .Random.seed
  for (i in seq_len(n))
    x <- nextRNGStream(x)
  assign('.Random.seed', x, pos=.GlobalEnv)
}

You can use this function to get the behaviour that I think you want:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
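The pattern of calling skip.streams after every mclapply can be bundled into a wrapper. This is only a sketch: seeded_mclapply and run_block are hypothetical names introduced here, and skipping mc.cores streams per call mirrors the skip.streams(3) usage above, on the assumption that each call (with the default mc.preschedule = TRUE) hands out at most mc.cores streams. It also assumes a platform where mclapply can fork.

```r
library(parallel)

mylist <- list(as.list(rep(NA, 3)), as.list(rep(NA, 3)))
foo <- function(x) {
  for (i in seq_along(x)) x[[i]] <- sample(100, 1)
  x
}

# From the answer: advance the master's seed by n L'Ecuyer streams.
skip.streams <- function(n) {
  x <- .Random.seed
  for (i in seq_len(n))
    x <- nextRNGStream(x)
  assign('.Random.seed', x, pos = .GlobalEnv)
}

# Hypothetical wrapper: run mclapply, then move the master's seed past the
# streams given to the workers so the next call starts from fresh ones.
seeded_mclapply <- function(X, FUN, ..., mc.cores = 2L) {
  res <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  skip.streams(mc.cores)
  res
}

# Hypothetical helper: the seeded block from the question, run once.
run_block <- function() {
  set.seed(1)
  list(l1 = seeded_mclapply(mylist, foo, mc.cores = 3),
       l2 = seeded_mclapply(mylist, foo, mc.cores = 3))
}

RNGkind("L'Ecuyer-CMRG")
first  <- run_block()
second <- run_block()

reproducible <- identical(first, second)         # same block, same results
calls_differ <- !identical(first$l1, first$l2)   # l1 and l2 use other streams
```

With this wrapper the block behaves like the lapply version in the question: rerunning it reproduces l1 and l2, while l1 and l2 differ from each other.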