How to use different random seeds for parallel instances of a reticulate function when using plyr in R

Time: 2019-05-31 11:48:00

Tags: python r parallel-processing plyr reticulate

I am trying to use plyr's parallelization to call a Python function via reticulate, but the different instances all seem to use the same seed.

In Python:

# This is called python_script.py
import random
def give_a_rand():
    return random.random()

In R:

library(reticulate)
library(plyr)
library(doMC)
doMC::registerDoMC(cores=10)

reticulate::source_python('/path/to/python_script.py')

After loading the libraries, registering cores for plyr, and linking the Python script to my R session via reticulate, we can call the Python function give_a_rand() natively from R:

> give_a_rand()
[1] 0.896585

We can use plyr to run it many times without parallelizing it:

> aaply(.data=1:10, .margins=1, .fun=function(x){give_a_rand()}, .parallel=F)
          1           2           3           4           5           6
0.183420430 0.539790166 0.817348174 0.130959177 0.143210990 0.794048321
          7           8           9          10
0.276724929 0.820918953 0.003462523 0.903942433

All is great so far, but how do I parallelize it? With .parallel=T every instance returns the same value, so I guess I need to force a different seed for the random number generator in each instance:

> aaply(.data=1:10, .margins=1, .fun=function(x){give_a_rand()}, .parallel=T)
       1        2        3        4        5        6        7        8
0.896585 0.896585 0.896585 0.896585 0.896585 0.896585 0.896585 0.896585
       9       10
0.896585 0.896585
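
A possible explanation, stated here as an assumption rather than something confirmed above: doMC runs its workers as forked copies of the R process, so each worker would inherit an exact copy of the embedded Python interpreter, including the internal state of the random module. The sketch below does not fork anything. It only simulates that inheritance with random.getstate() and random.setstate() to show that identical state gives identical draws. The helper name draw_in_simulated_worker is hypothetical.

```python
import random

# Snapshot of the generator state, standing in for what a forked
# worker would inherit from the parent process (an assumption).
parent_state = random.getstate()


def draw_in_simulated_worker(state):
    """Restore the inherited state, then draw one number."""
    random.setstate(state)
    return random.random()


# Every simulated worker starts from the same state, so every draw matches.
draws = [draw_in_simulated_worker(parent_state) for _ in range(5)]
all_identical = len(set(draws)) == 1
print(draws, all_identical)
```

If this is indeed what happens, each worker needs to be reseeded after it starts, not once in the parent session.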

1 answer:

Answer 0 (score: 0)

OK, based on this answer, I modified the Python function and now it works:

import random

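def seed_from_urandom():
    # Build a 32-bit integer from 4 bytes of OS entropy.
    rand_int = 0
    # Use a context manager so the file handle is always closed.
    with open("/dev/urandom", "rb") as f:
        rnd_bytes = f.read(4)
    for c in rnd_bytes:
        rand_int <<= 8
        rand_int += c  # iterating over bytes yields ints in Python 3
    return rand_int

def give_a_rand():
    # Reseed on every call so each parallel worker gets a different seed.
    random.seed(seed_from_urandom())
    return random.random()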
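
The same idea can be sketched without opening /dev/urandom by hand, assuming Python 3. os.urandom and int.from_bytes are standard-library calls. The names seed_from_os and give_a_rand_system are hypothetical and only used for illustration. I have not benchmarked this variant under doMC.

```python
import os
import random


def seed_from_os():
    """Return a 32-bit integer built from 4 bytes of OS entropy."""
    return int.from_bytes(os.urandom(4), "big")


def give_a_rand():
    # Reseed on every call so workers that inherited the same
    # generator state still produce different numbers.
    random.seed(seed_from_os())
    return random.random()


def give_a_rand_system():
    # Alternative: SystemRandom reads from the OS on every draw,
    # so there is no seeded state for a worker to inherit.
    return random.SystemRandom().random()
```

Reseeding on every call gives up reproducibility. If reproducible runs matter, one option would be to pass a per-task seed from R (for example the index x) into the Python function instead.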