Passing a class to IPython Parallel

Time: 2014-03-14 00:53:54

Tags: parallel-processing ipython

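I'm trying to pass a class to IPython for parallel execution. The code below runs as written, but it loads `Timezone` on every single call. The class takes about 10 seconds per load, so that overhead is unacceptable unless it happens only once, or once per core. I'm very new to parallelization, and right now I'd like to know how to move the import out of the function. At least I think that's the right approach.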

from IPython import parallel
clients = parallel.Client()
lview = clients.load_balanced_view()

lview.block = True

lats = [32.21, 34.98]
lons = [109.45, -102.4]
times = ['2014-03-12T16:20:44.000000000Z', '2014-03-12T15:48:52.000000000Z']

@lview.parallel()
def f(lats, lons, times):
    import sys,os
    sys.path.append("../utils/") # For grabbing 'Timezone'

    import Timezone as Timezone
    tz = Timezone.Timezone()

    # Use tz to compute local time
    a = tz.compute_local_time(lats, lons, times)

    return a

%time f.map(lats, lons, times)

Result:

in sync results <function __call__ at 0x105d2db18>
CPU times: user 700 ms, sys: 232 ms, total: 932 ms
Wall time: 11.6 s
Out[15]:
[('Asia/Chongqing', '2014-03-13 00:20:44'),
 ('America/Chicago', '2014-03-12 10:48:52')]

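If I double the length of the input data, the run time roughly doubles as well (about 22 seconds). How can I pass in `tz` and have each core call the Timezone method?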
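One way to picture the "load once per process" goal, independent of IPython: put the expensive object behind a cached accessor so that construction happens only on first use. The sketch below is plain Python and does not use the IPython Parallel API. `SlowTimezone` is a hypothetical stand-in for the real `Timezone` class, and its return value is a placeholder rather than a real time zone lookup.

```python
import functools


class SlowTimezone:
    """Hypothetical stand-in for the real Timezone class (assumed slow to build)."""

    instances_built = 0  # counts how many times the constructor ran

    def __init__(self):
        # The real class would spend roughly 10 s loading its data here.
        SlowTimezone.instances_built += 1

    def compute_local_time(self, lat, lon, time):
        # Placeholder result; the real method returns (zone name, local time).
        return (lat, lon, time)


@functools.lru_cache(maxsize=None)
def get_tz():
    """Build the expensive object on the first call, then reuse it."""
    return SlowTimezone()


def local_time(lat, lon, time):
    # Every call goes through the cache, so the constructor runs once per process.
    return get_tz().compute_local_time(lat, lon, time)


results = [local_time(la, lo, t)
           for la, lo, t in [(32.21, 109.45, "t0"), (34.98, -102.4, "t1")]]
```

In a parallel setting each worker process would hold its own cached instance, so the load cost would be paid once per worker rather than once per call.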

1 answer:

Answer 0 (score: 1)

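I figured it out. Here's how I did it: first, I used a direct view and loaded the module on each core; then I used scatter/gather to split up the inputs; and finally I used map to work through the array/list inputs.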

from IPython import parallel as p

rc = p.Client()
rc[:].execute('import sys,os')
rc[:].execute('sys.path.append("../utils/")')
rc[:].execute('import Timezone as Timezone; tz = Timezone.Timezone()')

dview = rc[:] # A DirectView of all engines
dview.block = True

In the next cell:

def f(v, lats, lons, times):
    v.scatter('lat', lats)
    v.scatter('lon', lons)
    v.scatter('time', times)
    v.execute("D=map(tz.compute_local_time, lat, lon, time)")
    return v.gather('D', block=True)

lats = [32.21]
lons = [109.45]
times = ['2014-03-12T16:20:44.000000000Z']

%time r = f(dview, lats, lons, times)
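To make the scatter, map, gather flow concrete, here is a single-process simulation in plain Python. It assumes that `scatter` splits each sequence into contiguous chunks of near-equal length, one per engine, and that `gather` concatenates the per-engine results in engine order. `partition` and `scatter_map_gather` are hypothetical helpers written for this illustration, not part of IPython.

```python
def partition(seq, n_engines):
    """Split seq into n_engines contiguous chunks of near-equal length."""
    base, extra = divmod(len(seq), n_engines)
    chunks, start = [], 0
    for i in range(n_engines):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def scatter_map_gather(func, n_engines, *sequences):
    """Simulate scatter -> per-engine map -> gather in one process."""
    scattered = [partition(seq, n_engines) for seq in sequences]
    gathered = []
    for engine in range(n_engines):
        local_args = [chunks[engine] for chunks in scattered]
        gathered.extend(map(func, *local_args))  # the work one engine would do
    return gathered


lats = [32.21, 34.98, 40.0]
lons = [109.45, -102.4, -74.0]
out = scatter_map_gather(lambda la, lo: (la, lo), 2, lats, lons)
```

Under that assumption, a one-element input like the `lats = [32.21]` used above would land on a single engine while the others receive empty lists, so any speed-up from parallelism should only show with longer inputs.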

The scatter/gather version gives me the output I wanted and is about twice as fast as using:

map(tz.compute_local_time, lat, lon, time)
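For readers on a newer setup: since IPython 4.0 the parallel machinery has lived in the separate `ipyparallel` package, and on Python 3 `map` returns a lazy iterator rather than a list. A hedged sketch of the same function with those two adjustments follows. It assumes every engine already has a `tz` object in its namespace, created once as in the answer, and the name `compute_local_times` is chosen here for illustration. The setup lines are left as comments because they need a running cluster.

```python
def compute_local_times(view, lats, lons, times):
    """Scatter the inputs, map on each engine, and gather the results.

    Assumes `view` is a DirectView and that each engine already holds a
    `tz` object in its namespace (created once, outside this function).
    """
    view.scatter('lat', lats, block=True)
    view.scatter('lon', lons, block=True)
    view.scatter('time', times, block=True)
    # list(...) forces evaluation on the engine; Python 3's map is lazy.
    view.execute("D = list(map(tz.compute_local_time, lat, lon, time))",
                 block=True)
    return view.gather('D', block=True)


# Setup sketch (requires a running cluster, so shown as comments):
# import ipyparallel as ipp
# rc = ipp.Client()
# dview = rc[:]
# dview.execute('import sys; sys.path.append("../utils/")', block=True)
# dview.execute('import Timezone; tz = Timezone.Timezone()', block=True)
# result = compute_local_times(dview, lats, lons, times)
```

Passing `block=True` on each call keeps the steps in order without relying on the view's default blocking mode.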