Question

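So far, I have parallelized functions by mapping them onto a list that is distributed across the engines with map_sync(function, list).

Now I need to run a function on every entry of a dictionary.

map_sync does not seem to work on dictionaries. I also tried scattering the dictionary and using a decorator to run the function in parallel. However, dictionaries do not seem to lend themselves to scattering either. Is there another way to parallelize a function over a dictionary without converting it to a list?

Here is my attempt so far: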

from IPython.parallel import Client
rc = Client()
dview = rc[:]

test_dict = {'43':"lion", '34':"tiger", '343':"duck"}
dview.scatter("test", test_dict)

dview["test"]
# this yields [['343'], ['43'], ['34'], []] on 4 clusters
# which suggests that a dictionary can't be scattered?

Needless to say, when I run the function itself, I get an error:

@dview.parallel(block=True)
def run():
    for d,v in test.iteritems():
        print d,v

run()

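AttributeError
Traceback (most recent call last)
in ()
in run(dict)
AttributeError: 'str' object has no attribute 'iteritems'

I don't know whether it is relevant, but I am using an IPython Notebook connected to an Amazon AWS cluster.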

Answer 1

You can scatter a dict with something like the following:

def scatter_dict(view, name, d):
    """partition a dictionary across the engines of a view"""
    ntargets = len(view)
    keys = list(d.keys())  # a real list, so it can be sliced in Python 2 and 3
    for i, target in enumerate(view.targets):
        # engine i receives every ntargets-th key, starting at offset i
        subd = {}
        for key in keys[i::ntargets]:
            subd[key] = d[key]
        # push the sub-dictionary to this one engine under the given name
        view.client[target][name] = subd

scatter_dict(dview, 'test', test_dict)

Then operate on it remotely as usual.
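To see which entries each engine would end up with, the striding used in scatter_dict can be reproduced locally without a cluster. This is only a sketch: partition_dict is a hypothetical helper name, not part of IPython.parallel, and the engine count of 4 is taken from the question.

```python
def partition_dict(d, n):
    """Split d into n sub-dicts with the same key striding as scatter_dict."""
    keys = list(d.keys())
    # sub-dict i holds every n-th key, starting at offset i
    return [{key: d[key] for key in keys[i::n]} for i in range(n)]

test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
parts = partition_dict(test_dict, 4)  # 4 engines, as in the question

for i, part in enumerate(parts):
    print(i, part)
```

With three entries and four engines, one engine receives an empty dict, so remote code should tolerate an empty `test`.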

You can also gather the remote dicts back into a single local one with:

def gather_dict(view, name):
    """gather dictionaries from a DirectView"""
    merged = {}
    # pull yields one dictionary per engine; fold them into a single dict
    for d in view.pull(name):
        merged.update(d)
    return merged

gather_dict(dview, 'test')
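The merge step inside gather_dict can be checked locally as well. In this sketch, pulled is a hand-written stand-in for what pulling 'test' from four engines might return, and merge_dicts is a hypothetical name used only for illustration.

```python
def merge_dicts(parts):
    """Fold a sequence of dictionaries into one, as gather_dict does."""
    merged = {}
    for part in parts:
        merged.update(part)  # on duplicate keys, later parts overwrite earlier ones
    return merged

# assumed per-engine results; the last engine held no entries
pulled = [{'43': "lion"}, {'34': "tiger"}, {'343': "duck"}, {}]
merged = merge_dicts(pulled)
print(merged)
```

Because scatter_dict gives each key to exactly one engine, no key should be overwritten when gathering, unless the remote code adds the same key on several engines.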

An example notebook

Parallelizing over a dictionary in IPython
