Unable to pass PyDot objects through multiprocessing Pool.starmap()?

Time: 2019-09-27 21:00:46

Tags: python python-3.x multiprocessing pydot

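I have a process that does the following:

  • Generates a few thousand graphs as PyDot objects (specifically pydot.Dot(graph_type='digraph', simplify=False))
  • Parses the edges and nodes to enrich them with some information (setting URLs and similar things)
  • Generates an SVG image for each graph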

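The SVG generation is the slowest part: it calls /usr/bin/dot, and for the most complex graphs it can take several minutes.

So I had the brilliant (!) idea of looking into a multiprocessing Pool (so that I can limit how many processes run in parallel), but I stumbled on a problem I can't really understand.

I wrote a simple function that performs the task to be parallelized: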

def write_svg_graph(graph, outfile):
    with open(outfile, 'wb') as f:
        f.write(graph.create(format='svg'))

I store my (PyDot, outfile) pairs in a work list and then set up mp like this:

import multiprocessing as mp

with mp.Pool(mp.cpu_count()-2) as p:
    p.starmap(write_svg_graph, work)

This is where the problems start. Running the code sequentially from the same script (that is, simply calling write_svg_graph) works fine, but with mp I get:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "myscript.py", line 146, in write_svg_graph
    f.write(graph.create(format='svg'))
  File "/usr/lib/python3/dist-packages/pydot.py", line 1882, in create
    prog = self.prog
AttributeError: 'Dot' object has no attribute 'prog'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "myscript.py", line 179, in <module>
    p.starmap(write_svg_graph, work)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AttributeError: 'Dot' object has no attribute 'prog'

which makes no sense (to me). So I changed the function to:

def write_svg_graph(graph, outfile):
    graph.set_prog('/usr/bin/dot')
    with open(outfile, 'wb') as f:
        f.write(graph.create(format='svg'))

(explicitly setting the prog attribute), but then:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "myscript.py", line 146, in write_svg_graph
    f.write(graph.create(format='svg'))
  File "/usr/lib/python3/dist-packages/pydot.py", line 1898, in create
    for img in self.shape_files:
AttributeError: 'Dot' object has no attribute 'shape_files'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "myscript.py", line 179, in <module>
    p.starmap(write_svg_graph, work)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AttributeError: 'Dot' object has no attribute 'shape_files'

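Another attribute is missing! Yet if I check the type of the object received in write_svg_graph, it is a PyDot object.

Does anyone know how to parallelize SVG generation from PyDot objects with multiprocessing? Thanks! :)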

Update 1: After some research, it looks like mp passes objects between processes by pickling them, and Dot objects do not survive pickling intact:

In [541]: g
Out[541]: <pydot.Dot at 0x7f9c29cd2358>

In [542]: g.prog
Out[542]: 'dot'

In [543]: gg = pickle.loads(pickle.dumps(g))

In [544]: gg == g
Out[544]: False

In [545]: gg.prog
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-545-4ab0fd4f9bc9> in <module>()
----> 1 gg.prog

AttributeError: 'Dot' object has no attribute 'prog'
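
The round trip above is consistent with a class whose pickling hooks save only part of the instance state. Here is a minimal sketch of that mechanism. FakeDot and its obj_dict attribute are hypothetical stand-ins chosen for illustration; this is an assumption about the kind of behaviour involved, not pydot's actual source.

```python
import pickle


class FakeDot:
    """Hypothetical stand-in for a graph class; NOT pydot's real code."""

    def __init__(self):
        self.obj_dict = {"name": "G", "nodes": {}}  # state saved by the hooks
        self.prog = "dot"        # assigned only in __init__
        self.shape_files = []    # assigned only in __init__

    def __getstate__(self):
        # Only obj_dict is serialised; prog and shape_files are left out.
        return dict(self.obj_dict)

    def __setstate__(self, state):
        # Unpickling does not call __init__, so nothing recreates
        # the attributes that were not part of the saved state.
        self.obj_dict = state


g = FakeDot()
gg = pickle.loads(pickle.dumps(g))

print(type(gg).__name__)            # FakeDot: the type is still right
print(gg.obj_dict == g.obj_dict)    # True: the saved part survives
print(hasattr(gg, "prog"))          # False: the unsaved part is gone
```

This would also explain why checking the type in the worker looks fine: the object is a real instance of the class, just one whose __init__ never ran in the child process.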

How would I fix pydot for this?

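Update 2: I reported some of my conclusions at https://github.com/pydot/pydot/issues/217. I believe we cannot pass pydot objects through mp because they are not fully picklable. It would be nice if mp supported using dill instead of pickle to pass objects, but it does not.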
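
One possible way around the pickling problem is to send plain strings to the workers instead of Dot objects. The sketch below assumes the DOT source is produced in the parent process (for instance with pydot's to_string()) and that a Graphviz dot executable is on the PATH. The names render_svg and render_all and the cmd parameter are hypothetical, introduced only for this illustration.

```python
import multiprocessing as mp
import subprocess


def render_svg(dot_source, outfile, cmd=("dot", "-Tsvg")):
    """Pipe DOT text to an external command and save what it prints."""
    # Plain strings pickle cleanly, so nothing is lost on the way to the worker.
    result = subprocess.run(
        list(cmd),
        input=dot_source.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the command fails
    )
    with open(outfile, "wb") as f:
        f.write(result.stdout)
    return outfile


def render_all(work, processes=2):
    """Render every (dot_source, outfile) pair in work using a process pool."""
    with mp.Pool(processes) as pool:
        return pool.starmap(render_svg, work)
```

In the script from the question, the work list would then hold (graph.to_string(), outfile) pairs, and render_all(work) would be called under an if __name__ == "__main__": guard. The cmd parameter is there only so the piping logic can be exercised without Graphviz installed.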

0 answers:

No answers yet