I have a process that does this:
pydot.Dot(graph_type='digraph', simplify=False)
The SVG generation part is the slowest part: it calls /usr/bin/dot, and for the most complex graphs it can even take a few minutes.
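So I had the brilliant (!) idea of looking into a multiprocessing Pool (so that I can limit how many processes run in parallel), but I stumbled on some problems that I cannot really understand.
I wrote a simple function that will be executed as the parallel task: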
def write_svg_graph(graph, outfile):
    with open(outfile, 'wb') as f:
        f.write(graph.create(format='svg'))
I store my (PyDot,) pairs in the work list, and then set up mp as:
with mp.Pool(mp.cpu_count()-2) as p:
    p.starmap(write_svg_graph, work)
This is where the problems start: running this code sequentially from the same script (so just calling write_svg_graph) works fine, but with mp I get:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "myscript.py", line 146, in write_svg_graph
    f.write(graph.create(format='svg'))
  File "/usr/lib/python3/dist-packages/pydot.py", line 1882, in create
    prog = self.prog
AttributeError: 'Dot' object has no attribute 'prog'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "myscript.py", line 179, in <module>
    p.starmap(write_svg_graph, work)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AttributeError: 'Dot' object has no attribute 'prog'
which makes no sense (to me). So I changed the function to:
def write_svg_graph(graph, outfile):
    graph.set_prog('/usr/bin/dot')
    with open(outfile, 'wb') as f:
        f.write(graph.create(format='svg'))
(explicitly setting the prog attribute), but:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "myscript.py", line 146, in write_svg_graph
    f.write(graph.create(format='svg'))
  File "/usr/lib/python3/dist-packages/pydot.py", line 1898, in create
    for img in self.shape_files:
AttributeError: 'Dot' object has no attribute 'shape_files'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "myscript.py", line 179, in <module>
    p.starmap(write_svg_graph, work)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AttributeError: 'Dot' object has no attribute 'shape_files'
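Another attribute is missing! Yet if I check the type of the object received in write_svg_graph, it is PyDot.
Does anybody know how to parallelize SVG generation from PyDot objects with multiprocessing? Thanks! :)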
Update 1: after some research, it seems that mp passes objects to the workers by pickling them, and Dot objects do not seem to be properly picklable:
In [541]: g
Out[541]: <pydot.Dot at 0x7f9c29cd2358>
In [542]: g.prog
Out[542]: 'dot'
In [543]: gg = pickle.loads(pickle.dumps(g))
In [544]: gg == g
Out[544]: False
In [545]: gg.prog
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-545-4ab0fd4f9bc9> in <module>()
----> 1 gg.prog
AttributeError: 'Dot' object has no attribute 'prog'
How would I go about fixing pydot for this?
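To make the question more concrete, here is a toy sketch of what I suspect is happening and of the kind of fix I have in mind. It assumes the cause is a `__getstate__` that saves only part of the instance; I have not confirmed this in the pydot source. `ToyGraph` and `PicklableToyGraph` are hypothetical classes written for illustration, not pydot code.

```python
import copy
import pickle


class ToyGraph:
    """Hypothetical stand-in for a graph class; NOT pydot's real code."""

    def __init__(self):
        self.obj_dict = {"nodes": {}, "edges": {}}  # the actual graph data
        self.prog = "dot"       # set in __init__, stored outside obj_dict
        self.shape_files = []   # likewise stored outside obj_dict

    def __getstate__(self):
        # Only obj_dict is saved, so prog and shape_files are dropped.
        return copy.copy(self.obj_dict)

    def __setstate__(self, state):
        # __init__ is not called when unpickling; only obj_dict comes back.
        self.obj_dict = state


class PicklableToyGraph(ToyGraph):
    """Same toy class, but the extra attributes travel with the pickle."""

    def __getstate__(self):
        return {
            "obj_dict": copy.copy(self.obj_dict),
            "prog": self.prog,
            "shape_files": list(self.shape_files),
        }

    def __setstate__(self, state):
        self.obj_dict = state["obj_dict"]
        self.prog = state["prog"]
        self.shape_files = state["shape_files"]


# Round-trip both variants through pickle, as mp does for task arguments.
broken = pickle.loads(pickle.dumps(ToyGraph()))
fixed = pickle.loads(pickle.dumps(PicklableToyGraph()))

print(hasattr(broken, "prog"))  # False: the attribute was lost
print(hasattr(fixed, "prog"))   # True: the attribute survived
```

If pydot behaves like `ToyGraph`, that would explain why `gg.prog` fails after the round trip even though the object is still a `Dot`.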
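Update 2: I reported some of my conclusions at https://github.com/pydot/pydot/issues/217. I believe we cannot pass pydot objects through mp because they are not fully picklable. It would be great if mp supported using dill instead of pickle to pass objects, but it does not.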
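One possible workaround, as a minimal sketch rather than a tested solution: send plain DOT text to the workers instead of Dot objects, since strings pickle without trouble, and let each worker pipe that text into the Graphviz executable. The names `dot_command`, `write_svg_from_source` and `render_all` are hypothetical helpers of mine. The sketch assumes the `dot` executable is on the PATH and that the DOT text comes from `graph.to_string()` in the parent process. Anything pydot's `create()` does beyond calling `dot` (the `shape_files` handling, for example) is not reproduced here.

```python
import multiprocessing as mp
import subprocess


def dot_command(outfile, prog="dot"):
    """Build the Graphviz command line that writes SVG to `outfile`."""
    return [prog, "-Tsvg", "-o", outfile]


def write_svg_from_source(dot_source, outfile, prog="dot"):
    """Render DOT text to SVG by piping it to Graphviz on stdin."""
    subprocess.run(
        dot_command(outfile, prog),
        input=dot_source.encode("utf-8"),
        check=True,  # raise CalledProcessError if dot fails
    )


def render_all(work, processes=None):
    """Render every (dot_source, outfile) pair in `work` in parallel.

    Both items of each pair are plain strings, so they can be pickled
    and sent to the worker processes.
    """
    if processes is None:
        # Same pool size as in the question, but never below one worker.
        processes = max(1, mp.cpu_count() - 2)
    with mp.Pool(processes) as p:
        p.starmap(write_svg_from_source, work)
```

In the parent process the work list would then be built along the lines of `work = [(g.to_string(), outfile) for g, outfile in pairs]`, where `pairs` stands for the existing list of graph and output file pairs, followed by `render_all(work)` under an `if __name__ == "__main__":` guard.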