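When using `map()` from `multiprocessing.Pool()` on a list of instances of a `numpy.ndarray` subclass, the new attributes I added in my subclass are lost.

The following minimal example, based on the numpy docs subclassing example, reproduces the problem:

```python
from multiprocessing import Pool

import numpy as np


class MyArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        # View-cast the input to MyArray and attach the extra attribute
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called when a MyArray is created from another array (view casting,
        # slicing); obj is None when the array is built from scratch
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)


def sum_worker(x):
    return sum(x), x.info


if __name__ == '__main__':
    arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
    with Pool() as p:
        p.map(sum_worker, arr_list)
```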
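The `info` attribute is missing in the worker processes, and the call fails with:

```
AttributeError: 'MyArray' object has no attribute 'info'
```

Using the built-in `map()` works fine:

```python
arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
list(map(sum_worker, arr_list))
```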
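The purpose of `__array_finalize__()` is that the object keeps the attribute after slicing:

```python
arr = MyArray([1, 2, 3], info='foo')
subarr = arr[:2]
print(subarr.info)  # prints: foo
```

but with `Pool.map()` this approach somehow does not work...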
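To see where the attribute gets lost, the sketch below (not part of the original question) records what `__array_finalize__` receives as `obj`, once for slicing and once for a `pickle` round trip, which is the serialization `multiprocessing` uses to send arguments to workers. `TracedArray` and `calls` are hypothetical names introduced only for this illustration.

```python
import pickle

import numpy as np

# Hypothetical log: type name of every `obj` passed to __array_finalize__
calls = []


class TracedArray(np.ndarray):
    """Same pattern as MyArray, but logs each __array_finalize__ call."""

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        calls.append(type(obj).__name__)
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)


arr = TracedArray([1, 2, 3], info='foo')

# Slicing: obj is the parent array, so info is copied over
calls.clear()
sub = arr[:2]
slice_calls = list(calls)
print(slice_calls, sub.info)

# Pickle round trip: the array is rebuilt from scratch, so obj is None
# and __array_finalize__ returns early without setting info
calls.clear()
restored = pickle.loads(pickle.dumps(arr))
pickle_calls = list(calls)
print(pickle_calls, hasattr(restored, 'info'))
```

The built-in `map()` never pickles anything, which is why it behaves like the slicing case.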
Answer (score: 2)
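Since `multiprocessing` uses `pickle` to serialize data to and from the worker processes, this is essentially a duplicate of this question. `ndarray`'s pickling only stores the array's own data, so attributes set on a subclass instance are not transferred, and when the array is rebuilt in the worker `__array_finalize__` is called with `obj=None`.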
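One way to confirm this is to look at what `ndarray.__reduce__` hands to `pickle`. In this illustrative sketch, reusing the question's `MyArray` without any pickling hooks, the state tuple holds only the version, shape, dtype, Fortran-order flag and raw bytes, while `info` lives in the instance `__dict__`, which is not included.

```python
import numpy as np


class MyArray(np.ndarray):
    # The question's class, with no custom pickling support
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)


arr = MyArray([1.0, 2.0, 3.0], info='foo')
reconstruct, args, state = arr.__reduce__()

# args: (class, dummy shape, dummy dtype code) used to allocate an empty array
print(args[0] is MyArray)
# state: (version, shape, dtype, is_fortran, raw data); info is not in it
print(len(state), state[1], state[2])
# The attribute only exists in the instance dict, which pickle never sees here
print(arr.__dict__)
```

This is why the fix below appends `info` to the state tuple and strips it off again in `__setstate__`.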
Adapting the accepted solution from that question, your example becomes:
```python
from multiprocessing import Pool

import numpy as np


class MyArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Take ndarray's reduce tuple and append info to its state
        pickled_state = super(MyArray, self).__reduce__()
        new_state = pickled_state[2] + (self.info,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        # Restore info from the last element, pass the rest to ndarray
        self.info = state[-1]
        super(MyArray, self).__setstate__(state[0:-1])


def sum_worker(x):
    return sum(x), x.info


if __name__ == '__main__':
    arr_list = [MyArray(np.random.rand(3), info=f'foo_{i}') for i in range(10)]
    with Pool() as p:
        p.map(sum_worker, arr_list)
```
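To check the adapted class without starting a pool, a `pickle` round trip should be enough, since it goes through the same `__reduce__`/`__setstate__` path that `Pool.map()` relies on when shipping arguments to workers. This sketch repeats the class above using zero-argument `super()` (equivalent here in Python 3) and deterministic input data so the results can be checked.

```python
import pickle

import numpy as np


class MyArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append info to the ndarray state tuple
        reconstruct, args, state = super().__reduce__()
        return (reconstruct, args, state + (self.info,))

    def __setstate__(self, state):
        # Last element is info; the rest is the original ndarray state
        self.info = state[-1]
        super().__setstate__(state[:-1])


def sum_worker(x):
    return sum(x), x.info


arr_list = [MyArray(np.arange(3.0), info=f'foo_{i}') for i in range(3)]
# Simulate what Pool.map does to each argument: pickle, then unpickle
restored = [pickle.loads(pickle.dumps(a)) for a in arr_list]
results = [sum_worker(a) for a in restored]
print(results)
```

With the hooks in place, `info` survives the round trip, so `sum_worker` no longer raises `AttributeError` in the workers.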
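Note that the second answer there suggests you can use `pathos.multiprocessing` with your original, unadapted code, because pathos uses `dill` instead of `pickle`. However, this did not work when I tested it.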