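I've been playing around with multiprocessing.Pool and trying to understand exactly how the initializer argument works. As I understand it, the initializer function is called once in each worker process, so I assumed its arguments (i.e. initargs) would have to be pickled across the process boundary. I know the pool's map method also pickles its arguments, so I assumed that anything that works as an argument to the initializer should also work as an argument to map.
However, when I run the following code, initialize is called just fine, but map throws an exception about not being able to pickle the module. (There's nothing special about using the current module as the argument; it was just the first non-picklable object that came to mind.) Does anyone know what might be behind this difference?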
from __future__ import print_function
import multiprocessing
import sys

def get_pid():
    return multiprocessing.current_process().pid

def initialize(module):
    print('Got module {} in PID {}'.format(module, get_pid()))

def worker(module):
    print('Got module {} in PID {}'.format(module, get_pid()))

current_module = sys.modules[__name__]
work = [current_module]

print('Main process has PID {}'.format(get_pid()))

pool = multiprocessing.Pool(None, initialize, work)
pool.map(worker, work)
Answer 0 (score: 1)
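The initialization doesn't require pickling, but the call to map does. With forked workers, as in the sessions below, the initializer and its initargs are inherited by each child process rather than sent to it, whereas every item passed to map travels through a queue and has to be pickled. Maybe this will help... (I'm using multiprocess here instead of multiprocessing to get better pickling and interactivity.)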
>>> from __future__ import print_function
>>> import multiprocess as multiprocessing
>>> import sys
>>>
>>> def get_pid():
...     return multiprocessing.current_process().pid
...
>>>
>>> def initialize(module):
...     print('Got module {} in PID {}'.format(module, get_pid()))
...
>>>
>>> def worker(module):
...     print('Got module {} in PID {}'.format(module, get_pid()))
...
>>>
>>> current_module = sys.modules[__name__]
>>> work = [current_module]
>>>
>>> print('Main process has PID {}'.format(get_pid()))
Main process has PID 34866
>>> pool = multiprocessing.dummy.Pool(None, initialize, work)
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
Got module <module '__main__' (built-in)> in PID 34866
>>> pool.map(worker, work)
Got module <module '__main__' (built-in)> in PID 34866
[None]
Cool. The thread pool works... (because it doesn't need to pickle anything). What about when we use serialization to send both worker and work?
>>> pool = multiprocessing.Pool(None, initialize, work)
Got module <module '__main__' (built-in)> in PID 34875
Got module <module '__main__' (built-in)> in PID 34876
Got module <module '__main__' (built-in)> in PID 34877
Got module <module '__main__' (built-in)> in PID 34878
Got module <module '__main__' (built-in)> in PID 34879
Got module <module '__main__' (built-in)> in PID 34880
Got module <module '__main__' (built-in)> in PID 34881
Got module <module '__main__' (built-in)> in PID 34882
>>> pool.map(worker, work)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 567, in get
    raise self._value
NotImplementedError: pool objects cannot be passed between processes or pickled
>>>
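The contrast between the two steps can be reproduced with the standard library alone. The sketch below is an illustration, not part of the sessions above: it assumes Python 3 on a platform that offers the fork start method, and Unpicklable, report, demo and the other names are made up for the example. The object refuses to be pickled, yet it still reaches the initializer because forked workers inherit initargs; handing the same object to map fails because the task has to be serialized onto a queue.

```python
import multiprocessing
import pickle

_seen_in_init = None  # set inside each worker by initialize()


class Unpicklable(object):
    """Hypothetical stand-in for any object that cannot be pickled."""

    def __reduce__(self):
        raise pickle.PicklingError("refusing to be pickled")


def initialize(obj):
    # Runs once in every worker; remember what kind of object arrived.
    global _seen_in_init
    _seen_in_init = type(obj).__name__


def report(_):
    # Returns what the initializer saw in this worker.
    return _seen_in_init


def worker(obj):
    return type(obj).__name__


def demo():
    # Assumption: the 'fork' start method is available (Linux, macOS).
    ctx = multiprocessing.get_context("fork")
    item = Unpicklable()
    pool = ctx.Pool(2, initialize, (item,))
    try:
        # initargs were inherited through fork, so the initializer got the object.
        init_saw = pool.map(report, [0])[0]
        # map() must pickle each work item, so this call is expected to fail.
        try:
            pool.map(worker, [item])
            error_name = None
        except Exception as exc:
            error_name = type(exc).__name__
        return init_saw, error_name
    finally:
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    print(demo())
```

With a start method that launches a fresh interpreter (spawn), the initargs would have to be pickled as well, so the pool creation itself would be expected to fail for such an object.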
Let's look at pickling work:
>>> import pickle
>>> import sys
>>> pickle.dumps(sys.modules[__name__])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle module objects
>>>
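A check like the one in this session can be wrapped in a small helper and run before anything is handed to map. This is only a sketch, assuming Python 3 and the standard pickle module; is_picklable and split_work are hypothetical names, not part of pickle or multiprocessing.

```python
import pickle
import sys


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # pickle reports failures with any of these, depending on the object.
        return False
    return True


def split_work(items):
    """Separate items that can be sent to a process pool from those that cannot."""
    ok = [item for item in items if is_picklable(item)]
    bad = [item for item in items if not is_picklable(item)]
    return ok, bad


if __name__ == "__main__":
    ok, bad = split_work([1, "text", sys])
    print("can be sent:", ok)
    print("cannot be sent:", bad)
```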
So, you can't pickle a module... okay, can we do better with dill?
>>> import dill
>>> dill.detect.trace(True)
>>> dill.pickles(work)
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x10c017cf8>
# F2
D2: <dict object at 0x10d9a8168>
M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/__init__.pyc'>
# M2
F1: <function worker at 0x10c07fed8>
F2: <function _create_function at 0x10c017488>
# F2
Co: <code object worker at 0x10b053cb0, file "<stdin>", line 1>
F2: <function _unmarshal at 0x10c017320>
# F2
# Co
D1: <dict object at 0x10af68168>
# D1
D2: <dict object at 0x10c0e4a28>
# D2
# F1
M2: <module 'sys' (built-in)>
# M2
F1: <function initialize at 0x10c07fe60>
Co: <code object initialize at 0x10b241f30, file "<stdin>", line 1>
# Co
D1: <dict object at 0x10af68168>
# D1
D2: <dict object at 0x10c0ea398>
# D2
# F1
M2: <module 'pathos' from '/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2a1.dev0-py2.7.egg/pathos/__init__.pyc'>
# M2
C2: __future__._Feature
# C2
D2: <dict object at 0x10b05b7f8>
# D2
M2: <module 'multiprocess' from '/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/__init__.pyc'>
# M2
T4: <class 'pathos.threading.ThreadPool'>
# T4
D2: <dict object at 0x10c0ea5c8>
# D2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 1209, in pickles
    pik = copy(obj, **kwds)
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 161, in copy
    return loads(dumps(obj, *args, **kwds))
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 197, in dumps
    dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 190, in dump
    pik.dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 636, in _batch_appends
    save(tmp[0])
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 1116, in save_module
    state=_main_dict)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 768, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 452, in __reduce__
    'pool objects cannot be passed between processes or pickled'
NotImplementedError: pool objects cannot be passed between processes or pickled
>>>
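The answer is yes: the module starts to pickle, but fails because of what the module contains... so it looks like it works for everything in __main__, except when there is an instance of a pool in __main__, and then it fails.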
So, if the last two lines of your code are replaced with this single line, which never binds the pool to a name in __main__, it works:
>>> multiprocessing.Pool(None, initialize, work).map(worker, work)
Got module <module '__main__' (built-in)> in PID 34922
Got module <module '__main__' (built-in)> in PID 34923
Got module <module '__main__' (built-in)> in PID 34924
Got module <module '__main__' (built-in)> in PID 34925
Got module <module '__main__' (built-in)> in PID 34926
Got module <module '__main__' (built-in)> in PID 34927
Got module <module '__main__' (built-in)> in PID 34928
Got module <module '__main__' (built-in)> in PID 34929
Got module <module '__main__' (built-in)> in PID 34922
[None]
>>>
That works with multiprocess, because it uses dill. pickle will still fail here, because pickle can't serialize a module. Serialization is needed because the object has to be sent to another Python instance in another process.
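For code that has to stay on the standard multiprocessing and pickle, one possible workaround (not taken from the answer above) is to send the module's name, which is an ordinary string, and import the module again inside the worker. This is a minimal sketch assuming Python 3 and the fork start method; worker_by_name and run are hypothetical names.

```python
import importlib
import multiprocessing


def worker_by_name(module_name):
    # Re-import the module inside the worker instead of receiving the object.
    module = importlib.import_module(module_name)
    return module.__name__


def run(module_names):
    # Assumption: the 'fork' start method is available (Linux, macOS).
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(2)
    try:
        # Only strings cross the process boundary, so plain pickle is enough.
        return pool.map(worker_by_name, module_names)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run(["sys", "json"]))
```

The trade-off is that the worker sees its own freshly imported copy of the module, not the parent's live object, so any state the parent changed after import is not carried over.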