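Python multiprocessing pool interaction with namespace when created

Time: 2016-03-09 08:18:05

Tags: python multiprocessing

We know that multiprocessing.Pool must be initialized after the functions it will run have been defined. However, I find the behavior of the code below hard to understand: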

import os
from multiprocessing import Pool

def func(i): print('first')

pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func(i): print('second')
func2 = func

print('------')
pool1.map(func,  range(2))        #map-2
pool1.map(func2,  range(2))       #map-3

pool2 = Pool(2)
print('------')
pool2.map(func,   range(2))       #map-4
pool2.map(func2,  range(2))       #map-5

The output (Python 2.7 and Python 3.4 on Linux) is:

first         #map-1
first
------
first         #map-2
first
first         #map-3
first
------
second        #map-4
second
second        #map-5
second
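map-2 prints 'first', as we expected. But how does map-3 find the name func2? I mean, pool1 is initialized before func2 first appears. So func2 = func does take effect, while def func(i): print('second') does not. Why?

And if I define func2 directly with

def func2(i): print('second')

then map-3 cannot find the name func2, as many posts mention, e.g. this one. What is the difference between the two cases?

As I understand it, arguments are passed to the worker processes by pickling, but how does the pool pass the called function to the other processes? Or how do the subprocesses find the called function?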

1 Answer:

Answer 0: (score: 1)

tl;dr: the first func is called at map-3, where one would expect the second func, because Pool.map() pickles the function by its func.__name__, which is still func even though the function has been assigned to the func2 reference. That name is sent to the child process, which looks up func in its own local namespace.

OK, so I count four different questions, listed below. I assume you have already read up on namespaces and forked processes, so let's get straight into the fun of the issue ☺

① But how does map-3 find the name func2?

② So func2 = func is indeed executed, while def func(i): print('second') is not. Why?

③ Then map-3 cannot find the name func2, as many posts mention, e.g. this one. What is the difference between the two cases?

④ As I understand it, arguments are passed to the worker processes by pickling, but how does the pool pass the called function to the other processes? Or how do the subprocesses find the called function?

So I added a bit of code to expose more of the internals:

import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2 = func

print('------ map-2')
pool1.map(func, range(2))         #map-2
print('------ map-3')
pool1.map(func2, range(2))        #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2))         #map-4
print('------ map-5')
pool2.map(func2, range(2))        #map-5

which outputs on my system:

21512 parent
------ map-1
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-2
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-3
21513 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
21514 first | <function func at 0x7f62d67f7cf8> | no func2 in globals
------ map-4
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
------ map-5
21518 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>
21519 second | <function func at 0x7f62d531bed8> | <function func at 0x7f62d531bed8>

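So there we can see that, for pool1, func2 is never added to the namespace. So something fishy is definitely going on, and it is too late at night for me to go thoroughly through the pickle sources with a debugger to understand what is happening.

So if I had to guess the answer to ①, the multiprocessing module somehow finds out that func2 resolves to 0x7f62d531bed8, which already exists under the label func. So it pickles the known "label" func, which on the child side resolves to 0x7f62d67f7cf8. I.e.:

func2 → 0x7f62d531bed8 → func → [PICKLE] → globals()['func'] → 0x7f62d67f7cf8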
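One way to probe this guess without a pool is to look at what pickle actually writes for the function. The sketch below is only an illustration, not code from the question: it uses a hypothetical throwaway module called demo_ns (registered in sys.modules) in place of __main__, so that it behaves the same wherever it is run.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__ (the name "demo_ns" is an assumption).
mod = types.ModuleType("demo_ns")
sys.modules["demo_ns"] = mod

# Same shape as the question: define func, then alias it as func2.
exec("def func(i):\n    return 'second'\n\nfunc2 = func\n", mod.__dict__)

payload = pickle.dumps(mod.func2)

# The payload carries the module name and the function's own name,
# not the variable name used to reach it and not its code.
has_alias = b"func2" in payload
has_name = b"func" in payload
print(has_alias, has_name)  # False True

# Unpickling is a lookup of that name in the module.
restored = pickle.loads(payload)
print(restored is mod.func)  # True
```

If this reading is right, the address never crosses the process boundary at all: only the pair (module name, function name) does, and the receiving side resolves it against whatever it has under that name.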

To test my theory, I changed your code a bit by renaming the second func() to func2(), and here is what I get:

------ map-3
Process PoolWorker-1:
Process PoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
    return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'

And then, changing func2 = func to func = func2:

------ map-2
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Process PoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
    return recv()
AttributeError: 'module' object has no attribute 'func2'
AttributeError: 'module' object has no attribute 'func2'

So I believe I am starting to make a point. It also shows where to read the code to understand what is happening on the child process side.

So there are more clues for answering ② and ③!

To go further, I added a print statement at line 114 of pool.py:

job, i, func, args, kwds = task
print("XXX", os.getpid(), job, i, func, args, kwds)

to show what is going on. We can see that func is resolved to 0x7f2d0238fcf8, which is the same address as in the parent function:

23432 parent
------ map-1
('XXX', 23433, 0, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 0, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-2
('XXX', 23433, 1, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 1, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-3
('XXX', 23433, 2, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (0,)),), {})
23433 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
('XXX', 23434, 2, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x7f2d0238fcf8>, (1,)),), {})
23434 first | <function func at 0x7f2d0238fcf8> | no func2 in globals
------ map-4
('XXX', 23438, 3, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
('XXX', 23439, 3, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>
------ map-5
('XXX', 23438, 4, 0, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (0,)),), {})
('XXX', 23439, 4, 1, <function mapstar at 0x7f2d02363230>, ((<function func at 0x1092e60>, (1,)),), {})
23438 second | <function func at 0x1092e60> | <function func at 0x1092e60>
23439 second | <function func at 0x1092e60> | <function func at 0x1092e60>

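So to answer ④, we need to dig further into the multiprocessing sources, and maybe even into the pickle sources.

But I guess my intuition about the resolution is probably right... Then the only remaining question is why it resolves the label to an address and back to a label again before pushing it to the child processes!

Edit: I think I know why! As I was going to bed, the reason popped into my mind, so I came back to my keyboard:

When pickling a function, pickle takes the argument containing the function and gets its name from the function object itself. So even though you do create a new function object, which has a different address in memory:

>>> print(func)
<function func at 0x7fc6174e3ed8>

pickle does not care, because if the function is not already accessible to the child, it never will be. So pickle only resolves func.__name__:

>>> print("func.__name__:", func.__name__)
func.__name__: func
>>> print("func2.__name__:", func2.__name__)
func2.__name__: func

Then, even though you changed the body of the function in the parent process and made a new reference to that function, what actually gets pickled is the internal name of the function, which is set when the function is defined (or when the lambda is created).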
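As a small illustration of that last point, here is a sketch (the names first and square are made up for the example) showing that the internal name is fixed at definition time and does not follow later assignments:

```python
def func(i):
    return "first"

first = func              # second reference to the first function object

def func(i):              # rebinds the module-level name "func"
    return "second"

func2 = func              # alias for the second function object

square = lambda x: x * x  # a lambda's internal name is always "<lambda>"

names = (first.__name__, func.__name__, func2.__name__, square.__name__)
print(names)                         # ('func', 'func', 'func', '<lambda>')
print(first is func, func2 is func)  # False True
```

Both function objects answer to the internal name func, even though they are distinct objects, and the alias func2 leaves no trace on the object it points to.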

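That explains why you get the old func function when you hand func2 to pool1 at the map-3 stage.

So, to conclude: for ①, map-3 does not find the name func2; it finds the name func inside the function that func2 refers to. This also answers ② and ③, since the func that is found executes the original func function. And the mechanism is that func.__name__ is used to pickle and resolve the function name between the two processes, which answers ④.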
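The whole map-3 story can be replayed in a single process, which makes it easier to step through. The following is a rough sketch under stated assumptions, not what multiprocessing does internally: the module demo_fork_ns is a hypothetical stand-in for __main__, and "the worker's view" is imitated by putting the namespace back to how it looked when pool1 was created.

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__ (the name "demo_fork_ns" is an assumption).
mod = types.ModuleType("demo_fork_ns")
sys.modules["demo_fork_ns"] = mod

# State at the time pool1 is created: only the first func exists.
exec("def func(i):\n    return 'first'\n", mod.__dict__)
worker_func = mod.func  # what an already-forked worker keeps seeing

# The parent carries on: it redefines func and aliases it as func2.
exec("def func(i):\n    return 'second'\n\nfunc2 = func\n", mod.__dict__)
payload = pickle.dumps(mod.func2)  # parent side of map-3

# Put the namespace back to the worker's snapshot, then unpickle "there".
del mod.func2
mod.func = worker_func
resolved = pickle.loads(payload)

print(resolved(0))  # first
```

The worker side never needed func2 to exist: the payload only asked for func, and the only func it could find was the first one.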

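Last update, quoting you:

In pickle._Pickler.save_global, it gets the name with

if name is None:
    name = getattr(obj, '__qualname__', None)

and then again with

if name is None:
    name = obj.__name__

So if obj has no __qualname__, then __name__ is used.

But it does check whether the object passed in is the same as the one in the subprocess:

if obj2 is not obj:
    raise PicklingError(...)

where obj2, parent = _getattribute(module, name)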

Yes, but remember that the object passed is just the (internal) name of the function, not the function itself. The child process has no way of knowing whether its func() is the same in memory as the parent's func().

Edit from @SyrtisMajor:

OK, let's change the first code above:

import os
from multiprocessing import Pool

print(os.getpid(), 'parent')

def func(i):
    print(os.getpid(), 'first', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

print('------ map-1')
pool1 = Pool(2)
pool1.map(func, range(2))         #map-1

def func2(i):
    print(os.getpid(), 'second', end=" | ")
    if 'func' in globals():
        print(globals()['func'], end=" | ")
    else:
        print("no func in globals", end=" | ")
    if 'func2' in globals():
        print(globals()['func2'])
    else:
        print("no func2 in globals")

func2.__qualname__ = func.__qualname__
func = func2

print('------ map-2')
pool1.map(func, range(2))         #map-2
print('------ map-3')
pool1.map(func2, range(2))        #map-3

pool2 = Pool(2)
print('------ map-4')
pool2.map(func, range(2))         #map-4
print('------ map-5')
pool2.map(func2, range(2))        #map-5

The output is as follows:

38130 parent
------ map-1
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-2
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-3
38131 first | <function func at 0x101856f28> | no func2 in globals
38132 first | <function func at 0x101856f28> | no func2 in globals
------ map-4
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>
------ map-5
38133 second | <function func at 0x10339b510> | <function func at 0x10339b510>
38134 second | <function func at 0x10339b510> | <function func at 0x10339b510>

It is exactly the same as our first output. Note that func = func2 after the definition of func2 is critical, because pickle checks whether func2 (whose name is now __main__.func) is the same object as the func it finds in __main__. If it is not, pickling fails.
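That identity check can be seen in isolation with a short sketch. It assumes Python 3.4 or later (where pickle consults __qualname__) and again uses a hypothetical stand-in module, here called demo_check_ns, instead of __main__:

```python
import pickle
import sys
import types

# Hypothetical stand-in for __main__ (the name "demo_check_ns" is an assumption).
mod = types.ModuleType("demo_check_ns")
sys.modules["demo_check_ns"] = mod

exec(
    "def func(i):\n    return 'first'\n\n"
    "def func2(i):\n    return 'second'\n",
    mod.__dict__,
)
mod.func2.__qualname__ = mod.func.__qualname__  # func2 now claims to be "func"

# Without rebinding, the name "func" still leads to the first function,
# so the pickler refuses: the object found is not the object being pickled.
try:
    pickle.dumps(mod.func2)
    rejected = False
except pickle.PicklingError:
    rejected = True

# With func = func2, the lookup returns the very same object and pickling works.
mod.func = mod.func2
payload = pickle.dumps(mod.func2)
accepted = pickle.loads(payload) is mod.func2

print(rejected, accepted)  # True True
```

Note that this check runs on the pickling side only; it says nothing about which object the same name resolves to in a worker that was forked earlier.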