I was trying to use some closures in my multiprocessing code, and they kept failing for no apparent reason. So I put together a little test:
#!/usr/bin/env python3
import functools
from multiprocessing import Pool

def processing_function(unprocessed_data):
    return unprocessed_data

def callback_function(processed_data):
    print("FUNCTION: " + str(processed_data))

def create_processing_closure(initial_data):
    def processing_function(unprocessed_data):
        return initial_data + unprocessed_data
    return processing_function

def create_callback_closure():
    def callback(processed_data):
        print("CLOSURE: " + str(processed_data))
    return callback

def create_processing_lambda(initial_data):
    return lambda unprocessed_data: initial_data + unprocessed_data

def create_callback_lambda():
    return lambda processed_data: print("LAMBDA: " + str(processed_data))

def processing_partial(unprocessed_data1, unprocessed_data2):
    return unprocessed_data1 + unprocessed_data2

def callback_partial(initial_data, processed_data):
    print("PARTIAL: " + str(processed_data))

pool = Pool(processes=1)

print("Testing if they work normally...")

f1 = processing_function
f2 = callback_function
f2(f1(1))

f3 = create_processing_closure(1)
f4 = create_callback_closure()
f4(f3(1))

f5 = create_processing_lambda(1)
f6 = create_callback_lambda()
f6(f5(1))

f7 = functools.partial(processing_partial, 1)
f8 = functools.partial(callback_partial, 1)
f8(f7(1))

# bonus round!
x = 1
f9 = lambda unprocessed_data: unprocessed_data + x
f10 = lambda processed_data: print("GLOBAL LAMBDA: " + str(processed_data))
f10(f9(1))

print("Testing if they work in apply_async...")

# works
pool.apply_async(f1, args=(1,), callback=f2)

# doesn't work
pool.apply_async(f3, args=(1,), callback=f4)

# doesn't work
pool.apply_async(f5, args=(1,), callback=f6)

# works
pool.apply_async(f7, args=(1,), callback=f8)

# doesn't work
pool.apply_async(f9, args=(1,), callback=f10)

pool.close()
pool.join()
The result is:
> ./apply_async.py
Testing if they work normally...
FUNCTION: 1
CLOSURE: 2
LAMBDA: 2
PARTIAL: 2
GLOBAL LAMBDA: 2
Testing if they work in apply_async...
FUNCTION: 1
PARTIAL: 2
Can anyone explain this strange behavior?
Answer (score: 2)
These callables cannot be transferred to another process as-is: pickling a callable stores only its module and qualified name, not the object itself. The partial only works because it shares the underlying function object, which is another top-level function.
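You can see the by-name reference directly (a minimal sketch; the function name top_level is just for illustration):

import pickle

def top_level(x):
    return x + 1

payload = pickle.dumps(top_level)
# The payload embeds the module and qualified name, not the bytecode:
assert b"top_level" in payload
# Unpickling looks that name up again and returns the very same object:
assert pickle.loads(payload) is top_level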
See the What can be pickled and unpickled section of the pickle module documentation:
- functions defined at the top level of a module (using def, not lambda)
- built-in functions defined at the top level of a module
[...]
Note that functions (built-in and user-defined) are pickled by "fully qualified" name reference, not by value. [2] This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function's code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised. [3]
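This is also why the partial round-trips: pickling a functools.partial stores a by-name reference to its underlying (importable) function together with the bound arguments. A minimal sketch (the names add, f, and g are illustrative):

import functools
import pickle

def add(a, b):
    return a + b

f = functools.partial(add, 1)
# Essentially (add, (1,)) is pickled: a by-name reference to the
# top-level function plus the stored argument.
g = pickle.loads(pickle.dumps(f))
print(g(2))  # 3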
Note also the multiprocessing Programming guidelines:
Picklability

Ensure that the arguments to the methods of proxies are picklable.
and
Better to inherit than pickle/unpickle
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
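To illustrate the inheritance advice, a minimal sketch (the names INITIAL_DATA and processing_worker are hypothetical; this relies on the fork start method, the default on Linux, where child processes inherit the parent's memory):

from multiprocessing import Pool

INITIAL_DATA = 1  # created in the parent before the pool is forked

def processing_worker(unprocessed_data):
    # Reads inherited module-level state instead of a closed-over
    # variable, so only the top-level function has to be pickled.
    return INITIAL_DATA + unprocessed_data

if __name__ == "__main__":
    with Pool(processes=1) as pool:
        print(pool.apply_async(processing_worker, args=(1,)).get())  # prints 2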
If you try to pickle each callable directly, you can see that the ones that pickle successfully match exactly the callables that run successfully under multiprocessing:
>>> import pickle
>>> f2(f1(1))
FUNCTION: 1
>>> pickle.dumps([f1, f2]) is not None
True
>>> f4(f3(1))
CLOSURE: 2
>>> pickle.dumps([f3, f4]) is not None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: Can't pickle local object 'create_processing_closure.<locals>.processing_function'
>>> f6(f5(1))
LAMBDA: 2
>>> pickle.dumps([f5, f6]) is not None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: Can't pickle local object 'create_processing_lambda.<locals>.<lambda>'
>>> f8(f7(1))
PARTIAL: 2
>>> pickle.dumps([f7, f8]) is not None
True
>>> f10(f9(1))
GLOBAL LAMBDA: 2
>>> pickle.dumps([f9, f10]) is not None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <function <lambda> at 0x10994e8c8>: attribute lookup <lambda> on __main__ failed
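A common workaround (not part of the answer above) is to move the closure's state into a top-level class with a __call__ method; instances pickle fine because the class is importable by name and the state lives in ordinary attributes:

from multiprocessing import Pool

class ProcessingClosure:
    # A picklable stand-in for create_processing_closure: the class is
    # defined at module top level, and the captured value is an attribute.
    def __init__(self, initial_data):
        self.initial_data = initial_data

    def __call__(self, unprocessed_data):
        return self.initial_data + unprocessed_data

if __name__ == "__main__":
    with Pool(processes=1) as pool:
        f = ProcessingClosure(1)
        print(pool.apply_async(f, args=(1,)).get())  # prints 2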