I'm sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in an IPython shell instead of regular Python, things work out well.
I looked up some previous notes on this problem. They were all caused by using a pool to call a function defined within a class function. But that is not the case for me.
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I would appreciate any help.
Update: The function I pickle is defined at the top level of the module, though it calls a function that contains a nested function. That is, f() calls g(), which calls h(), which has a nested function i(), and I am calling pool.apply_async(f). f(), g(), and h() are all defined at the top level. I tried a simpler example with this pattern, and it does work.
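For reference, a minimal sketch of the call pattern described above; the function bodies are placeholders of my own, since the original code is not posted:
import multiprocessing

def h():
    def i():              # nested function: never pickled, only called inside h()
        return 42
    return i()

def g():
    return h()

def f():                  # only f itself gets pickled by apply_async
    return g()

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result = pool.apply_async(f)
    print result.get()    # prints 42: f, g, h are all top-level, so this works
    pool.close()
    pool.join()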
Answer 0 (score: 241)
Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top level of a module.
This piece of code:
import multiprocessing as mp

class Foo():
    @staticmethod
    def work(self):
        pass

if __name__ == '__main__':
    pool = mp.Pool()
    foo = Foo()
    pool.apply_async(foo.work)
    pool.close()
    pool.join()
yields an error almost identical to the one you posted:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
The problem is that the pool methods all use a queue.Queue to pass tasks to the worker processes. Everything that goes through the queue.Queue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.
It can be fixed by defining a function at the top level which calls foo.work():
def work(foo):
    foo.work()

pool.apply_async(work, args=(foo,))
Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
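One way to convince yourself is to try pickling the objects directly. This quick check is my addition, not part of the original answer, and assumes the Foo class and the top-level work function defined above:
import pickle

foo = Foo()
pickle.dumps(foo)        # works: Foo is defined at the top level
pickle.dumps(work)       # works: work is a top-level function
pickle.dumps(foo.work)   # raises PicklingError: the function lives inside Foo
                         # and can't be found by a module-level attribute lookup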
Answer 1 (score: 72)
I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you are able to send a lot more around in parallel. The pathos fork also has the ability to work directly with multiple-argument functions, as you need for class methods.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool(4)
>>> class Test(object):
...     def plus(self, x, y):
...         return x+y
...
>>> t = Test()
>>> x = [0, 1, 2, 3]  # sample inputs; the original snippet omitted these,
>>> y = [4, 5, 6, 7]  # values chosen here to match the output below
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
>>>
>>> class Foo(object):
...     @staticmethod
...     def work(self, x):
...         return x+1
...
>>> f = Foo()
>>> p.apipe(f.work, f, 100)
<processing.pool.ApplyResult object at 0x10504f8d0>
>>> res = _
>>> res.get()
101
Get pathos (and, if you like, dill) here:
https://github.com/uqfoundation
Answer 2 (score: 23)
As others have said, multiprocessing can only transfer Python objects to worker processes if they can be pickled. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities to transfer data (especially code data), as shown below.
This solution requires only the installation of dill, and no other libraries such as pathos:
import os
from multiprocessing import Pool

import dill


def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    return fun(*args)


def apply_async(pool, fun, args):
    payload = dill.dumps((fun, args))
    return pool.apply_async(run_dill_encoded, (payload,))


if __name__ == "__main__":

    pool = Pool(processes=5)

    # async execution of lambda
    jobs = []
    for i in range(10):
        job = apply_async(pool, lambda a, b: (a, b, a * b), (i, i + 1))
        jobs.append(job)

    for job in jobs:
        print job.get()
    print

    # async execution of static method

    class O(object):

        @staticmethod
        def calc():
            return os.getpid()

    jobs = []
    for i in range(10):
        job = apply_async(pool, O.calc, ())
        jobs.append(job)

    for job in jobs:
        print job.get()
Answer 3 (score: 15)
I have found that I can also generate exactly this error output on a perfectly working piece of code by attempting to use the profiler on it.
Note that this was on Windows (where the forking is a bit less elegant).
I was running:
python -m profile -o output.pstats <script>
And found that removing the profiling removed the error, while restoring the profiling restored it. It was driving me batty too, because I knew the code used to work. I was checking whether something had updated pool.py... then I got a sinking feeling, eliminated the profiling, and that was it.
Posting here for the archives in case anybody else runs into it.
Answer 4 (score: 4)
This solution requires only that dill be installed, and no other libraries such as pathos:
import dill


def apply_packed_function_for_map((dumped_function, item, args, kwargs),):
    """
    Unpack dumped function as target function and call it with arguments.

    :param (dumped_function, item, args, kwargs):
        a tuple of dumped function and its arguments
    :return:
        result of target function
    """
    target_function = dill.loads(dumped_function)
    res = target_function(item, *args, **kwargs)
    return res


def pack_function_for_map(target_function, items, *args, **kwargs):
    """
    Pack function and arguments to an object that can be sent from one
    multiprocessing.Process to another. The main problem is:
        «multiprocessing.Pool.map*» or «apply*»
        cannot use class methods or closures.
    It solves this problem with «dill».
    It takes the target function as an argument, dumps it («with dill»)
    and returns the dumped function with the arguments of the target function.
    For more performance we dump only the target function itself
    and don't dump its arguments.

    How to use (pseudo-code):

        ~>>> import multiprocessing
        ~>>> images = [...]
        ~>>> pool = multiprocessing.Pool(100500)
        ~>>> features = pool.map(
        ~...     *pack_function_for_map(
        ~...         super(Extractor, self).extract_features,
        ~...         images,
        ~...         type='png',
        ~...         **options,
        ~...     )
        ~... )
        ~>>>

    :param target_function:
        function, that you want to execute like target_function(item, *args, **kwargs).
    :param items:
        list of items for map
    :param args:
        positional arguments for target_function(item, *args, **kwargs)
    :param kwargs:
        named arguments for target_function(item, *args, **kwargs)
    :return: tuple(function_wrapper, dumped_items)
        It returns a tuple with
            * the function wrapper, which unpacks and calls the target function;
            * the list of packed target function and its arguments.
    """
    dumped_function = dill.dumps(target_function)
    dumped_items = [(dumped_function, item, args, kwargs) for item in items]
    return apply_packed_function_for_map, dumped_items
It works with numpy arrays too.
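A usage sketch of my own, assuming the helpers above are importable and dill is installed; the make_adder closure is purely illustrative, since a closure like this is exactly what plain pickle rejects:
import multiprocessing

def make_adder(n):
    # returns a closure: unpicklable with plain pickle, fine with dill
    def adder(item):
        return item + n
    return adder

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    func, packed = pack_function_for_map(make_adder(10), [1, 2, 3, 4])
    print pool.map(func, packed)   # -> [11, 12, 13, 14]
    pool.close()
    pool.join()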
Answer 5 (score: 1)
Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
This error will also appear if the model object passed to the async job contains any built-in function as an attribute.
So make sure to check that the model objects you pass do not carry built-in functions. (In our case, we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is a link to the relevant GitHub issue.
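An illustrative sketch of this failure mode; the Model class and its tracker attribute are invented here, standing in for the FieldTracker case:
import cPickle as pickle

class Model(object):
    def __init__(self):
        # an attribute holding a function object created at runtime
        self.tracker = lambda: 'tracking'

pickle.dumps(Model())
# PicklingError: Can't pickle <type 'function'>:
# attribute lookup __builtin__.function failed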
Answer 6 (score: 1)
Building on @rocksportrocker's solution, it makes sense to dill the results when sending and receiving them as well:
import dill
import itertools


def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    res = fun(*args)
    res = dill.dumps(res)
    return res


def dill_map_async(pool, fun, args_list,
                   as_tuple=True,
                   **kw):
    if as_tuple:
        args_list = ((x,) for x in args_list)

    it = itertools.izip(
        itertools.cycle([fun]),
        args_list)
    it = itertools.imap(dill.dumps, it)
    return pool.map_async(run_dill_encoded, it, **kw)


if __name__ == '__main__':
    import multiprocessing as mp
    import sys, os

    p = mp.Pool(4)
    res = dill_map_async(p, lambda x: [sys.stdout.write('%s\n' % os.getpid()), x][-1],
                         [lambda x: x + 1] * 10)
    res = res.get(timeout=100)
    res = map(dill.loads, res)
    print(res)
Answer 7 (score: 1)
When this problem comes up with multiprocessing, a simple solution is to switch from Pool to ThreadPool. This can be done with no change of code other than the import:
from multiprocessing.pool import ThreadPool as Pool
This works because ThreadPool shares memory with the main thread rather than creating a new process, which means that pickling is not required.
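A minimal sketch of the swap; the Foo class here is mine, echoing the earlier examples:
from multiprocessing.pool import ThreadPool as Pool

class Foo(object):
    def work(self, x):
        return x + 1

if __name__ == '__main__':
    pool = Pool(4)
    foo = Foo()
    print pool.map(foo.work, range(5))   # a bound method is fine: no pickling
    pool.close()
    pool.join()
The trade-off is that threads run under the GIL, so this swap helps for I/O-bound work but gives no extra parallelism for CPU-bound work.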