Python multiprocessing PicklingError: Can't pickle <type 'function'>

Date: 2012-01-10 14:28:26

Tags: python multiprocessing pickle

Sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in an IPython shell instead of regular Python, things work out well.

I looked up some previous notes on this problem. They were all caused by using pool to call a function defined within a class function. But that is not the case for me.

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I would appreciate any help.

Update: The function I pickle is defined at the top level of the module, though it calls a function that contains a nested function. That is, f() calls g(), which calls h(), which has a nested function i(), and I am calling pool.apply_async(f). f(), g(), and h() are all defined at the top level. I tried a simpler example with this pattern, though, and it works.
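
For reference, a minimal sketch of that pattern (Python 2; the bodies of f, g, h, and i are hypothetical stand-ins):

from multiprocessing import Pool

def h(x):
    def i(y):  # nested function: unpicklable, but it never crosses the process boundary
        return y + 1
    return i(x)

def g(x):
    return h(x)

def f(x):  # only f itself gets pickled by apply_async
    return g(x)

if __name__ == '__main__':
    pool = Pool(2)
    print pool.apply_async(f, (1,)).get()  # prints 2: works, since f is defined at the top level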

8 Answers:

Answer 0 (score: 241)

Here is a list of what can be pickled. In particular, functions are picklable only if they are defined at the top level of a module.

This code:

import multiprocessing as mp

class Foo():
    @staticmethod
    def work(self):
        pass

if __name__ == '__main__':
    pool = mp.Pool()
    foo = Foo()
    pool.apply_async(foo.work)  # foo.work is not defined at the top level, so it cannot be pickled
    pool.close()
    pool.join()

yields an error almost identical to the one you posted:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The problem is that the pool methods all use a queue.Queue to pass tasks to the worker processes. Everything that goes through the queue.Queue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.

It can be fixed by defining a function at the top level which calls foo.work():

def work(foo):
    foo.work()

pool.apply_async(work, args=(foo,))

Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
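
A quick way to check this directly (a sketch, Python 2, reusing foo and work from above):

import pickle

pickle.dumps(foo)       # ok: Foo is importable at the top level and foo.__dict__ is picklable
pickle.dumps(work)      # ok: top-level functions are pickled by reference
pickle.dumps(foo.work)  # PicklingError: work is not reachable at the top level of the module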

Answer 1 (score: 72)

I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you are able to send a lot more around in parallel. The pathos fork also works directly with functions that take multiple arguments, as you need for class methods.

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool(4)
>>> class Test(object):
...     def plus(self, x, y):
...         return x + y
...
>>> t = Test()
>>> x = [0, 1, 2, 3]  # inputs consistent with the output below
>>> y = [4, 5, 6, 7]
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
>>>
>>> class Foo(object):
...     @staticmethod
...     def work(self, x):
...         return x + 1
...
>>> f = Foo()
>>> p.apipe(f.work, f, 100)
<processing.pool.ApplyResult object at 0x10504f8d0>
>>> res = _
>>> res.get()
101

Get pathos (and, if you like, dill) here: https://github.com/uqfoundation

Answer 2 (score: 23)

As others have said, multiprocessing can only transfer Python objects to worker processes that can be pickled. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities for transferring data (especially code data), as shown below.

This solution requires only the installation of dill and no other libraries such as pathos:

import os
from multiprocessing import Pool

import dill


def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    return fun(*args)


def apply_async(pool, fun, args):
    payload = dill.dumps((fun, args))
    return pool.apply_async(run_dill_encoded, (payload,))


if __name__ == "__main__":

    pool = Pool(processes=5)

    # async execution of a lambda
    jobs = []
    for i in range(10):
        job = apply_async(pool, lambda a, b: (a, b, a * b), (i, i + 1))
        jobs.append(job)

    for job in jobs:
        print job.get()
    print

    # async execution of static method

    class O(object):

        @staticmethod
        def calc():
            return os.getpid()

    jobs = []
    for i in range(10):
        job = apply_async(pool, O.calc, ())
        jobs.append(job)

    for job in jobs:
        print job.get()

Answer 3 (score: 15)

I found that I can also generate exactly this error output on a perfectly working piece of code by attempting to use the profiler on it.

Note that this was on Windows (where the forking is a little less elegant).

I was running:

python -m profile -o output.pstats <script> 

and found that removing the profiling removed the error, while restoring the profiling brought it back. That was driving me batty, because I knew the code used to work. I was checking whether something had updated pool.py... then I had a sinking feeling, removed the profiling, and that was it.

Posting here for the archives in case anybody else runs into it.
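
For reference, a minimal sketch of the situation described (hypothetical; the failure under the profiler was reported on Windows and may not reproduce everywhere):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(2)
    print pool.map(square, range(5))

# python script.py                              -> works
# python -m profile -o output.pstats script.py  -> reportedly raises the PicklingError above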

Answer 4 (score: 4)


Building on the approach above ("This solution requires only the installation of dill and no other libraries such as pathos"), here is a variant that dumps the target function once and reuses it for every item of a map:

import dill


def apply_packed_function_for_map((dumped_function, item, args, kwargs),):
    """
    Unpack dumped function as target function and call it with arguments.

    :param (dumped_function, item, args, kwargs):
        a tuple of dumped function and its arguments
    :return:
        result of target function
    """
    target_function = dill.loads(dumped_function)
    res = target_function(item, *args, **kwargs)
    return res


def pack_function_for_map(target_function, items, *args, **kwargs):
    """
    Pack function and arguments to object that can be sent from one
    multiprocessing.Process to another. The main problem is:
        «multiprocessing.Pool.map*» or «apply*»
        cannot use class methods or closures.
    It solves this problem with «dill».
    It works with target function as argument, dumps it («with dill»)
    and returns dumped function with arguments of target function.
    For more performance we dump only target function itself
    and don't dump its arguments.
    How to use (pseudo-code):

        ~>>> import multiprocessing
        ~>>> images = [...]
        ~>>> pool = multiprocessing.Pool(100500)
        ~>>> features = pool.map(
        ~...     *pack_function_for_map(
        ~...         super(Extractor, self).extract_features,
        ~...         images,
        ~...         type='png',
        ~...         **options,
        ~...     )
        ~... )
        ~>>>

    :param target_function:
        function, that you want to execute like  target_function(item, *args, **kwargs).
    :param items:
        list of items for map
    :param args:
        positional arguments for target_function(item, *args, **kwargs)
    :param kwargs:
        named arguments for target_function(item, *args, **kwargs)
    :return: tuple(function_wrapper, dumped_items)
        It returns a tuple with
            * a function wrapper that unpacks and calls the target function;
            * a list of the packed target function and its arguments.
    """
    dumped_function = dill.dumps(target_function)
    dumped_items = [(dumped_function, item, args, kwargs) for item in items]
    return apply_packed_function_for_map, dumped_items

It also works fine for numpy arrays.
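
A minimal usage sketch (Python 2; the lambda and inputs are hypothetical, chosen only to exercise the two helpers above):

import multiprocessing

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    # a lambda would normally be unpicklable; here dill dumps it once
    results = pool.map(*pack_function_for_map(lambda item, offset: item + offset, [1, 2, 3], 10))
    print results  # [11, 12, 13]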

Answer 5 (score: 1)

Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

This error will also come if you have any inbuilt function inside the model object that is passed to the async job.

So make sure to check that the model objects you pass don't have inbuilt functions. (In our case, we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is the link to the relevant GitHub issue.
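
A minimal sketch of the failure mode (hypothetical; any function object stored on an instance that is not importable at the top level will do):

import pickle

class Model(object):
    def __init__(self):
        # storing a non-top-level function on the instance makes the whole instance unpicklable
        self.tracker = lambda: None

pickle.dumps(Model())  # PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed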

Answer 6 (score: 1)

Building on @rocksportrocker's solution, it makes sense to also dill the results when sending and receiving them:

import dill
import itertools
def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    res = fun(*args)
    res = dill.dumps(res)
    return res

def dill_map_async(pool, fun, args_list,
                   as_tuple=True,
                   **kw):
    if as_tuple:
        args_list = ((x,) for x in args_list)

    it = itertools.izip(
        itertools.cycle([fun]),
        args_list)
    it = itertools.imap(dill.dumps, it)
    return pool.map_async(run_dill_encoded, it, **kw)

if __name__ == '__main__':
    import multiprocessing as mp
    import sys,os
    p = mp.Pool(4)
    res = dill_map_async(p, lambda x:[sys.stdout.write('%s\n'%os.getpid()),x][-1],
                  [lambda x:x+1]*10,)
    res = res.get(timeout=100)
    res = map(dill.loads,res)
    print(res)

Answer 7 (score: 1)

When this problem comes up with multiprocessing, a simple solution is to switch from Pool to ThreadPool. This can be done with no change of code other than the import:
from multiprocessing.pool import ThreadPool as Pool

This works because ThreadPool shares memory with the main thread, rather than creating a new process, which means that pickling is not required.
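
A minimal sketch (Python 2; Foo is a hypothetical stand-in):

from multiprocessing.pool import ThreadPool as Pool

class Foo(object):
    def work(self, x):
        return x + 1

if __name__ == '__main__':
    pool = Pool(4)
    print pool.map(Foo().work, range(5))  # bound methods are fine here: nothing is pickled

Note that threads share the GIL, so this trades process parallelism for thread concurrency; it helps most when the work is I/O-bound.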