Question

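I'm aware of various discussions of the limitations of the multiprocessing module when dealing with functions that are data members of a class (due to pickling problems).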

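But is there another module, or any kind of workaround in multiprocessing, that allows something like the following (specifically, without forcing the definition of the function being applied in parallel to live outside the class)?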

class MyClass():

    def __init__(self):
        self.my_args = [1,2,3,4]
        self.output  = {}

    def my_single_function(self, arg):
        return arg**2

    def my_parallelized_function(self):
        # Use map or map_async to map my_single_function onto the
        # list of self.my_args, and append the return values into
        # self.output, using each arg in my_args as the key.

        # The result should make self.output become
        # {1:1, 2:4, 3:9, 4:16}
        pass  # placeholder so the class definition parses; this body is what the question asks for


foo = MyClass()
foo.my_parallelized_function()
print foo.output

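Note: I can easily do this by moving my_single_function outside of the class and passing something like foo.my_args to the map or map_async commands. But this pushes the parallelized execution of the function outside of instances of MyClass.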

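For my application (parallelizing a large data query that retrieves, joins, and cleans monthly cross-sections of data, and then appends them into a long time series of such cross-sections), it is very important to have this functionality inside the class. Different users of my program will instantiate the class with different time intervals, different time increments, different subsets of data to gather, and so on, and all of that should be associated with the instance.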

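Thus, I want the parallelized work to be done by the instance as well, since it owns all the data relevant to the parallelized query. It would be silly to write some hacky wrapper function that binds to some arguments and lives outside of the class, especially since such a function would not be general: it would need all kinds of specifics from inside the class.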

Answer 1

Steven Bethard has posted a way to allow methods to be pickled/unpickled. You could use it like this:

import multiprocessing as mp
import copy_reg
import types

def _pickle_method(method):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

# This call to copy_reg.pickle allows you to pass methods as the first arg to
# mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
# PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
# __builtin__.instancemethod failed

copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)

class MyClass(object):

    def __init__(self):
        self.my_args = [1,2,3,4]
        self.output  = {}

    def my_single_function(self, arg):
        return arg**2

    def my_parallelized_function(self):
        # Use map or map_async to map my_single_function onto the
        # list of self.my_args, and append the return values into
        # self.output, using each arg in my_args as the key.

        # The result should make self.output become
        # {1:1, 2:4, 3:9, 4:16}
        self.output = dict(zip(self.my_args,
                               pool.map(self.my_single_function, self.my_args)))

Then

pool = mp.Pool()   
foo = MyClass()
foo.my_parallelized_function()

yields

print foo.output
# {1: 1, 2: 4, 3: 9, 4: 16}
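
One detail of _pickle_method that is easy to miss is the name handling: a method whose name starts with two underscores (and does not end with two) is stored in the class __dict__ under a mangled name, so _unpickle_method has to look it up under that name. Here is a minimal sketch of just that step. The Demo class and the lookup_name helper are made up for illustration and are not part of Steven Bethard's recipe.

```python
class Demo(object):  # hypothetical class, for illustration only
    def __hidden(self, x):
        # Python stores this on the class as '_Demo__hidden'
        return x * 2

    def visible(self, x):
        return x + 1


def lookup_name(cls, func_name):
    """Return the key under which func_name is stored in cls.__dict__.

    Mirrors the prefixing step in _pickle_method: names that start with
    two underscores but do not end with two are mangled to
    '_<ClassName><name>'.
    """
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
        if cls_name:
            return '_' + cls_name + func_name
    return func_name


# The mangled key is what actually exists in the class dictionary.
print(lookup_name(Demo, '__hidden') in Demo.__dict__)
print(lookup_name(Demo, 'visible') in Demo.__dict__)
```

Without this step, looking up '__hidden' directly in Demo.__dict__ would raise KeyError, and unpickling a private method would fail.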

Answer 2

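If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in Python.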

pathos.multiprocessing also provides an asynchronous map function, and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See: What can multiprocessing and dill do together?

and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

So, to be explicit, you can do exactly what you wanted to do:

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]

Get the code here: https://github.com/uqfoundation/pathos
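
To see the kind of limitation that switching serializers addresses, here is a small stdlib-only sketch (it does not use pathos or dill). The standard pickle module serializes functions by reference to an importable name, so lambdas and functions defined inside other functions cannot be pickled, and those are among the objects the answer says dill can handle. The names can_pickle and make_adder are made up for this illustration.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Depending on the Python version, either exception may be raised
        # for objects that cannot be looked up by name.
        return False


def make_adder(n):
    # The inner function is local to make_adder, so pickle has no
    # importable name to refer to.
    def add(x):
        return x + n
    return add


print(can_pickle([1, 2, 3]))          # plain data pickles fine
print(can_pickle(lambda x: x ** 2))   # lambdas have no importable name
print(can_pickle(make_adder(1)))      # neither do nested functions
```

With the standard pickle, the last two checks should report False, which is why a plain multiprocessing pool rejects such callables.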

Answer 3

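I believe there is a more elegant solution. Add the following lines to the code that does multiprocessing with the class, and you can still pass the method through the pool. The code should go above the class.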

import copy_reg
import types

def _reduce_method(meth):
    return (getattr, (meth.__self__, meth.__func__.__name__))

copy_reg.pickle(types.MethodType, _reduce_method)

For more information on how to pickle methods, see http://docs.python.org/2/library/copy_reg.html
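
As a rough sketch of what the reducer above hands back, here is a Python 3 port, where the module is named copyreg instead of copy_reg. The Squarer class is hypothetical and only there to have a method to reduce. As far as I know, recent Python 3 versions can pickle bound methods without this registration, so treat it as an illustration of the mechanism rather than something you necessarily need there.

```python
import copyreg  # Python 3 name for Python 2's copy_reg
import types


def _reduce_method(meth):
    # Describe the bound method as "call getattr(instance, 'name')",
    # which is how it will be rebuilt on the receiving side.
    return (getattr, (meth.__self__, meth.__func__.__name__))


# Register the reducer for bound methods.
copyreg.pickle(types.MethodType, _reduce_method)


class Squarer(object):  # hypothetical example class
    def square(self, x):
        return x ** 2


s = Squarer()
func, args = _reduce_method(s.square)

# Unpickling calls func(*args), i.e. getattr(s, 'square'),
# which yields a fresh bound method on the same instance.
rebuilt = func(*args)
print(rebuilt(3))
```

Note that the instance itself travels along with the method, so everything stored on the instance has to be picklable too.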

Python: Efficient workaround for multiprocessing a function that is a data member of a class, from within that class
