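I'm new to decorators, and maybe I've bitten off more than I can chew for my first decorator project, but what I'd like to do is make a parallel decorator that takes a function that looks like it humbly applies to a single argument, automatically distributes it with multiprocessing, and turns it into a function that applies to a list of arguments.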
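I'm following this very helpful answer to an earlier question, so I can successfully pickle class instance methods, and I can get examples like the one in that answer to work.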
Here's my first attempt at a parallel decorator (after consulting some web hits on threading decorators).
###########
# Imports #
###########
import types, copy_reg, multiprocessing as mp
import pandas, numpy as np
### End Imports
##################
# Module methods #
##################
# Parallel decorator
def parallel(f):
    def executor(*args):
        _pool = mp.Pool(2)
        _result = _pool.map_async(f, args[1:])
        # I used args[1:] because the input will be a
        # class instance method, so gotta skip over the self object.
        # but it seems like there ought to be a better way...
        _pool.close()
        _pool.join()
        return _result.get()
    return executor
### End parallel
def _pickle_method(method):
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)
### End _pickle_method
def _unpickle_method(func_name, obj, cls):
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)
### End _unpickle_method
# This call to copy_reg.pickle allows you to pass methods as the first arg to
# mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
# PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
# __builtin__.instancemethod failed
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
copy_reg.pickle(types.FunctionType, _pickle_method, _unpickle_method)
### End Module methods
##################
# Module classes #
##################
class Foo(object):
    def __init__(self, args):
        self.my_args = args
    ### End __init__
    def squareArg(self, arg):
        return arg**2
    ### End squareArg
    def par_squareArg(self):
        p = mp.Pool(2) # Replace 2 with the number of processors.
        q = p.map_async(self.squareArg, self.my_args)
        p.close()
        p.join()
        return q.get()
    ### End par_SquarArg
    @parallel
    def parSquare(self, num):
        return self.squareArg(num)
    ### End parSquare
### End Foo
### End Module classes
###########
# Testing #
###########
if __name__ == "__main__":
    myfoo = Foo([1,2,3,4])
    print myfoo.par_squareArg()
    print myfoo.parSquare(myfoo.my_args)
### End Testing
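However, when I use this approach (a silly attempt to strong-arm the pickling of functions by reusing the same _pickle_method and _unpickle_method), I first get the error AttributeError: 'function' object has no attribute 'im_func', but more often the error says that the function can't be pickled.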
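So the question is twofold. (1) How can I modify the decorator so that, if the f object it takes is an instance method of a class, the executor it returns is also an instance method of that class object (so that this business about not being able to pickle doesn't happen, since I can pickle those instance methods)? (2) Is it better to create additional _pickle_function and _unpickle_function methods? I thought Python could pickle module-level functions, so if my code doesn't result in executor becoming an instance method, it seems like it should be a module-level function; but then why can't it be pickled?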
Answer 0 (score: 3)
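(1) How can I modify the decorator so that, if the f object it takes is an instance method of a class, the executor it returns is also an instance method of that class object (so that this business about not being able to pickle doesn't happen, since I can pickle those instance methods)?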
>>> myfoo.parSquare
<bound method Foo.executor of <__main__.Foo object at 0x101332510>>
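As you can see, parSquare is actually executor, which has become an instance method. That's not surprising, since decorators are essentially function wrappers...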
The question How to make a chain of function decorators? probably has the best description of decorators.
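A minimal sketch of the naming point, written for Python 3 rather than the Python 2 used in this thread: a decorator that returns a plain inner function replaces the method's name with executor, while functools.wraps copies the original name onto the wrapper. plain and wrapped are hypothetical decorator names used only for illustration.

```python
import functools


def plain(f):
    # Returns a brand-new inner function, so its __name__ is "executor"
    def executor(*args):
        return f(*args)
    return executor


def wrapped(f):
    @functools.wraps(f)  # copies __name__, __qualname__, __doc__, ... from f
    def executor(*args):
        return f(*args)
    return executor


class Foo:
    @plain
    def parSquare(self, num):
        return num ** 2

    @wrapped
    def parSquare2(self, num):
        return num ** 2


if __name__ == "__main__":
    print(Foo.parSquare.__name__)   # executor
    print(Foo.parSquare2.__name__)  # parSquare2
```

Renaming alone does not solve the multiprocessing problem: whichever name the bound method is pickled under, looking that name up in the child still returns the decorated wrapper, not the undecorated method.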
(2) Is it better to create additional _pickle_function and _unpickle_function methods?
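You don't need to; Python already supports pickling (module-level) functions. In fact, copy_reg.pickle(types.FunctionType, _pickle_method, _unpickle_method) seems a bit odd, since you're using the same algorithm to pickle both types.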
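For comparison, in Python 3 (not the Python 2 used in the question) bound methods can be pickled without any copy_reg/copyreg registration, because a method object reduces itself to getattr(instance, name). A minimal sketch reusing the question's Foo.squareArg:

```python
import pickle


class Foo:
    def __init__(self, args):
        self.my_args = args

    def squareArg(self, arg):
        return arg ** 2


foo = Foo([1, 2, 3, 4])
# No registration needed: the bound method pickles as getattr(foo, "squareArg")
method = pickle.loads(pickle.dumps(foo.squareArg))

if __name__ == "__main__":
    print(method(5))  # 25
```

The unpickled method is bound to a copy of foo, which is also what a pool worker receives.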
Now the bigger question is why we get PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed. The error itself is a bit vague, but it looks like it can't look something up: our function?
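I think what's going on is that the decorator overrides the function with one defined internally: in your case parSquare becomes executor, but executor is an inner function of parallel, so it isn't importable, and therefore the lookup seems to fail. This is just a hunch.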
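The hunch can be checked directly. pickle stores a function by reference (module plus qualified name) and, when pickling, verifies that looking that name up returns the same object. A minimal Python 3 sketch, with a serial stand-in for pool.map and a hypothetical can_pickle helper, shows both failure modes: the inner apply function cannot be found by name, and once the decorator rebinds square, the original function no longer matches its own name. An undecorated function pickles fine. (The Python 3 error messages differ from the Python 2 ones quoted here.)

```python
import pickle


def parallel(function):
    def apply(values):
        # Serial stand-in; pool.map would have to pickle `function` here
        return [function(v) for v in values]
    return apply


def square(value):
    return value ** 2


original = square          # keep a handle on the undecorated function
square = parallel(square)  # exactly what @parallel does: rebind the name


def cube(value):           # never decorated, so its name still points to it
    return value ** 3


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print(can_pickle(square))    # False: apply is local to parallel
    print(can_pickle(original))  # False: module-level "square" is now apply
    print(can_pickle(cube))      # True
```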
Let's try a simpler example.
>>> def parallel(function):
...     def apply(values):
...         from multiprocessing import Pool
...         pool = Pool(4)
...         result = pool.map(function, values)
...         pool.close()
...         pool.join()
...         return result
...     return apply
...
>>> @parallel
... def square(value):
...     return value**2
...
>>>
>>> square([1,2,3,4])
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
We get almost the same error.
Note that the code above is equivalent to:
def parallel(function):
    def apply(values):
        from multiprocessing import Pool
        pool = Pool(4)
        result = pool.map(function, values)
        pool.close()
        pool.join()
        return result
    return apply
def square(value):
    return value**2
square = parallel(square)
which produces the same error. Also note what happens if the decorated result does not reuse the original function's name:
>>> def parallel(function):
...     def apply(values):
...         from multiprocessing import Pool
...         pool = Pool(4)
...         result = pool.map(function, values)
...         pool.close()
...         pool.join()
...         return result
...     return apply
...
>>> def _square(value):
...     return value**2
...
>>> square = parallel(_square)
>>> square([1,2,3,4])
[1, 4, 9, 16]
>>>
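It works fine. I was looking for a way to control how decorators use names, but to no avail; I still wanted to use them with multiprocessing, so I came up with a somewhat ugly workaround: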
>>> def parallel(function):
...     def temp(_):
...         def apply(values):
...             from multiprocessing import Pool
...             pool = Pool(4)
...             result = pool.map(function, values)
...             pool.close()
...             pool.join()
...             return result
...         return apply
...     return temp
...
>>> def _square(value):
...     return value*value
...
>>> @parallel(_square)
... def square(values):
...     pass
...
>>> square([1,2,3,4])
[1, 4, 9, 16]
>>>
So basically I pass the actual function to the decorator and then use a second function to handle the values; as you can see, it works just fine.
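The same idea can be automated for module-level functions. A minimal Python 3 sketch, assuming the decorated function lives at module level: the decorator re-registers the undecorated function in its module under a private name (the _parallel_ prefix is a hypothetical convention), so pickle can find it by reference and the @parallel(_square) indirection is not needed.

```python
import functools
import pickle
import sys
from multiprocessing import Pool


def parallel(function):
    @functools.wraps(function)  # the wrapper keeps the public name "square"
    def apply(values, processes=4):
        with Pool(processes) as pool:
            return pool.map(function, values)

    # Hypothetical trick: rename the undecorated function and store it as a
    # module attribute, so pickle's "module + qualified name" lookup finds it.
    # Only works for module-level functions.
    hidden = "_parallel_" + function.__name__
    function.__name__ = function.__qualname__ = hidden
    setattr(sys.modules[function.__module__], hidden, function)
    return apply


@parallel
def square(value):
    return value * value


if __name__ == "__main__":
    print(square([1, 2, 3, 4]))  # [1, 4, 9, 16]
    # The undecorated function now survives a pickle round trip
    print(pickle.loads(pickle.dumps(square.__wrapped__))(5))  # 25
```

It mutates the function's __name__ and __qualname__ and adds a module attribute, so treat it as a hack rather than a recommended pattern.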
I slightly modified your initial code to handle the decorator better, though it isn't perfect.
import types, copy_reg, multiprocessing as mp
def parallel(f):
    def executor(*args):
        _pool = mp.Pool(2)
        func = getattr(args[0], f.__name__) # This will get the actual method function so we can use our own pickling procedure
        _result = _pool.map(func, args[1])
        _pool.close()
        _pool.join()
        return _result
    return executor
def _pickle_method(method):
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)
def _unpickle_method(func_name, obj, cls):
    func = None
    for cls in cls.mro():
        if func_name in cls.__dict__:
            func = cls.__dict__[func_name] # This will fail with the decorator, since parSquare is being wrapped around as executor
            break
        else:
            for attr in dir(cls):
                prop = getattr(cls, attr)
                if hasattr(prop, '__call__') and prop.__name__ == func_name:
                    func = cls.__dict__[attr]
                    break
    if func is None:
        raise KeyError("Couldn't find function %s within %s" % (str(func_name), str(cls)))
    return func.__get__(obj, cls)
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
class Foo(object):
    def __init__(self, args):
        self.my_args = args
    def squareArg(self, arg):
        return arg**2
    def par_squareArg(self):
        p = mp.Pool(2) # Replace 2 with the number of processors.
        q = p.map(self.squareArg, self.my_args)
        p.close()
        p.join()
        return q
    @parallel
    def parSquare(self, num):
        return self.squareArg(num)
if __name__ == "__main__":
    myfoo = Foo([1,2,3,4])
    print myfoo.par_squareArg()
    print myfoo.parSquare(myfoo.my_args)
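Fundamentally, this still fails, giving us AssertionError: daemonic processes are not allowed to have children, because the child process tries to call the function: when it unpickles the method it looks it up by name, finds the decorated executor, and executor then tries to create its own Pool inside a (daemonic) pool worker. Remember that child processes don't really copy code, just names...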
import types, copy_reg, multiprocessing as mp
def parallel(f):
    def temp(_):
        def executor(*args):
            _pool = mp.Pool(2)
            func = getattr(args[0], f.__name__) # This will get the actual method function so we can use our own pickling procedure
            _result = _pool.map(func, args[1])
            _pool.close()
            _pool.join()
            return _result
        return executor
    return temp
def _pickle_method(method):
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)
def _unpickle_method(func_name, obj, cls):
    func = None
    for cls in cls.mro():
        if func_name in cls.__dict__:
            func = cls.__dict__[func_name] # This will fail with the decorator, since parSquare is being wrapped around as executor
            break
        else:
            for attr in dir(cls):
                prop = getattr(cls, attr)
                if hasattr(prop, '__call__') and prop.__name__ == func_name:
                    func = cls.__dict__[attr]
                    break
    if func is None:
        raise KeyError("Couldn't find function %s within %s" % (str(func_name), str(cls)))
    return func.__get__(obj, cls)
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)
class Foo(object):
    def __init__(self, args):
        self.my_args = args
    def squareArg(self, arg):
        return arg**2
    def par_squareArg(self):
        p = mp.Pool(2) # Replace 2 with the number of processors.
        q = p.map(self.squareArg, self.my_args)
        p.close()
        p.join()
        return q
    def _parSquare(self, num):
        return self.squareArg(num)
    @parallel(_parSquare)
    def parSquare(self, num):
        pass
if __name__ == "__main__":
    myfoo = Foo([1,2,3,4])
    print myfoo.par_squareArg()
    print myfoo.parSquare(myfoo.my_args)
[1, 4, 9, 16]
[1, 4, 9, 16]
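For Python 3 readers, the @parallel(_parSquare) workaround above can also be avoided for methods. A minimal sketch, assuming Python 3.6+ (for __set_name__) and a picklable instance: the decorator is a descriptor that parks the undecorated function on the class under a private name (_parallel_parSquare is a hypothetical naming scheme) and renames it, so the bound method that pool.map pickles resolves to the plain function in the child rather than to the decorator, which sidesteps the daemonic-children error.

```python
import pickle
from multiprocessing import Pool


class parallel:
    """Hypothetical descriptor-based decorator for methods (a sketch)."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        # Called once when the class body is created. A bound method pickles
        # as getattr(instance, func.__name__), so store the undecorated
        # function on the class under a private name and rename it to match.
        self.hidden = "_parallel_" + name
        self.func.__name__ = self.hidden
        self.func.__qualname__ = owner.__qualname__ + "." + self.hidden
        setattr(owner, self.hidden, self.func)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        worker = getattr(instance, self.hidden)  # picklable bound method

        def run(values, processes=2):
            with Pool(processes) as pool:
                return pool.map(worker, values)

        return run


class Foo:
    def __init__(self, args):
        self.my_args = args

    def squareArg(self, arg):
        return arg ** 2

    @parallel
    def parSquare(self, num):
        return self.squareArg(num)


if __name__ == "__main__":
    myfoo = Foo([1, 2, 3, 4])
    print(myfoo.parSquare(myfoo.my_args))  # [1, 4, 9, 16]
    # The hidden bound method survives a pickle round trip
    print(pickle.loads(pickle.dumps(getattr(myfoo, "_parallel_parSquare")))(5))  # 25
```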
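One last point: be very careful when parallelizing with multiprocessing. Depending on how you split your data, the multiprocess version can actually be slower than the single-threaded one, mainly because of the overhead of copying values back and forth and of creating and destroying child processes.
Always benchmark the single- and multi-process versions, and split your data sensibly where possible.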
Case in point:
import numpy
import time
from multiprocessing import Pool
def square(value):
    return value*value
if __name__ == '__main__':
    pool = Pool(5)
    values = range(1000000)
    start = time.time()
    _ = pool.map(square, values)
    pool.close()
    pool.join()
    end = time.time()
    print "multithreaded time %f" % (end - start)
    start = time.time()
    _ = map(square, values)
    end = time.time()
    print "single threaded time %f" % (end - start)
    start = time.time()
    _ = numpy.asarray(values)**2
    end = time.time()
    print "numpy time %f" % (end - start)
    v = numpy.asarray(values)
    start = time.time()
    _ = v**2
    end = time.time()
    print "numpy without pre-initialization %f" % (end - start)
Gives us:
multithreaded time 0.484441
single threaded time 0.196421
numpy time 0.184163
numpy without pre-initialization 0.004490
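The advice about splitting the data can be made concrete. A minimal Python 3 sketch, assuming the per-item work is as cheap as square above: send a few large slices instead of a million tiny tasks, so the pickling and inter-process overhead is paid once per slice. split and square_chunk are hypothetical helpers; Pool.map's own chunksize argument does similar batching internally.

```python
from multiprocessing import Pool


def square_chunk(chunk):
    # One task per slice: pickling/IPC overhead is paid once per slice,
    # not once per value
    return [v * v for v in chunk]


def split(values, n):
    """Split values into at most n contiguous, nearly equal slices."""
    k, r = divmod(len(values), n)
    slices, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        if start < end:
            slices.append(values[start:end])
        start = end
    return slices


def parallel_square(values, processes=4):
    with Pool(processes) as pool:
        parts = pool.map(square_chunk, split(values, processes))
    # Flatten the per-slice results back into one list, preserving order
    return [v for part in parts for v in part]


if __name__ == "__main__":
    print(parallel_square(list(range(10))))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Whether this actually beats the single-process map (or numpy) still has to be measured on your own data, as the answer says.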
Answer 1 (score: -1)
Well, this isn't the answer you're looking for, but Sage has a @parallel decorator much like the one you're after. You can find its documentation and source code online.
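As a general rule, though, add import pdb;pdb.set_trace() just before the line where you see the failure and inspect every object in sight. If you use ipython, you can use the %pdb magic command or do something along these lines.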
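Along the same lines, when the failure is a pickling error, a quick check before calling pool.map is to try pickling each object you are about to send. A minimal Python 3 sketch; find_unpicklable is a hypothetical helper, not part of multiprocessing or pdb.

```python
import pickle


def find_unpicklable(**objects):
    """Hypothetical helper: return {name: error} for objects that fail to pickle."""
    problems = {}
    for name, obj in objects.items():
        try:
            pickle.dumps(obj)
        except Exception as exc:  # PicklingError, AttributeError, TypeError, ...
            problems[name] = "%s: %s" % (type(exc).__name__, exc)
    return problems


if __name__ == "__main__":
    # A lambda has no importable name, so it is reported; the list is not
    print(find_unpicklable(values=[1, 2, 3], func=lambda x: x * x))
```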