Python 2.7.3
我有一个包含数千个数据文件的文件夹。每个数据文件都被送到构造函数并进行大量处理。现在我正在遍历文件并按顺序处理它们:
class Foo:
def __init__(self,file):
self.bar = do_lots_of_stuff_with_numpy_and_scipy(file)
def do_lots_of_stuff_with_numpy_and_scipy(file):
pass
def get_foos(dir):
return [Foo(os.path.join(dir,file)) for file in os.listdir(dir)]
这很好用,但速度很慢。我想同时这样做。我试过了:
def parallel_get_foos(dir):
p = Pool()
foos = p.map(Foo, [os.path.join(dir,file) for file in os.listdir(dir)])
p.close()
p.join()
return foos
if __name__ == "__main__":
foos = parallel_get_foos(sys.argv[1])
但它只是出错了很多:
Process PoolWorker-7:
Traceback (most recent call last):
File "/l/python2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/l/python2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/l/python2.7/lib/python2.7/multiprocessing/pool.py", line 99, in worker
put((job, i, result))
File "/l/python2.7/lib/python2.7/multiprocessing/queues.py", line 390, in put
return send(obj)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
我尝试过一个函数来返回对象,例如:
def get_foo(file):
return Foo(file)
def parallel_get_foos(dir):
...
foos = p.map(get_foo, [os.path.join(dir,file) for file in os.listdir(dir)])
...
但正如预期的那样我得到了同样的错误。
我已经阅读了大量类似的线程,试图解决类似这样的问题,但没有一个解决方案帮助了我。所以我感谢任何帮助!
编辑:
Bakuriu正确地推测我在do_lots_of_stuff方法中定义了一个非顶级函数。特别是,我的做法如下:
def fit_curve(data,degree):
"""Fits a least-square polynomial function to the given data."""
sorted = data[data[:,0].argsort()].T
coefficients = numpy.polyfit(sorted[0],sorted[1],degree)
def eval(val,deg=degree):
res = 0
for coefficient in coefficients:
res += coefficient*val**deg
deg -= 1
return res
return eval
有没有让这个功能腌制?
答案 0 :(得分:1)
你在做什么(至少,你在例子中显示的内容),实际上工作正常:
$mkdir TestPool
$cd TestPool/
$for i in {1..100}
> do
> touch "test$i"
> done
$ls
test1 test18 test27 test36 test45 test54 test63 test72 test81 test90
test10 test19 test28 test37 test46 test55 test64 test73 test82 test91
test100 test2 test29 test38 test47 test56 test65 test74 test83 test92
test11 test20 test3 test39 test48 test57 test66 test75 test84 test93
test12 test21 test30 test4 test49 test58 test67 test76 test85 test94
test13 test22 test31 test40 test5 test59 test68 test77 test86 test95
test14 test23 test32 test41 test50 test6 test69 test78 test87 test96
test15 test24 test33 test42 test51 test60 test7 test79 test88 test97
test16 test25 test34 test43 test52 test61 test70 test8 test89 test98
test17 test26 test35 test44 test53 test62 test71 test80 test9 test99
$vi test_pool_dir.py
$cat test_pool_dir.py
import os
import multiprocessing
class Foo(object):
def __init__(self, fname):
self.fname = fname #or your calculations
def parallel_get_foos(directory):
p = multiprocessing.Pool()
foos = p.map(Foo, [os.path.join(directory, fname) for fname in os.listdir(directory)])
p.close()
p.join()
return foos
if __name__ == '__main__':
foos = parallel_get_foos('.')
print len(foos) #expected 101: 100 files plus this script
$python test_pool_dir.py
101
版本信息:
$python --version
Python 2.7.3
$uname -a
Linux giacomo-Acer 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
我的猜测是你不正在你所展示的代码示例中显示的内容。例如,在执行此操作时,我会收到类似于您的错误:
>>> import pickle
>>> def test():
... def test2(): pass
... return test2
...
>>> import multiprocessing
>>> p = multiprocessing.Pool()
>>> p.map(test(), [1,2,3])
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
这是显而易见的:
>>> pickle.dumps(test())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/usr/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 748, in save_global
(obj, module, name))
pickle.PicklingError: Can't pickle <function test2 at 0x7fad15fc2938>: it's not found as __main__.test2
pickle
的文件说明:
可以腌制以下类型:
None
,True
和False
- 整数,长整数,浮点数,复数
- 普通和Unicode字符串
tuple
s,list
s,set
s和仅包含可选对象的词典- 函数在模块顶层定义
- 在模块顶层定义的内置函数
- 在模块顶层定义的类
- 此类的实例
__dict__
或调用__getstate__()
的结果是可选的(请参阅pickle协议一节) 详情)。
并继续:
请注意,功能(内置和用户定义)被“完全”腌制 合格的“名称参考,而不是价值。这意味着只有 功能名称被腌制,以及模块的名称 函数定义为。函数的代码,也不是函数的代码 功能属性是pickle。因此,定义模块必须是 可以在unpickling环境中导入,并且模块必须包含 命名对象,否则将引发异常。