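Sometimes, when using a multiprocessing pool together with a manager under Python 3.4, lock.acquire() raises a strange TypeError: an integer is required (got type NoneType).
This has shown up a few times in my Travis test suite, and I cannot figure out where it comes from or what it means. Even worse, I cannot reliably reproduce it; it either happens or it doesn't. Usually it doesn't, but roughly once in every 100 runs it does :-(.
I am completely lost. Maybe someone has run into something like this before and can give me a hint about where to look for the source of the bug. Let me start with the full traceback: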
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/environment.py", line 150, in _single_run
    traj._store_final(store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/trajectory.py", line 3271, in _store_final
    store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 295, in store
    self.acquire_lock()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 284, in acquire_lock
    self._lock.acquire()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 958, in acquire
    return self._callmethod('acquire', args)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 731, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 413, in _send_bytes
    self._send(chunk)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 369, in _send
    n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)
This only happens with Python 3.4, not with 2.7 :-/.
The library that produces the bug is rather extensive. Basically, however, what I do is the following:
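For what it is worth, the last frame of the traceback passes self._handle to write. A minimal sketch of one possible reading, assuming (this is my assumption, the traceback does not show it) that write is bound to os.write and that the handle had somehow become None by the time of the call:

```python
import os

# Assumption: None stands in for a connection handle that is no longer valid.
handle = None
error = None

try:
    # os.write expects an integer file descriptor as its first argument.
    os.write(handle, b"payload")
except TypeError as exc:
    # A non-integer descriptor is rejected before anything is written.
    error = exc

print(type(error).__name__, error)
```

This only shows that a None descriptor yields a TypeError of the same kind; it says nothing about why the handle would be None in the first place.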
import multiprocessing as mp


def my_job(object_with_lock):
    # do stuff in parallel
    returnvalue = 42  # has been computed in the parallel part

    object_with_lock.lock.acquire()
    # do stuff sequentially, file IO and so on
    object_with_lock.lock.release()

    return returnvalue


class MyClassWithLock(object):
    def __init__(self, lock):
        self.lock = lock


def main():
    manager = mp.Manager()
    lock = manager.Lock()
    my_object_with_lock = MyClassWithLock(lock)

    n_cores = 4
    pool = mp.Pool(n_cores)

    # Do the job concurrently:
    iterator = (my_object_with_lock for x in range(100))
    imap_results = pool.imap(my_job, iterator)

    pool.close()
    pool.join()
    del pool

    result_list = [x for x in imap_results]
    manager.shutdown()
    print(result_list)


if __name__ == '__main__':
    main()
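This code runs fine (although I have not tested it 1000 times), but it is essentially what I do in my library.
How can something like this produce the error above? Why would lock.acquire() occasionally throw this cryptic TypeError?
EDIT: With Python 3.4.2 the error DOES reproduce (only in my library), but with 3.4.1 it does not o.O
Moreover, trying twice seems to get around the problem, but that feels wrong: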
try:
    object_with_lock.lock.acquire()
except TypeError:
    object_with_lock.lock.acquire()
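The same try-twice idea can be written as a small reusable wrapper. The following is only a sketch: retry, FlakyLock and this acquire_lock are hypothetical names made up for illustration (they are not the pypet implementation), and FlakyLock merely imitates a lock proxy that fails on the first call.

```python
import functools


def retry(n_tries, exceptions):
    """Hypothetical helper: call the wrapped function up to n_tries times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, n_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Re-raise once the last permitted attempt has failed.
                    if attempt == n_tries:
                        raise
        return wrapper
    return decorator


class FlakyLock(object):
    """Stand-in for the manager lock proxy: fails once, then succeeds."""
    def __init__(self):
        self.calls = 0

    def acquire(self):
        self.calls += 1
        if self.calls == 1:
            raise TypeError("an integer is required (got type NoneType)")
        return True


@retry(n_tries=2, exceptions=(TypeError,))
def acquire_lock(lock):
    return lock.acquire()


flaky = FlakyLock()
acquired = acquire_lock(flaky)
print(acquired, flaky.calls)
```

Like the inline version, this hides the symptom rather than explaining it, which is why it still feels wrong.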
2nd EDIT: After using multiprocessing.log_to_stderr() [thanks to dano], I could recover the following log messages.
Somewhere the connection gets lost because of:
[DEBUG/ForkPoolWorker-4] thread 'MainThread' has no more proxies so closing conn
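But no error occurred before that; this message just appears out of the blue.
Moreover, before and after the retry of acquiring the lock, it says: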
[Level 5/ForkPoolWorker-4] finalizer calling <function BaseProxy._decref at 0x7f5890307510> with args (Token(typeid='Lock', address='/tmp/pymp-huxl4h0k/listener-6mm0hc8b', id='7f58903128f0'), b'\x9aF7e\x02\xbc.\xb8\x87\xe0\x00?\xee\xf5\xd6J\x95@\x16\xb7s?\xbf\xe6\xa32a\x16\x13W(\xfb', None, <multiprocessing.util.ForkAwareLocal object at 0x7f58903621c8>, ProcessLocalSet(), <function Client at 0x7f5890375d90>) and kwargs {}
[DEBUG/ForkPoolWorker-4] DECREF '7f58903128f0'
ERROR:pypet.retry:Starting the next try, because I could not execute `acquire_lock` due to: an integer is required (got type NoneType)
[DEBUG/ForkPoolWorker-4] thread 'MainThread' does not own a connection
[DEBUG/ForkPoolWorker-4] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'ForkPoolWorker-4'
Apparently the connection gets re-established. I still do not understand why the connection was lost in the first place.
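For anyone who wants to collect similar output, the logging can be switched on roughly like this. It is a minimal sketch; choosing the SUBDEBUG level is my assumption, picked because its numeric value is 5 and one of the lines above is tagged Level 5.

```python
import multiprocessing
import multiprocessing.util

# log_to_stderr() attaches a stderr handler to the multiprocessing logger
# and returns that logger.
logger = multiprocessing.log_to_stderr()

# SUBDEBUG is lower than logging.DEBUG, so finalizer messages are included too.
logger.setLevel(multiprocessing.util.SUBDEBUG)
```

This should run in the main process before the pool and the manager are created, so that the worker and manager processes log as well.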