Python 3.4 multiprocessing bug on Lock.acquire(): TypeError: an integer is required

Asked: 2015-03-26 11:21:28

Tags: python locking multiprocessing python-3.4 python-multiprocessing

Sometimes, when using a multiprocessing pool together with a manager under Python 3.4, lock.acquire() raises a strange TypeError: an integer is required (got type NoneType)

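It has happened a few times in my Travis test suite, and I cannot figure out where it comes from or what it means. Even worse, I cannot reliably reproduce it; it just happens or it doesn't. Usually it does not, but roughly once in every 100 runs it does :-(.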

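I am completely lost. Maybe someone has run into something like this before and can give a hint about where to look for the source of the bug. Let me start with the full traceback: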

Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/environment.py", line 150, in _single_run
    traj._store_final(store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/trajectory.py", line 3271, in _store_final
    store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 295, in store
    self.acquire_lock()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 284, in acquire_lock
    self._lock.acquire()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 958, in acquire
    return self._callmethod('acquire', args)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 731, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 413, in _send_bytes
    self._send(chunk)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 369, in _send
    n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)

This only happens with Python 3.4, not with 2.7 :-/.
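The last frame of the traceback passes self._handle straight to write(). As a minimal sketch, assuming (nothing above confirms this) that write is os.write and that the connection's handle was None at that moment, the same exception type can be triggered in isolation. The helper names write_to_handle and reproduce_type_error are made up for illustration:

```python
import os


def write_to_handle(handle, data):
    """Mimic the failing frame: hand the handle directly to os.write."""
    return os.write(handle, data)


def reproduce_type_error():
    """Return the TypeError raised when the handle is None (assumed cause)."""
    try:
        write_to_handle(None, b"payload")
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    # The exact message wording differs between Python versions.
    print(repr(reproduce_type_error()))
```

If that assumption holds, the TypeError would be a symptom of the connection having been closed, not a problem with the arguments passed to acquire().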

The library of mine that produces the bug is rather extensive. Basically, however, what I do is the following:

import multiprocessing as mp


def my_job(object_with_lock):
    # do stuff in parallel
    returnvalue = 42  # has been computed in the parallel part

    object_with_lock.lock.acquire()
    # do stuff sequentially, file IO and so on
    object_with_lock.lock.release()
    return returnvalue


class MyClassWithLock(object):
    def __init__(self, lock):
        self.lock = lock


def main():
    manager = mp.Manager()
    lock = manager.Lock()
    my_object_with_lock = MyClassWithLock(lock)

    n_cores = 4
    pool = mp.Pool(n_cores)

    # Do the job concurrently:
    iterator = (my_object_with_lock for x in range(100))
    imap_results = pool.imap(my_job, iterator)

    pool.close()
    pool.join()
    del pool

    result_list = [x for x in imap_results]

    manager.shutdown()

    print(result_list)


if __name__ == '__main__':
    main()

This code runs fine (although I have not tested it 1000 times), but it is essentially what my library does.

How can something like this produce the error above? Why does lock.acquire() occasionally throw this mysterious TypeError?


EDIT: Python 3.4.2 DOES reproduce the error (only in my library), but 3.4.1 does not o.O

Moreover, simply trying twice seems to work around the problem, but that feels wrong:

try:
    object_with_lock.lock.acquire()
except TypeError:
    object_with_lock.lock.acquire()
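The same workaround can be written as a bounded retry, so that a persistent failure still surfaces. This is only a sketch of the workaround, not a fix for the underlying cause; acquire_with_retry and FlakyLock are hypothetical names used for illustration, and FlakyLock merely simulates the sporadic TypeError:

```python
import time


def acquire_with_retry(lock, retries=3, delay=0.0):
    """Call lock.acquire(), retrying on TypeError at most `retries` times."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            return lock.acquire()
        except TypeError as exc:
            last_error = exc
            if delay:
                time.sleep(delay)
    # Every attempt failed: re-raise the last TypeError seen.
    raise last_error


class FlakyLock:
    """Hypothetical stand-in: fails `failures` times, then succeeds."""

    def __init__(self, failures=1):
        self.failures = failures
        self.calls = 0

    def acquire(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TypeError("an integer is required (got type NoneType)")
        return True


if __name__ == "__main__":
    lock = FlakyLock(failures=1)
    print(acquire_with_retry(lock), lock.calls)
```

In the real setting, the lock argument would be the manager's lock proxy, i.e. object_with_lock.lock.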

2nd EDIT: After enabling multiprocessing.log_to_stderr() [thanks to dano], I could recover the following log messages. Somewhere the connection is lost because of:

[DEBUG/ForkPoolWorker-4] thread 'MainThread' has no more proxies so closing conn

But no error occurred before that; this message just shows up out of nowhere.
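For anyone who wants the same kind of output, a sketch of how such logging can be switched on follows. The wrapper name enable_mp_debug_logging is made up; log_to_stderr() and multiprocessing.util.SUBDEBUG (numeric level 5, which would match the "Level 5" lines) come from the standard library. It presumably needs to be called before the pool is created so that forked workers inherit the logger configuration:

```python
import multiprocessing as mp
from multiprocessing import util


def enable_mp_debug_logging(level=util.SUBDEBUG):
    """Send multiprocessing's internal log records to stderr at `level`."""
    logger = mp.log_to_stderr()
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    logger = enable_mp_debug_logging()
    logger.debug("multiprocessing debug logging is on")
```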

Moreover, right before and after the retry to acquire the lock, it says:

[Level 5/ForkPoolWorker-4] finalizer calling <function BaseProxy._decref at 0x7f5890307510> with args (Token(typeid='Lock', address='/tmp/pymp-huxl4h0k/listener-6mm0hc8b', id='7f58903128f0'), b'\x9aF7e\x02\xbc.\xb8\x87\xe0\x00?\xee\xf5\xd6J\x95@\x16\xb7s?\xbf\xe6\xa32a\x16\x13W(\xfb', None, <multiprocessing.util.ForkAwareLocal object at 0x7f58903621c8>, ProcessLocalSet(), <function Client at 0x7f5890375d90>) and kwargs {}
[DEBUG/ForkPoolWorker-4] DECREF '7f58903128f0'
ERROR:pypet.retry:Starting the next try, because I could not execute `acquire_lock` due to: an integer is required (got type NoneType)
[DEBUG/ForkPoolWorker-4] thread 'MainThread' does not own a connection
[DEBUG/ForkPoolWorker-4] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'ForkPoolWorker-4'

So apparently the connection is re-established. I still don't understand why the connection was lost in the first place.

0 answers:

No answers yet