Python multiprocessing hangs, potential queue memory error?

Asked: 2012-12-30 01:33:54

Tags: python queue multiprocessing

I recently posted a question, "Using multiprocessing for finding network paths", and was pleased that @unutbu provided a neat solution.

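However, I am running into difficulty when executing the test_workers() function (the one that uses multiprocessing). The code runs, but it hangs when my network G has a large number of nodes N.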

Running on Mac OS X Lion 10.7.5 with Python 2.7, it hangs when N > 500. Logging produces the following messages and then the program hangs:

[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()

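Running on Windows 7 via VMware Fusion allows for larger networks, but it eventually hangs for graphs of around N = 20,000 nodes (ideally I would like to use this on networks of N = 500,000). The messages at the point of hanging on the Windows side: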

[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()[DEBUG/MainProcess] telling queue thread to quit
Traceback (most recent call last):
      File "C:\Users\Scott\Desktop\fp_test.py", line 75, in <module>
    Traceback (most recent call last):
          File "C:\Python27\lib\multiprocessing\queues.py", line 264, in _feed
    test_workers()
    MemoryError

Does anyone have any idea why this is happening, and any suggestions on how to make it work for larger networks?

Many thanks for any suggestions you may have.

@unutbu's code:

import networkx as nx
import multiprocessing as mp
import random
import sys
import itertools as IT
import logging
logger = mp.log_to_stderr(logging.DEBUG)


def worker(inqueue, output):
    result = []
    count = 0
    for pair in iter(inqueue.get, sentinel):
        source, target = pair
        for path in nx.all_simple_paths(G, source = source, target = target,
                                        cutoff = None):
            result.append(path)
            count += 1
            if count % 10 == 0:
                logger.info('{c}'.format(c = count))
    output.put(result)

def test_workers():
    result = []
    inqueue = mp.Queue()
    for source, target in IT.product(sources, targets):
        inqueue.put((source, target))
    procs = [mp.Process(target = worker, args = (inqueue, output))
             for i in range(mp.cpu_count())]
    for proc in procs:
        proc.daemon = True
        proc.start()
    for proc in procs:    
        inqueue.put(sentinel)
    for proc in procs:
        result.extend(output.get())
    for proc in procs:
        proc.join()
    return result

def test_single_worker():
    result = []
    count = 0
    for source, target in IT.product(sources, targets):
        for path in nx.all_simple_paths(G, source = source, target = target,
                                        cutoff = None):
            result.append(path)
            count += 1
            if count % 10 == 0:
                logger.info('{c}'.format(c = count))

    return result

sentinel = None

seed = 1
m = 1
N = 1340//m
G = nx.gnm_random_graph(N, int(1.7*N), seed)
random.seed(seed)
sources = [random.randrange(N) for i in range(340//m)]
targets = [random.randrange(N) for i in range(1000//m)]
output = mp.Queue()

if __name__ == '__main__':
    test_workers()
    # test_single_worker()
    # assert set(map(tuple, test_workers())) == set(map(tuple, test_single_worker()))

1 Answer:

Answer 0 (score: 2)

You are running into a deadlock with the logging module.

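This module holds some thread locks to allow safe logging across threads, but it does not play well when the current process forks. See here, for example, for an explanation of what is going on.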

The solution is to remove the logging calls, or to use a plain print instead.
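As a minimal sketch of the print-based alternative: the helper below, report_progress, is a hypothetical name that does not appear in the original code. It is meant to stand in for the logger.info('{c}'.format(c = count)) call inside worker(), and it takes no locks from the logging module.

```python
import os
import sys


def report_progress(count, every=10):
    """Print the running count once every `every` items.

    Hypothetical helper (not part of the original code). It uses only a
    plain print, so the logging module is not involved at all.
    Returns True when a line was printed, False otherwise.
    """
    if count % every != 0:
        return False
    # Tag the line with the process id so output from different
    # workers can be told apart.
    print('[pid %d] %d' % (os.getpid(), count))
    sys.stdout.flush()  # do not leave progress sitting in a buffer
    return True


# Inside worker(), the body of the inner loop would then read:
#     result.append(path)
#     count += 1
#     report_progress(count)
```

If the logging calls are removed this way, the logger = mp.log_to_stderr(logging.DEBUG) line at the top of the script would presumably be dropped as well.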

In any case, as a general rule, avoid mixing threads and forking, and always check which modules use threads behind the scenes.
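One way to check for hidden threads is to list the threads that are alive in the parent just before the worker processes are started. The sketch below uses only threading.enumerate() and threading.current_thread() from the standard library; background_threads is a hypothetical helper name. Note that the DEBUG output in the question ("starting thread to feed data to pipe") indicates that a multiprocessing queue starts a feeder thread of its own once items are put on it.

```python
import threading


def background_threads():
    """Return the live threads other than the one calling this function.

    Hypothetical helper: call it right before starting worker processes
    to see which threads already exist in the parent.
    """
    me = threading.current_thread()
    return [t for t in threading.enumerate() if t is not me]


if __name__ == '__main__':
    # Start a helper thread purely for illustration.
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait, name='example-helper')
    helper.daemon = True
    helper.start()

    # The list includes 'example-helper' while the thread is alive.
    print([t.name for t in background_threads()])

    stop.set()
    helper.join()
```

An empty list does not prove that forking is safe, since a library can take a lock without keeping a thread around, but a non-empty list is a good prompt to look at what started those threads.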

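Note that on Windows it simply works, because Windows has no fork, so there is no problem with locks being cloned and subsequently deadlocking. In that case, the MemoryError means that the process is consuming too much RAM. You will probably have to rethink the algorithm so that it uses less RAM, but that is a completely different problem from the one you are hitting on OS X.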
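As one possible direction for using less RAM, the sketch below keeps only a path count per (source, target) pair instead of storing every path in a list. This assumes that counts (or some other small summary) are enough for the task, which the question does not state. simple_paths is a hypothetical, dependency-free stand-in for nx.all_simple_paths operating on a plain adjacency dict, and count_paths is likewise a name made up for illustration.

```python
import itertools as IT


def simple_paths(adj, source, target):
    """Yield simple paths from source to target, one at a time.

    `adj` maps each node to an iterable of its neighbours. This is a
    hypothetical stand-in for nx.all_simple_paths, not its implementation.
    """
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt in path:
                continue  # a simple path never revisits a node
            if nxt == target:
                yield path + [nxt]
            else:
                stack.append((nxt, path + [nxt]))


def count_paths(adj, sources, targets):
    """Yield (source, target, number_of_paths) for each pair.

    Paths are consumed as they are generated and then discarded, so the
    full list of paths is never held in memory.
    """
    for source, target in IT.product(sources, targets):
        n = sum(1 for _ in simple_paths(adj, source, target))
        yield source, target, n


if __name__ == '__main__':
    # A square 0-1-3-2-0 with the extra edge 1-2, as an undirected graph.
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    for row in count_paths(adj, [0], [3]):
        print(row)
```

The same idea carries over to the multiprocessing version: each worker would put a small summary on the output queue rather than one very large list of paths.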