Data put() into a Python multiprocessing queue not available to get() until hours later?

Asked: 2016-11-17 04:21:31

Tags: python parallel-processing queue multiprocessing

In my setup, I have several worker processes (one per CPU core) that add results to a queue with put(). The main process then fetches the results with get() and handles logging and disk I/O.

Strangely, after some 500k put() calls have been made, the queue appears empty after only about 5k get() calls. Then (possibly once the put() calls have all finished) the missing data shows up all at once and becomes available to get(). What could be causing this?

Imports:

from multiprocessing import Process, Queue
from Queue import Empty  # Python 2 module; "queue" on Python 3
import sys               # needed for sys.exc_info() and sys.exit() below
import time
import traceback
import msgpack

# Custom modules
import logbook
import network

My worker function (flows is an iterator):

def run_min_cut(edges_from, nodes, done_q, return_q, error_q):
    flows = network.min_cut.dinic_unit_pairwise(edges_from, nodes)
    try:
        for flow in flows:
            return_q.put(flow)
    except:
        error_q.put(sys.exc_info())
        return
    done_q.put(1)

The loop that writes results to disk:

exp_name = "18_find_min_cut"
edges_file = "archive/17_create_coeditor/2016-11-05 16:42:01 8850183/%d-coeditor.mp"
out_file = "flows.csv"
num_proc = 11
log_period = 30

exp = logbook.Experiment(exp_name)
log = exp.get_logger()

log.info("Loading network edges")
all_nodes = set()
edge_count = 0
edges_from = {}
with open(edges_file % 23, "rb") as f:
    unpacker = msgpack.Unpacker(f)
    for o in unpacker:
        edge_count += len(o[1])
        edges_from[o[0][0]] = o[1]
        all_nodes.add(o[0][0])
        all_nodes |= set(o[1])
all_nodes = list(all_nodes)
log.info("  Loaded %d nodes and %d edges" % (len(all_nodes), edge_count))

log.info("Starting %d processes" % num_proc)
step = 1 + len(all_nodes) / num_proc
pair_count = len(all_nodes) * (len(all_nodes) - 1)
return_q = Queue()
done_q = Queue()
error_q = Queue()
workers = []
log.info("  Sending %d nodes to each worker" % step)
for i in range(num_proc):
    chunk = all_nodes[(i*step):((i+1)*step)]
    args = (edges_from, chunk, done_q, return_q, error_q)
    p = Process(target=run_min_cut, args=args)
    p.start()
    workers.append(p)
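As a sanity check on the chunking arithmetic above (Python 2's `/` on ints and Python 3's `//` give the same result here), every node lands in exactly one chunk:

```python
# Chunking scheme from the code above, on a hypothetical 23-node list.
all_nodes = list(range(23))
num_proc = 4
step = 1 + len(all_nodes) // num_proc   # 1 + 23 // 4 == 6
chunks = [all_nodes[i * step:(i + 1) * step] for i in range(num_proc)]
print([len(c) for c in chunks], sum(len(c) for c in chunks))
```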

log.info("Waiting for results")
try:
    with open(exp.get_filename(out_file), "wb") as out:
        complete = 0
        proc_complete = 0
        while proc_complete < num_proc and complete < pair_count:
            # Check for errors in worker threads
            if (error_q.qsize() > 0):
                try:
                    e = error_q.get(False)
                    log.error("Caught error from child: %s" % e)
                    exc_type, exc_obj, exc_tb = e
                    traceback.print_tb(exc_tb)
                    [p.terminate() for p in workers]
                    sys.exit()
                except Empty:
                    pass
            # Check for completed threads
            if (done_q.qsize() > 0):
                try:
                    done_q.get(False)
                    proc_complete += 1
                except Empty:
                    pass
            try:
                while 1:
                    flow = return_q.get(False)
                    out.write("%d\n" % flow)
                    complete += 1
            except Empty:
                log.info("  Return queue empty")
            out.flush()
            log.info(
                "  %d of %d pairs and %d of %d cores complete"
                % (complete, pair_count, proc_complete, num_proc))
            time.sleep(log_period)
        log.info(
            "  %d of %d pairs and %d of %d cores complete"
            % (complete, pair_count, proc_complete, num_proc))
except KeyboardInterrupt:
    [p.terminate() for p in workers]

A log excerpt shows a dramatic jump in the number of items read from the queue. With 10 of 11 cores complete, I know roughly 900k items had been put(), but the queue showed as empty after about 400k calls to get(), until the missing items all appeared at once.

2016-11-16 19:31:30,343   0 of 1007012 pairs and 0 of 11 cores complete
2016-11-16 19:32:00,379   Return queue empty
2016-11-16 19:32:00,381   907 of 1007012 pairs and 0 of 11 cores complete
2016-11-16 19:32:30,416   Return queue empty
...
2016-11-16 22:22:44,228   430299 of 1007012 pairs and 10 of 11 cores complete
2016-11-16 22:23:18,158   Return queue empty
2016-11-16 22:23:18,160   1007012 of 1007012 pairs and 11 of 11 cores complete

0 Answers:

There are no answers yet.