In my setup I have several worker processes (one per CPU core) that add results to a queue with put(). The main process then uses get() to retrieve the results and handles logging and disk I/O.
The strange thing is that, after some 500k put() calls, the queue appears empty after only about 5k get() calls. Then, presumably once the put() calls have all finished, the missing data shows up at once and becomes available to get(). What could be causing this?
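For context, this is roughly the pattern, as a stripped-down sketch (the xrange() loop, the 100000-item count, and the four processes are placeholders, not my real code; the actual worker function and write loop are below):

from multiprocessing import Process, Queue
from Queue import Empty

def worker(return_q, done_q):
    # Placeholder work standing in for the real min-cut iteration
    for result in xrange(100000):
        return_q.put(result)
    done_q.put(1)  # signal that this worker is finished

if __name__ == "__main__":
    return_q = Queue()
    done_q = Queue()
    workers = [Process(target=worker, args=(return_q, done_q)) for _ in range(4)]
    for p in workers:
        p.start()
    received = 0
    finished = 0
    while finished < len(workers):
        # Non-blocking polling of both queues, like the real loop below
        try:
            while True:
                return_q.get(False)
                received += 1
        except Empty:
            pass
        try:
            done_q.get(False)
            finished += 1
        except Empty:
            pass
    print("received %d results" % received)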
Imports:
from multiprocessing import Process, Queue
from Queue import Empty
import sys
import time
import traceback
import msgpack
# Custom modules
import logbook
import network
My worker function (flows is an iterator):
def run_min_cut(edges_from, nodes, done_q, return_q, error_q):
    flows = network.min_cut.dinic_unit_pairwise(edges_from, nodes)
    try:
        for flow in flows:
            return_q.put(flow)
    except:
        error_q.put(sys.exc_info())
        return
    done_q.put(1)
The loop that writes results to disk:
exp_name = "18_find_min_cut"
edges_file = "archive/17_create_coeditor/2016-11-05 16:42:01 8850183/%d-coeditor.mp"
out_file = "flows.csv"
num_proc = 11
log_period = 30
exp = logbook.Experiment(exp_name)
log = exp.get_logger()
log.info("Loading network edges")
all_nodes = set()
edge_count = 0
edges_from = {}
with open(edges_file % 23, "rb") as f:
    unpacker = msgpack.Unpacker(f)
    for o in unpacker:
        edge_count += len(o[1])
        edges_from[o[0][0]] = o[1]
        all_nodes.add(o[0][0])
        all_nodes |= set(o[1])
all_nodes = list(all_nodes)
log.info(" Loaded %d nodes and %d edges" % (len(all_nodes), edge_count))
log.info("Starting %d processes" % num_proc)
step = 1 + len(all_nodes) / num_proc
pair_count = len(all_nodes) * (len(all_nodes) - 1)
return_q = Queue()
done_q = Queue()
error_q = Queue()
workers = []
log.info(" Sending %d nodes to each worker" % step)
for i in range(num_proc):
    chunk = all_nodes[(i*step):((i+1)*step)]
    args = (edges_from, chunk, done_q, return_q, error_q)
    p = Process(target=run_min_cut, args=args)
    p.start()
    workers.append(p)
log.info("Waiting for results")
try:
    with open(exp.get_filename(out_file), "wb") as out:
        complete = 0
        proc_complete = 0
        while proc_complete < num_proc and complete < pair_count:
            # Check for errors in worker processes
            if (error_q.qsize() > 0):
                try:
                    e = error_q.get(False)
                    log.error("Caught error from child: %s" % e)
                    exc_type, exc_obj, exc_tb = e
                    traceback.print_tb(exc_tb)
                    [p.terminate() for p in workers]
                    sys.exit()
                except Empty:
                    pass
            # Check for completed worker processes
            if (done_q.qsize() > 0):
                try:
                    done_q.get(False)
                    proc_complete += 1
                except Empty:
                    pass
            try:
                while 1:
                    flow = return_q.get(False)
                    out.write("%d\n" % flow)
                    complete += 1
            except Empty:
                log.info(" Return queue empty")
            out.flush()
            log.info(
                " %d of %d pairs and %d of %d cores complete"
                % (complete, pair_count, proc_complete, num_proc))
            time.sleep(log_period)
        log.info(
            " %d of %d pairs and %d of %d cores complete"
            % (complete, pair_count, proc_complete, num_proc))
except KeyboardInterrupt:
    [p.terminate() for p in workers]
Log excerpts showing the dramatic jump in the number of items read from the queue. With 10 of 11 cores complete (roughly 10/11 × 1,007,012 ≈ 915k pairs), I know ~900k items have been put(), yet the queue shows as empty after ~400k calls to get(), until the missing items all appear at once.
2016-11-16 19:31:30,343 0 of 1007012 pairs and 0 of 11 cores complete
2016-11-16 19:32:00,379 Return queue empty
2016-11-16 19:32:00,381 907 of 1007012 pairs and 0 of 11 cores complete
2016-11-16 19:32:30,416 Return queue empty
...
2016-11-16 22:22:44,228 430299 of 1007012 pairs and 10 of 11 cores complete
2016-11-16 22:23:18,158 Return queue empty
2016-11-16 22:23:18,160 1007012 of 1007012 pairs and 11 of 11 cores complete