I am creating a multiprocessing.Queue in Python and handing it to a number of multiprocessing.Process instances. I would like to add a function call that is executed after every job to check whether a specific task has succeeded. If it has, I want to empty the Queue and terminate execution.

My Process class is:
import multiprocessing

import mbkit.dispatch.cexectools


class Worker(multiprocessing.Process):

    def __init__(self, queue, check_success=None, directory=None, permit_nonzero=False):
        super(Worker, self).__init__()
        self.check_success = check_success
        self.directory = directory
        self.permit_nonzero = permit_nonzero
        self.queue = queue

    def run(self):
        # Consume jobs until the None sentinel is pulled off the queue
        for job in iter(self.queue.get, None):
            stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
            with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
                f_out.write(stdout)
            if callable(self.check_success) and self.check_success(job):
                # Terminate all remaining jobs here
                pass
My Queue is set up here:
class LocalJobServer(object):

    @staticmethod
    def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False, time=None, *args, **kwargs):
        if check_success and not callable(check_success):
            msg = "check_success option requires a callable function/object: {0}".format(check_success)
            raise ValueError(msg)
        # Create a new queue
        queue = multiprocessing.Queue()
        # Create workers equivalent to the number of jobs
        workers = []
        for _ in range(nproc):
            wp = Worker(queue, check_success=check_success, directory=directory, permit_nonzero=permit_nonzero)
            wp.start()
            workers.append(wp)
        # Add each command to the queue
        for cmd in command:
            queue.put(cmd, timeout=time)
        # Stop workers from exiting without completion
        for _ in range(nproc):
            queue.put(None)
        for wp in workers:
            wp.join()
The function call mbkit.dispatch.cexectools.cexec() is a wrapper around subprocess.Popen and returns p.stdout.
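For context, a minimal sketch of what such a wrapper might look like; the actual mbkit implementation is not shown in the question, so the body below is an assumption:

import subprocess

def cexec(cmd, directory=None, permit_nonzero=False):
    # Hypothetical stand-in for mbkit.dispatch.cexectools.cexec: run the
    # command, optionally in a working directory, and return its stdout.
    p = subprocess.Popen(cmd, cwd=directory, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)
    stdout, _ = p.communicate()
    if p.returncode != 0 and not permit_nonzero:
        raise RuntimeError("command {0} exited with {1}".format(cmd, p.returncode))
    return stdout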
In the Worker class, I wrote the conditional to check whether a job succeeded and tried to empty the remaining jobs in the Queue with a while loop, i.e. my Worker.run() function looks like this:

def run(self):
    for job in iter(self.queue.get, None):
        stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
        with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
            f_out.write(stdout)
        if callable(self.check_success) and self.check_success(job):
            break
    # Attempt to drain whatever is left once a job has succeeded
    while not self.queue.empty():
        self.queue.get()

Although this works sometimes, it usually deadlocks and my only option is Ctrl-C. I am aware that Queue.empty() is unreliable, hence my question.
Any advice on how I can implement such an early-termination functionality?
Answer 0 (score: 1)
There is no deadlock here. It is just linked to the behaviour of multiprocessing.Queue: the get method is blocking by default, so calling get on an empty queue stalls until the next element is ready. Some of your workers will stall because your while not self.queue.empty() loop also removes all the None sentinels, leaving other workers blocked on an empty Queue, just as in this code:
from multiprocessing import Queue

q = Queue()
for e in iter(q.get, None):  # q.get blocks forever on an empty queue
    print(e)
To be notified when the queue is empty, you need to use a non-blocking call. You can, for instance, use q.get_nowait, or pass a timeout as in q.get(timeout=1). Both raise a multiprocessing.queues.Empty exception when the queue is empty. So you should replace your Worker's for job in iter(...): loop with something like:
while not self.queue.empty():
    try:
        job = self.queue.get(timeout=.1)
    except multiprocessing.queues.Empty:
        continue
    # Do stuff with your job

That way you are never stuck waiting on an empty queue.
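Equivalently, the same loop body could use q.get_nowait(), which returns immediately and raises the same Empty exception:

try:
    job = self.queue.get_nowait()  # non-blocking: raises Empty straight away
except multiprocessing.queues.Empty:
    continue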
For the synchronisation part, I would recommend using a synchronisation primitive such as multiprocessing.Condition or multiprocessing.Event. They are cleaner than a Value, as they are designed for exactly this purpose. Something like this should help:
def run(self):
    while not self.queue.empty():
        try:
            job = self.queue.get(timeout=.1)
        except multiprocessing.queues.Empty:
            continue
        if self.event.is_set():
            # Another worker already reported success; skip the job
            continue
        stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
        with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
            f_out.write(stdout)
        if callable(self.check_success) and self.check_success(job):
            self.event.set()
    print("Worker {} terminated cleanly".format(self.name))

with event = multiprocessing.Event() created once and shared by all the workers.
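As a sketch of the wiring (assuming Worker.__init__ is extended to accept the event and store it as self.event, which the original answer does not show), sub would create the Event and hand it to every worker:

# In LocalJobServer.sub, before starting the workers
event = multiprocessing.Event()
workers = []
for _ in range(nproc):
    wp = Worker(queue, event, check_success=check_success,
                directory=directory, permit_nonzero=permit_nonzero)
    wp.start()
    workers.append(wp)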
Note that it is also possible to use a multiprocessing.Pool and avoid dealing with the queue and the workers yourself. However, since you need some synchronisation primitives, it might be a bit trickier to set up. Something like this should work:
def worker(job, success, check_success=None, directory=None, permit_nonzero=False):
    if success.is_set():
        return False
    stdout = mbkit.dispatch.cexectools.cexec([job], directory=directory, permit_nonzero=permit_nonzero)
    with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
        f_out.write(stdout)
    if callable(check_success) and check_success(job):
        success.set()
    return True

# ......
# In the class LocalJobServer
# .....

def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False):
    mgr = multiprocessing.Manager()
    success = mgr.Event()
    pool = multiprocessing.Pool(nproc)
    run_args = [(cmd, success, check_success, directory, permit_nonzero) for cmd in command]
    result = pool.starmap(worker, run_args)
    pool.close()
    pool.join()
Note that I use a Manager here because a multiprocessing.Event cannot be passed directly as a task argument to the pool. You could also use the initializer and initargs parameters of Pool to create a global success event in each worker and avoid relying on the Manager, but it is slightly more complicated.
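A minimal sketch of that initializer-based alternative, assuming a module-level global (the name _success is illustrative, not part of the original answer):

import multiprocessing

_success = None  # set in every pool process by the initializer

def init_worker(event):
    # Runs once in each pool process: stash the shared Event in a
    # module-level global so tasks can use it without having to pickle
    # it through the task queue.
    global _success
    _success = event

# In sub():
success = multiprocessing.Event()
pool = multiprocessing.Pool(nproc, initializer=init_worker, initargs=(success,))

The worker function would then test _success.is_set() instead of receiving the event as an argument.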
Answer 1 (score: 0)
This might not be the optimal solution, and any other suggestions are much appreciated, but I managed to solve the problem as follows:
import multiprocessing
import time

import mbkit.dispatch.cexectools


class Worker(multiprocessing.Process):
    """Simple manual worker class to execute jobs in the queue"""

    def __init__(self, queue, success, check_success=None, directory=None, permit_nonzero=False):
        super(Worker, self).__init__()
        self.check_success = check_success
        self.directory = directory
        self.permit_nonzero = permit_nonzero
        self.success = success
        self.queue = queue

    def run(self):
        """Method representing the process's activity"""
        for job in iter(self.queue.get, None):
            if self.success.value:
                continue  # a job already succeeded; drain the remaining entries
            stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
            with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
                f_out.write(stdout)
            if callable(self.check_success) and self.check_success(job):
                self.success.value = int(True)
            time.sleep(1)


class LocalJobServer(object):
    """A local server to execute jobs via the multiprocessing module"""

    @staticmethod
    def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False, time=None, *args, **kwargs):
        if check_success and not callable(check_success):
            msg = "check_success option requires a callable function/object: {0}".format(check_success)
            raise ValueError(msg)
        # Create a new queue
        queue = multiprocessing.Queue()
        success = multiprocessing.Value('i', int(False))
        # Create workers equivalent to the number of jobs
        workers = []
        for _ in range(nproc):
            wp = Worker(queue, success, check_success=check_success, directory=directory, permit_nonzero=permit_nonzero)
            wp.start()
            workers.append(wp)
        # Add each command to the queue
        for cmd in command:
            queue.put(cmd)
        # Stop workers from exiting without completion
        for _ in range(nproc):
            queue.put(None)
        # Wait for the workers to finish
        for wp in workers:
            wp.join(time)
Basically I am creating a Value and providing that to each Process. Once a job is marked as successful, this variable gets updated. Each Process checks in if self.success.value: continue whether we already have a success and, if so, just iterates over the remaining jobs in the Queue until it is empty. The time.sleep(1) call is required to account for potential synchronisation delays between the processes. This is certainly not the most efficient approach, but it works.
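For illustration, a hypothetical invocation of this solution (the script names and the success check below are made up, not part of the original answer):

def succeeded(job):
    # Hypothetical check: the job counts as successful if its log
    # file contains the word 'SUCCESS'.
    with open(job.rsplit('.', 1)[0] + '.log') as f_in:
        return 'SUCCESS' in f_in.read()

LocalJobServer.sub(['job1.sh', 'job2.sh', 'job3.sh'], check_success=succeeded, nproc=2)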