Python's Queue is very useful for multithreading, but it has no support for stopping the worker threads when the queue stays empty indefinitely.
For example, consider this:
from queue import Queue
from threading import Thread
from random import random
import time

queue = Queue()

def process(payload):
    time.sleep(random())

def work():
    while True:
        payload = queue.get()
        try:
            process(payload)
        except Exception:
            print("TERROR ERROR!")
        finally:
            queue.task_done()

threads = dict()
for thread_id in range(10):
    threads[thread_id] = Thread(target=work)
    threads[thread_id].daemon = True
    threads[thread_id].start()

for payload in range(100):
    queue.put(payload)
queue.join()
So this works, but not really. queue.join() waits until every item has been reported done, and then the main thread finishes, but the worker threads keep waiting forever. If this is the end of a (unix) process, sure, we can leave the cleanup to the OS, but if the program keeps running, those waiting threads pile up and leak resources.
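The hang described above is easy to reproduce; this minimal sketch (names are mine, not from the question) shows that join() returns happily while the workers remain blocked inside get():

```python
# Minimal sketch of the problem: join() returns once every item has been
# marked done, but the workers themselves stay blocked inside get() forever.
import queue
import threading
import time

q = queue.Queue()

def work():
    while True:
        item = q.get()      # blocks indefinitely once the queue stays empty
        q.task_done()

threads = [threading.Thread(target=work, daemon=True) for _ in range(3)]
for t in threads:
    t.start()

for i in range(10):
    q.put(i)
q.join()                    # returns: all items were processed...
time.sleep(0.2)
print(all(t.is_alive() for t in threads))  # ...yet every worker still waits
```

With daemon=True the interpreter can still exit, which is exactly the "leave it to the OS" case the text mentions; without it, this script would never terminate.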
So we implement a sentinel, an EOQ, a bottom, or whatever you want to call it:
class Sentinel:
    pass

def work():
    while True:
        payload = queue.get()
        if isinstance(payload, Sentinel):
            queue.task_done()
            break
        # ...

threads = dict()
# ...

for thread_id in threads:
    queue.put(Sentinel())
queue.join()
This is a better solution, because the threads now actually stop. However, the code injecting the sentinels is clumsy and error-prone. Consider what happens if I accidentally put in too many, or if a worker thread somehow handles two of them, so that another worker never gets one.
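One way to reduce that bookkeeping (a sketch of mine, not from the question): have each worker put the sentinel back before exiting. A single sentinel then stops any number of workers, so surplus sentinels cannot occur:

```python
import queue
import threading

SENTINEL = object()          # unique marker, compared by identity
q = queue.Queue()
results = []

def work():
    while True:
        item = q.get()
        if item is SENTINEL:
            q.put(SENTINEL)   # hand it on so the next worker stops too
            break
        results.append(item)  # list.append is atomic under the GIL

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()

for i in range(20):
    q.put(i)
q.put(SENTINEL)               # one sentinel is enough for all four workers

for t in threads:
    t.join()
print(sorted(results) == list(range(20)))  # → True
```

The `is` comparison against a shared object() also rules out a payload accidentally matching the sentinel.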
Alternatively:
class FiniteQueue(Queue):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.finished = False

    def put(self, item, *args, **kwargs):
        if self.finished:
            raise AlreadyFinished()
        super().put(item, *args, **kwargs)

    def set_finished(self):
        self.finished = True

    def get(self, *args, **kwargs):
        if self.finished:
            raise AlreadyFinished()
        return super().get(*args, **kwargs)
Obviously, I was lazy and did not make the put() method thread-safe, but that is perfectly doable. This way, the workers can simply catch the AlreadyFinished exception and stop.
The main thread can simply call set_finished() once all payloads have been put in. The queue can then detect that no more payloads will arrive and report this to the workers (or consumers, if you prefer).
Why doesn't Python's Queue offer set_finished() functionality? It would not interfere with the endless-queue use case, but it would support finite processing flows.
Am I missing an obvious flaw in this design? Is this something one shouldn't want? Is there a simpler alternative to the FiniteQueue presented here?
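For what it's worth, later Python versions added essentially this feature: since Python 3.13, queue.Queue has a shutdown() method, and both put() and a get() on a drained, shut-down queue raise queue.ShutDown, which wakes blocked workers. A short sketch (guarded, since it only runs on 3.13+):

```python
import queue

q = queue.Queue()
if hasattr(q, "shutdown"):      # Queue.shutdown() exists since Python 3.13
    q.put("job")
    q.shutdown()                # further put() calls raise queue.ShutDown
    print(q.get())              # remaining items can still be drained
    try:
        q.get_nowait()          # empty + shut down -> queue.ShutDown
    except queue.ShutDown:
        print("finished")
else:
    print("Queue.shutdown() requires Python 3.13+")
```

With shutdown(immediate=True), pending items are discarded as well and blocked getters are released immediately.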
Answer 0 (score: 0)
To solve the sentinel problem, I put in the same number of sentinels as there are worker threads. When a worker detects one, it quits, so double handling is impossible. As a sentinel I used a reference to a function, which is never actually called.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# test_queue.py
#
# Copyright 2015 John Coppens <john@jcoppens.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
from Queue import Queue
from threading import Thread
import time, random

NR_WORKERS = 10

queue = Queue()

# The sentinel function below is not necessary - you can use None as sentinel,
# though this function might be useful if the worker wants to do something
# after the last job (like wash his hands :) (Thanks Darren Ringer)
# def sentinel():
#     pass

def process(payload):
    time.sleep(random.random())
    print payload

def work():
    while True:
        payload = queue.get()
        print "Got from queue: ", payload
        # if payload == sentinel:
        if payload is None:            # See comment above
            queue.task_done()
            break
        process(payload)
        queue.task_done()

threads = dict()
for id in range(NR_WORKERS):
    print "Creating worker ", id
    threads[id] = Thread(target=work)
    # threads[id].daemon = False
    threads[id].start()

for payload in range(100):
    queue.put(payload)

for stopper in range(NR_WORKERS):
    # queue.put(sentinel)             # See comment at def sentinel()
    queue.put(None)

queue.join()
EDIT: Thanks to @DarrenRinger for suggesting the use of None. I had tried that before, but it failed (because of another problem, I suspect).
Answer 1 (score: 0)
The approach using a Sentinel object is correct and good. A worker thread accidentally processing two Sentinel objects is impossible, because its processing loop breaks as soon as it finds one.
The FiniteQueue approach does not work, because setting the finished flag does not wake up workers that are blocked in the super().get(....) statement.
This is a common problem in most programming languages that support threads: blocking on two conditions at once. In your case, the get() method should wait until:
1) the queue becomes non-empty,
2) or the finished flag is set to true.
To be correct, the wait method must know about both conditions. This makes it harder to use ready-made objects that come with wait support. Some languages support a kind of thread interruption that wakes up a blocked thread. Python seems to lack such a mechanism.
Answer 2 (score: 0)
I suggest using an Event object shared by all workers instead of a sentinel, to avoid accidentally mixing data and sentinels. Also, make sure to block on queue.get() with a timeout so that no resources are wasted.
from __future__ import print_function
from threading import Thread, Event, current_thread
from Queue import Queue
import time

queue = Queue()
evt = Event()

def process(payload):
    time.sleep(1)

def work():
    tid = current_thread().name
    # try until signaled not to
    while not evt.is_set():
        try:
            # block for max 1 second
            payload = queue.get(True, 1)
            process(payload)
        except Exception as e:
            print("%s thread exception %s" % (tid, e))
        else:
            # calling task_done() in finally may cause too many calls
            # resulting in an exception -- only call it once a task has
            # actually been done
            queue.task_done()

threads = dict()
for thread_id in range(10):
    threads[thread_id] = Thread(target=work)
    threads[thread_id].daemon = True
    threads[thread_id].start()

for payload in range(10):
    queue.put(payload)

queue.join()

# all workers will end in approx 1 second at most
evt.set()