Is Python's Queue incomplete, or is my design flawed?

Asked: 2015-06-21 13:16:56

Tags: python multithreading

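Python's Queue is very handy for multithreading, but it offers no way to stop the worker threads once the queue is going to stay empty indefinitely.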

For example, consider this:

import time
from queue import Queue
from random import random
from threading import Thread

queue = Queue()

def process(payload):
  time.sleep(random())

# Thread(target=work) passes no arguments, so work() takes none.
def work():
  while True:
    payload = queue.get()
    try:
      process(payload)
    except Exception:
      print("TERROR ERROR!")
    finally:
      queue.task_done()

threads = dict()
for thread_id in range(10):
  threads[thread_id] = Thread(target=work)
  threads[thread_id].daemon = True
  threads[thread_id].start()

for payload in range(100):
  queue.put(payload)

queue.join()

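So this works, but not really. queue.join() waits until every item has been reported done, and then the main thread carries on, but the worker threads keep waiting forever. If this were the end of the (Unix) process we could of course leave the cleanup to the operating system, but if the process continues, those waiting threads leak resources.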
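
The leftover threads can be made visible with a small sketch. The worker count, the payload count, and the `alive` variable are assumptions chosen for illustration; they are not part of the code above.

```python
import threading
from queue import Queue

queue = Queue()

def work():
    while True:
        payload = queue.get()   # blocks forever once the queue stays empty
        queue.task_done()

workers = [threading.Thread(target=work, daemon=True) for _ in range(3)]
for worker in workers:
    worker.start()

for payload in range(10):
    queue.put(payload)

queue.join()  # returns as soon as every item has been marked done

# Every worker is still alive, parked inside queue.get().
alive = sum(worker.is_alive() for worker in workers)
print(alive, "workers still waiting")
```

The threads are daemons here only so that the interpreter can exit; in a long-running process they would simply stay blocked.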

So next we implement a sentinel, or EOQ, or bottom, or whatever you want to call it:

class Sentinel:
  pass

def work():
  while True:
    payload = queue.get()
    if isinstance(payload, Sentinel):
      queue.task_done()
      break
    # ...

threads = dict()
# ...

# one sentinel per worker thread
for thread_id in threads:
  queue.put(Sentinel())
queue.join()

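This is a better solution, because the threads now stop. However, the code that injects the sentinels is clumsy and error-prone. Consider that I might accidentally put in too many, or that one worker thread might accidentally consume two, so that other worker threads never receive theirs.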
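
One way to make the sentinel count irrelevant, sketched here under assumptions: use a single shared sentinel object and have each worker put it back before exiting, so that the next worker sees it too. The names `STOP`, `results`, and `results_lock`, as well as the worker count, are made up for this illustration. Because the sentinel is left on the queue at the end, this sketch joins the threads instead of calling queue.join().

```python
import threading
from queue import Queue

STOP = object()          # hypothetical single shared sentinel
queue = Queue()
results = []
results_lock = threading.Lock()

def work():
    while True:
        payload = queue.get()
        if payload is STOP:
            queue.put(STOP)   # hand the sentinel on to the next worker
            break
        with results_lock:
            results.append(payload * 2)

workers = [threading.Thread(target=work) for _ in range(4)]
for worker in workers:
    worker.start()

for payload in range(20):
    queue.put(payload)
queue.put(STOP)               # exactly one sentinel, whatever the worker count

for worker in workers:
    worker.join()             # every worker eventually sees STOP and exits
```

Since the queue is FIFO and STOP is enqueued last, every payload has already been taken by some worker before the first worker sees the sentinel.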

Alternatively:

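class AlreadyFinished(Exception):
  pass

class FiniteQueue(Queue):
  # *args/**kwargs stand in for the arguments elided in the original sketch
  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.finished = False

  def put(self, item, *args, **kwargs):
    if self.finished:
      raise AlreadyFinished()
    super().put(item, *args, **kwargs)

  def set_finished(self):
    self.finished = True

  def get(self, *args, **kwargs):
    if self.finished:
      raise AlreadyFinished()
    return super().get(*args, **kwargs)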

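Obviously I was lazy and did not make the put() method thread-safe, but that is quite doable. This way, the workers can simply catch the AlreadyFinished exception and stop.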

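The main thread can simply call set_finished() once all payloads have been put on the queue. The queue can then detect that no more payloads will arrive and report this to the workers (or consumers, if you prefer).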

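Why doesn't Python's Queue provide a set_finished() feature? It would not interfere with the endless-queue use case, but it would support finite processing pipelines.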

Am I missing an obvious flaw in this design? Is this something one should not want? Is there a simpler alternative to the FiniteQueue shown above?

3 Answers:

Answer 0: (score: 0)

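To solve the sentinel problem, I enqueue as many sentinels as there are worker threads. As soon as a worker detects one, it exits, so it cannot consume a second one. As the sentinel I used a reference to a function, which never gets called.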

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  test_queue.py
#  
#  Copyright 2015 John Coppens <john@jcoppens.com>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#

from queue import Queue
from threading import Thread
import time, random

NR_WORKERS = 10
queue = Queue()

# The sentinel function below is not necessary - you can use None as sentinel,
# though this function might be useful if the worker wants to do something
# after the last job (like wash his hands :)  (Thanks Darren Ringer)

# def sentinel():
#    pass

def process(payload):
    time.sleep(random.random())
    print(payload)

def work():
    while True:
        payload = queue.get()
        print("Got from queue: ", payload)
        #if payload == sentinel:
        if payload is None:         # See comment above
            queue.task_done()
            break
        process(payload)
        queue.task_done()

threads = dict()
for worker_id in range(NR_WORKERS):
    print("Creating worker ", worker_id)
    threads[worker_id] = Thread(target=work)
    #threads[worker_id].daemon = False
    threads[worker_id].start()

for payload in range(100):
    queue.put(payload)

for stopper in range(NR_WORKERS):
    # queue.put(sentinel)             # See comment at def sentinel()
    queue.put(None)

queue.join()

Edit: Thanks to @DarrenRinger for suggesting None. I had tried that before, but it failed (because of another problem, I suspect).

Answer 1: (score: 0)

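The approach using a Sentinel object is correct and good. It is impossible for a worker thread to accidentally process two Sentinel objects, because its processing loop breaks as soon as it finds one.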

The FiniteQueue approach does not work, because setting the finished flag does not wake up workers that are already blocked in the super().get(....) call.
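
The blocked get() can be reproduced with a short sketch. It repeats a cut-down version of the FiniteQueue from the question; the `AlreadyFinished` exception class, the `outcome` list, the `still_blocked` variable, and the 0.5 second waits are assumptions added for the demonstration.

```python
import threading
import time
from queue import Queue

class AlreadyFinished(Exception):
    pass

class FiniteQueue(Queue):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.finished = False

    def set_finished(self):
        self.finished = True      # only flips a flag, notifies nobody

    def get(self, *args, **kwargs):
        if self.finished:
            raise AlreadyFinished()
        return super().get(*args, **kwargs)

queue = FiniteQueue()
outcome = []

def work():
    try:
        queue.get()               # flag is checked once, then this blocks
    except AlreadyFinished:
        outcome.append("stopped")

worker = threading.Thread(target=work, daemon=True)
worker.start()
time.sleep(0.5)                   # give the worker time to block in get()

queue.set_finished()
worker.join(timeout=0.5)
still_blocked = worker.is_alive() # the worker never noticed the flag
print("worker still blocked:", still_blocked)
```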

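This is a common problem in most programming languages that support threads: blocking on two conditions at once. In your case, the get() method should wait until either:

1) the queue becomes non-empty,

2) or the finished flag is set to true.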

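To be correct, the wait itself must know about both conditions. That makes it harder to use ready-made objects with built-in wait support. Some languages support some kind of thread interruption, which wakes up a blocked thread. Python seems to lack such a mechanism.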
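
Waiting on both conditions at once can be sketched with a single threading.Condition guarding a deque and a flag. This is an illustration under assumptions, not a drop-in replacement for queue.Queue: the class name `ClosableQueue`, the `Finished` exception, and the `close()` method are invented here, and the task_done()/join() bookkeeping is left out.

```python
import threading
from collections import deque

class Finished(Exception):
    """Raised by get() once the queue is closed and drained."""

class ClosableQueue:
    def __init__(self):
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            if self._closed:
                raise Finished()
            self._items.append(item)
            self._cond.notify()       # wake one waiting consumer

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()   # wake every blocked consumer

    def get(self):
        with self._cond:
            # Wait until either condition holds: an item exists or we are closed.
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return self._items.popleft()
            raise Finished()

queue = ClosableQueue()
results = []
results_lock = threading.Lock()

def work():
    while True:
        try:
            payload = queue.get()
        except Finished:
            break
        with results_lock:
            results.append(payload)

workers = [threading.Thread(target=work) for _ in range(4)]
for worker in workers:
    worker.start()
for payload in range(50):
    queue.put(payload)
queue.close()
for worker in workers:
    worker.join()
```

Items that are already queued are still delivered after close(); only an empty, closed queue raises Finished, so no payload is lost and no sentinel count has to match the worker count.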

Answer 2: (score: 0)

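I suggest using an Event object shared by all workers instead of a sentinel, to avoid accidentally mixing data and sentinels. Also, make sure the blocking call to queue.get() uses a timeout, so that workers neither spin and waste resources nor block forever.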

from threading import Thread, Event, current_thread
from queue import Queue, Empty
import time

queue = Queue()
evt = Event()

def process(payload):
    time.sleep(1)

def work():
    tid = current_thread().name
    # try until signaled not to
    while not evt.is_set():
        try:
            # block for max 1 second
            payload = queue.get(True, 1)
        except Empty:
            # nothing arrived within the timeout; re-check the event
            continue
        try:
            process(payload)
        except Exception as e:
            print("%s thread exception %s" % (tid, e))
        finally:
            # this point is only reached after a successful get(), so
            # task_done() is called exactly once per item, even on failure
            queue.task_done()

threads = dict()
for thread_id in range(10):
    threads[thread_id] = Thread(target=work)
    threads[thread_id].daemon = True
    threads[thread_id].start()
for payload in range(10):
    queue.put(payload)
queue.join()
# all workers will end in approx 1 second at most
evt.set()