Consumer/producer "timely" queue

Date: 2012-12-05 23:07:11

Tags: python priority-queue anti-patterns

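I have implemented a consumer/producer priority queue in which the priority is actually a timestamp representing when the item should be delivered. It works quite well, but I would like to know whether anyone has a better idea for implementing this, or comments on the current implementation.

The code is in Python. A single thread is created to wake waiting consumers on time. I know that creating a thread inside a library is an anti-pattern, but I could not devise another approach.

Here is the code: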

import collections
import heapq
import threading
import time

class TimelyQueue(threading.Thread):
    """
    Implements a similar but stripped down interface of Queue which
    delivers items on time only.
    """

    class Locker:
        def __init__(self, lock):
            self.l = lock
        def __enter__(self):
            self.l.acquire()
            return self.l
        def __exit__(self, type, value, traceback):
            self.l.release()

    # Optimization to avoid wasting CPU cycles when something
    # is about to happen in less than 5 ms.
    _RESOLUTION = 0.005

    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.queue = []
        self.triggered = collections.deque()
        self.putcond = threading.Condition()
        self.getcond = threading.Condition()
        # Optimization to avoid waking the thread uselessly.
        self.putwaketime = 0

    def put(self, when, item):
        with self.Locker(self.putcond):
            heapq.heappush(self.queue, (when, item))
            if when < self.putwaketime or self.putwaketime == 0:
                self.putcond.notify()

    def get(self, timeout=None):
        with self.Locker(self.getcond):
            if len(self.triggered) > 0:
                when, item = self.triggered.popleft()
                return item
            # Nothing ready yet: block until the thread signals or we time out.
            self.getcond.wait(timeout)
            try:
                when, item = self.triggered.popleft()
            except IndexError:
                return None
            return item

    def qsize(self):
        with self.Locker(self.putcond):
            return len(self.queue)

    def run(self):
        with self.Locker(self.putcond):
            maxwait = None
            while True:
                curtime = time.time()
                try:
                    when, item = self.queue[0]
                    maxwait = when - curtime
                    self.putwaketime = when
                except IndexError:
                    maxwait = None
                    self.putwaketime = 0
                self.putcond.wait(maxwait)

                curtime = time.time()
                while True:
                    # Don't dequeue now, we are not sure to use it yet.
                    try:
                        when, item = self.queue[0]
                    except IndexError:
                        break
                    if when > curtime + self._RESOLUTION:
                        break

                    self.triggered.append(heapq.heappop(self.queue))
                if len(self.triggered) > 0:
                    with self.Locker(self.getcond):
                        self.getcond.notify()


if __name__ == "__main__":
    q = TimelyQueue()
    q.start()

    N = 50000
    t0 = time.time()
    for i in range(N):
        q.put(time.time() + 2, i)
    dt = time.time() - t0
    print("put done in %.3fs (%.2f put/sec)" % (dt, N / dt))
    t0 = time.time()
    i = 0
    while i < N:
        a = q.get(3)
        if i == 0:
            dt = time.time() - t0
            print("start get after %.3fs" % dt)
            t0 = time.time()
        i += 1
    dt = time.time() - t0
    print("get done in %.3fs (%.2f get/sec)" % (dt, N / dt))
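
One detail about the heap entries may be worth a note: because they are plain (when, item) tuples, two entries with the same timestamp are ordered by comparing the items themselves, which raises TypeError on Python 3 for items such as dicts. Below is a minimal sketch of one common workaround, a sequence counter placed between the timestamp and the item. The names push, pop_due and _counter are made up for illustration and are not part of the code above.

```python
import heapq
import itertools

_counter = itertools.count()  # hypothetical tie-breaker, not in the original code

def push(heap, when, item):
    # The sequence number is compared before the item, so two entries with
    # the same timestamp never fall back to comparing the items themselves.
    heapq.heappush(heap, (when, next(_counter), item))

def pop_due(heap, now):
    """Pop and return every item whose timestamp is <= now, earliest first."""
    due = []
    while heap and heap[0][0] <= now:
        when, _, item = heapq.heappop(heap)
        due.append(item)
    return due

heap = []
push(heap, 10.0, {"job": "b"})
push(heap, 10.0, {"job": "a"})   # same timestamp, and dicts are not orderable
push(heap, 5.0, {"job": "c"})
ready = pop_due(heap, now=10.0)
```

Entries with equal timestamps come out in insertion order, which is usually what a delivery queue wants anyway.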

2 answers:

Answer 0 (score: 0)

The only thing you really need the background thread for is a timer that kicks the waiters when it expires, right?

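First, you could implement this with threading.Timer instead of an explicit background thread. But while that may be simpler, it doesn't really solve the problem that you're creating threads behind the user's back, whether they want them or not. Also, with threading.Timer you're actually spinning off a new thread each time you restart the timer, which may be a performance problem. (You only have one at a time, but still, starting and stopping threads isn't free.)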

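If you look through PyPI modules, ActiveState recipes, and various frameworks, there are many implementations that let you run multiple timers on a single background thread. That would solve your problem.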
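
To make the idea concrete, here is a rough sketch of what such a multi-timer could look like, assuming a heap of deadlines serviced by one daemon thread. MultiTimer, add and the handle's cancel are hypothetical names, not the API of any particular PyPI package or recipe.

```python
import heapq
import itertools
import threading
import time

class MultiTimer(object):
    """Runs many one-shot timers on one daemon thread (illustrative sketch)."""

    class _Handle(object):
        def __init__(self, callback):
            self.callback = callback
            self.cancelled = False
        def cancel(self):
            # Cancelled entries stay in the heap and are skipped when due.
            self.cancelled = True

    def __init__(self):
        self._cond = threading.Condition()
        self._heap = []                 # entries: (deadline, seq, handle)
        self._seq = itertools.count()   # tie-breaker for equal deadlines
        thread = threading.Thread(target=self._run)
        thread.daemon = True
        thread.start()

    def add(self, delay, callback):
        """Schedule callback() after `delay` seconds; returns a cancellable handle."""
        handle = self._Handle(callback)
        with self._cond:
            heapq.heappush(self._heap, (time.time() + delay, next(self._seq), handle))
            self._cond.notify()   # the new deadline may be the earliest one
        return handle

    def _run(self):
        while True:
            with self._cond:
                # Sleep until the earliest deadline has passed.
                while not self._heap or self._heap[0][0] > time.time():
                    if self._heap:
                        self._cond.wait(max(0.0, self._heap[0][0] - time.time()))
                    else:
                        self._cond.wait()
                handle = heapq.heappop(self._heap)[2]
            if not handle.cancelled:
                handle.callback()   # run outside the lock

timers = MultiTimer()
fired = []
done = threading.Event()

def late():
    fired.append("late")
    done.set()

timers.add(0.3, late)
timers.add(0.01, lambda: fired.append("early"))
skipped = timers.add(0.1, lambda: fired.append("cancelled"))
skipped.cancel()
done.wait(5.0)
```

All callbacks run on the one timer thread, so a slow callback delays every timer behind it. That is the usual trade-off of this design.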

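But that's still not a perfect solution. For example, say my app needs 20 TimelyQueue objects, or a TimelyQueue plus 19 other things that all need timers. I still end up with 20 threads. Or say I'm building a socket server or a GUI app (the two most obvious use cases for your TimelyQueue): I can implement a timer on top of my event loop (or, most likely, just use a timer that comes with the framework), so why do I need a thread at all?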

The way around this is to provide a hook for supplying any timer factory:

def __init__(self, timerfactory = threading.Timer):
    self.timerfactory = timerfactory
    ...

Now, when you need to adjust the timer:

if when < self.waketime:
    self.timer.cancel()
    self.timer = self.timerfactory(when - now(), self.timercallback)
    self.waketime = when
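
One caveat with the default: the fragment above assumes the factory hands back a timer that is already scheduled, but a threading.Timer only begins counting down once start() is called. A small wrapper keeps the call site unchanged. This is only a sketch; started_timer is a hypothetical name, and making the timer a daemon thread is an assumption that may not suit every app.

```python
import threading

def started_timer(interval, callback):
    """Hypothetical helper: create a daemon threading.Timer, start it, return it."""
    timer = threading.Timer(interval, callback)
    timer.daemon = True
    timer.start()
    return timer

# A timer that is allowed to fire.
fired = threading.Event()
t = started_timer(0.01, fired.set)
fired.wait(2.0)

# A timer that is cancelled before it fires: the callback never runs.
cancelled = threading.Event()
c = started_timer(5.0, cancelled.set)
c.cancel()
c.join(2.0)
```

With this in place the default could be written as timerfactory=started_timer, and factories that schedule immediately need no special casing.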

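For quick & dirty use cases, that's good enough out of the box. But if I'm using twisted, I can just use TimelyQueue(twisted.internet.reactor.callLater), and now the queue's timers go through the twisted event loop. Or, if I have a multiple-timers-on-one-thread implementation that I'm using elsewhere, TimelyQueue(multiTimer.add), and now the queue's timers go on the same thread as all my other timers.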

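If you want, you can provide a better default than threading.Timer, but really, I think most people who need something better than threading.Timer will be able to supply something better suited to their particular app than anything you could provide.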
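
As a side effect, the same hook lets the queue be driven by a hand-cranked timer, for example in tests. The sketch below is an assumption-heavy illustration: MiniTimelyQueue is a stripped-down stand-in for TimelyQueue (no blocking get), and ManualTimer and the clock parameter are hypothetical names introduced only for this example.

```python
import heapq
import threading
import time

class MiniTimelyQueue(object):
    """Stripped-down, hypothetical stand-in for TimelyQueue."""

    def __init__(self, timerfactory, clock=time.time):
        self.timerfactory = timerfactory
        self.clock = clock
        self.lock = threading.Lock()
        self.queue = []        # heap of (when, item)
        self.ready = []        # items whose time has come
        self.timer = None
        self.waketime = None

    def put(self, when, item):
        with self.lock:
            heapq.heappush(self.queue, (when, item))
            if self.waketime is None or when < self.waketime:
                self._rearm()

    def _rearm(self):
        # Caller holds self.lock. Replace the pending timer with one aimed
        # at the earliest queued timestamp.
        if self.timer is not None:
            self.timer.cancel()
        self.waketime = self.queue[0][0]
        self.timer = self.timerfactory(self.waketime - self.clock(), self._fire)

    def _fire(self):
        with self.lock:
            now = self.clock()
            while self.queue and self.queue[0][0] <= now:
                self.ready.append(heapq.heappop(self.queue)[1])
            self.timer = self.waketime = None
            if self.queue:
                self._rearm()

class ManualTimer(object):
    """Hypothetical test double: records the request, fires only on demand."""
    pending = []
    def __init__(self, delay, callback):
        self.delay, self.callback, self.cancelled = delay, callback, False
        ManualTimer.pending.append(self)
    def cancel(self):
        self.cancelled = True

fake_now = [100.0]
q = MiniTimelyQueue(ManualTimer, clock=lambda: fake_now[0])
q.put(105.0, "b")
q.put(102.0, "a")   # earlier deadline: the first timer is cancelled, a new one armed
delays = [(t.delay, t.cancelled) for t in ManualTimer.pending]
fake_now[0] = 102.0
ManualTimer.pending[-1].callback()   # simulate the timer firing
```

After the simulated firing, "a" has been moved to q.ready and a third timer has been requested for the remaining item.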

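Of course, not every timer implementation has the same API as threading.Timer, although you'd be surprised how many of them do. But it's not hard to write an adapter if you've got a timer you want to use with TimelyQueue and it has the wrong interface. For example, if I'm building a PyQt4/PySide app, QTimer doesn't have a cancel method and takes milliseconds instead of seconds, so I'd have to do something like this: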

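from PyQt4.QtCore import QTimer

class AdaptedQTimer(object):
    def __init__(self, timeout, callback):
        # QTimer.singleShot() is static and returns None, so build an
        # instance that can be stopped later.
        self.timer = QTimer()
        self.timer.setSingleShot(True)
        self.timer.timeout.connect(callback)
        self.timer.start(int(timeout * 1000))  # QTimer expects milliseconds
    def cancel(self):
        self.timer.stop()

q = TimelyQueue(AdaptedQTimer)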

Or, if I wanted to integrate the queue more directly into a QObject, I could wrap QObject.startTimer() and have my timerEvent(self) method call the callback.

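Once you're thinking about adapters, one last idea. I don't think it's worth it, but it may be worth considering. If your timer took a timestamp instead of a timedelta, had an adjust method instead of (or in addition to) cancel, and held its own waketime, your TimelyQueue implementation could be simpler and possibly more efficient. In put, you'd have something like: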

if self.timer is None:
    self.timer = self.timerfactory(when, self.timercallback)
elif when < self.timer.waketime:
    self.timer.adjust(when)

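Of course, most timers don't provide this interface. But if someone has one, or is willing to make one, they get the benefits. For everyone else, you can provide a simple adapter that turns a threading.Timer-style timer into the kind you need, something like: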

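from time import time as now

def timerFactoryAdapter(threadingStyleTimerFactory):
    class TimerFactory(object):
        def __init__(self, timestamp, callback):
            self.callback = callback
            self.waketime = timestamp  # read by put() to decide on adjust()
            self.timer = threadingStyleTimerFactory(timestamp - now(), callback)
        def cancel(self):
            return self.timer.cancel()
        def adjust(self, timestamp):
            # Replace the underlying timer with one aimed at the new time.
            self.timer.cancel()
            self.waketime = timestamp
            self.timer = threadingStyleTimerFactory(timestamp - now(), self.callback)
    return TimerFactory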

Answer 1 (score: 0)

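For the record, I have implemented what you suggested with the timer factory. I ran a small benchmark using the version above and a new version that uses the threading.Timer class: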

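  1. First implementation

    • With the default resolution (5 ms, meaning everything within a 5 ms window is triggered together), it reaches about 88k put()/sec and 69k get()/sec.

    • With the resolution set to 0 ms (no optimization), it reaches about 88k put()/sec and 55k get()/sec.

  2. Second implementation

    • With the default resolution (5 ms), it reaches about 88k put()/sec and 65k get()/sec.

    • With the resolution set to 0 ms, it reaches about 88k put()/sec and 62k get()/sec.

I admit I am surprised that the second implementation is faster without the resolution optimization. It is too late to investigate now.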

    import collections
    import heapq
    import threading
    import time
    
    class TimelyQueue:
        """
        Implements a similar but stripped down interface of Queue which
        delivers items on time only.
        """
    
        def __init__(self, resolution=5, timerfactory=threading.Timer):
            """
            `resolution' is an optimization to avoid wasting CPU cycles when
            something is about to happen in less than X ms.
            """
            self.resolution = float(resolution) / 1000
            self.timerfactory = timerfactory
            self.queue = []
            self.triggered = collections.deque()
            self.putcond = threading.Condition()
            self.getcond = threading.Condition()
            # Optimization to avoid waking the thread uselessly.
            self.putwaketime = 0
            self.timer = None
            self.terminating = False
    
        def __arm(self):
            """
            Arm the next timer; putcond must be acquired!
            """
            curtime = time.time()
            when, item = self.queue[0]
            interval = when - curtime
            self.putwaketime = when
            self.timer = self.timerfactory(interval, self.__fire)
            self.timer.start()
    
        def __fire(self):
            with self.putcond:
                curtime = time.time()
                while True:
                    # Don't dequeue now, we are not sure to use it yet.
                    try:
                        when, item = self.queue[0]
                    except IndexError:
                        break
                    if when > curtime + self.resolution:
                        break
    
                    self.triggered.append(heapq.heappop(self.queue))
                if len(self.triggered) > 0:
                    with self.getcond:
                        self.getcond.notify(len(self.triggered))
                if self.terminating:
                    return
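                if len(self.queue) > 0:
                    self.__arm()
                else:
                    # Nothing left to schedule: reset the wake time so the
                    # next put() arms a fresh timer instead of returning early.
                    self.putwaketime = 0
                    self.timer = None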
    
        def put(self, when, item):
            """
            `when' is a Unix time from Epoch.
            """
            with self.putcond:
                heapq.heappush(self.queue, (when, item))
                if when >= self.putwaketime and self.putwaketime != 0:
                    return
                # Arm next timer.
                if self.timer is not None:
                    self.timer.cancel()
                self.__arm()
    
        def get(self, timeout=None):
            """
            Timely return the next object on the queue.
            """
            with self.getcond:
                if len(self.triggered) > 0:
                    when, item = self.triggered.popleft()
                    return item
                self.getcond.wait(timeout)
                try:
                    when, item = self.triggered.popleft()
                except IndexError:
                    return None
                return item
    
        def qsize(self):
            """
            Self explanatory.
            """
            with self.putcond:
                return len(self.queue)
    
        def terminate(self):
            """
            Request the embedded thread to terminate.
            """
            with self.putcond:
                self.terminating = True
                if self.timer is not None:
                    self.timer.cancel()
                self.putcond.notify_all()
    
    
    if __name__ == "__main__":
        q = TimelyQueue(0)
        N = 100000
        t0 = time.time()
        for i in range(N):
            q.put(time.time() + 2, i)
        dt = time.time() - t0
        print("put done in %.3fs (%.2f put/sec)" % (dt, N / dt))
        t0 = time.time()
        i = 0
        while i < N:
            a = q.get(3)
            if i == 0:
                dt = time.time() - t0
                print("start get after %.3fs" % dt)
                t0 = time.time()
            i += 1
        dt = time.time() - t0
        print("get done in %.3fs (%.2f get/sec)" % (dt, N / dt))
        q.terminate()
        # Give the timer thread a chance to exit properly, otherwise we may
        # get a stray interpreter exception.
        time.sleep(0.1)