
Question

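I have a process that needs to perform a bunch of actions "later" (usually after 10-60 seconds). The problem is that there can be a lot of those "later" actions (thousands), so using a Thread per task is not viable. I know tools like gevent and eventlet exist, but one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it).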

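What I'm wondering is: what are my options? Suggestions are welcome, along the lines of libraries (if you've used any of the ones mentioned, please share your experience), techniques (Python's "coroutine" support, or one thread that sleeps for a while and then checks a queue), how to use zeromq's poll or eventloop to do the job, or something else.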

Answer 1

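Consider using a priority queue with one or more worker threads to service the tasks. The main thread adds work to the queue, keyed by the timestamp of the soonest moment it should be serviced. Worker threads pop work off the queue, sleep until the time given by the priority value is reached, do the work, and then pop another item off the queue.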
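As a rough sketch of that first idea (not the fuller version further down), here is one worker draining a `queue.PriorityQueue`. The names `submit`, `worker`, `run_demo` and the `STOP` sentinel are made up for illustration, and the sequence counter is only there so that two entries with the same timestamp never fall back to comparing their payloads. Note the weakness of this simple form: once the worker is asleep waiting for an item, it will not notice more urgent work that arrives in the meantime.

```python
import itertools
import queue
import threading
import time

STOP = object()                # hypothetical sentinel telling the worker to exit
_sequence = itertools.count()  # tie-breaker for entries with equal timestamps


def submit(work_queue, delay, message):
    """Queue `message` to be handled `delay` seconds from now."""
    work_queue.put((time.time() + delay, next(_sequence), message))


def worker(work_queue, results):
    """Pop the soonest item, sleep until it is due, handle it, repeat."""
    while True:
        due, _seq, message = work_queue.get()
        if message is STOP:
            break
        remaining = due - time.time()
        if remaining > 0:
            time.sleep(remaining)
        results.append(message)  # stand-in for the real work


def run_demo():
    work_queue = queue.PriorityQueue()
    results = []
    submit(work_queue, 0.2, "later")
    submit(work_queue, 0.1, "sooner")
    # An infinite timestamp sorts after all real work.
    work_queue.put((float("inf"), next(_sequence), STOP))
    thread = threading.Thread(target=worker, args=(work_queue, results))
    thread.start()
    thread.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Because everything is queued before the worker starts, the items come out in timestamp order regardless of the order they were submitted in.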

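How about a more fleshed-out answer. mklauber makes a good point. If there's a chance that all of your workers are sleeping when new, more urgent work arrives, then queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, and one is available from the heapq module. Instead, we'll use a different synchronization primitive: a condition variable, which in Python is spelled threading.Condition.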

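The approach is fairly simple: peek at the heap, and if the work at the top is due, pop it off and do it. If there is work but it is scheduled for the future, wait on the condition until then; if there is no work at all, sleep indefinitely.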

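The producer does its fair share of the work: every time it adds new work, it notifies the condition, so if there are sleeping workers, they wake up and recheck the queue for newer work.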

import heapq, time, threading

START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()
def consumer(message):
    """the actual work function.  nevermind the locks here, this just keeps
       the output nicely formatted.  a real work function probably won't need
       it, or might need quite different synchronization"""
    with SERIALIZE_STDOUT:
        print("%s %s" % (time.time() - START_TIME, message))

def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    with condition:
        heapq.heappush(work_queue, (prio, message))
        condition.notify()

def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()

if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    # an explicit loop, because map() is lazy on Python 3 and would start nothing
    for thread in pool:
        thread.start()

    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')

    # and just to make the example a bit more friendly, tell the threads to stop after all
    # the work is done
    produce(work_queue, cond, float('inf'), 'stop')
    for thread in pool:
        thread.join()

Answer 2

This answer actually contains two suggestions: my first one, and another I discovered after the first.

sched

I suspect you are looking for the sched module.
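For anyone who hasn't used it, a bare-bones sketch of what the module does: you `enter()` events with a delay and a priority, then `run()` blocks and fires them in due-time order. The helper name `run_in_order` and the labels are invented for this illustration.

```python
import sched
import time


def run_in_order(jobs):
    """Run (delay_seconds, label) jobs on one scheduler; return labels as fired."""
    scheduler = sched.scheduler(time.time, time.sleep)
    fired = []
    for delay, label in jobs:
        # enter(delay, priority, action, argument); priority only breaks ties
        scheduler.enter(delay, 1, fired.append, (label,))
    scheduler.run()  # blocks until the event queue is empty
    return fired


if __name__ == "__main__":
    print(run_in_order([(0.3, "third"), (0.1, "first"), (0.2, "second")]))
```

On its own this blocks the calling thread until every event has fired, which is why a dedicated thread is the natural place to run it.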

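EDIT: after rereading it, my bare suggestion did not seem very helpful, so I decided to test the sched module to see whether it works the way I suggested. Here is my test: I would use it with a single dedicated thread, more or less like this: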

import sched
import threading
import time

class SchedulingThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

    def run(self):
        self.scheduler.run()

    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))

    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registerd event %s" % (event,))
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

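First, I'd create a thread class with its own scheduler and a queue. At least one event is always registered in the scheduler: the one that invokes the method for scheduling events from the queue.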

class SchedulingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.scheduler = sched.scheduler(time.time, time.sleep)
        self.queue = []
        self.queue_lock = threading.Lock()
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

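The method for scheduling events from the queue locks the queue, schedules each event, empties the queue, and then schedules itself again so that it looks for new events some time in the future. Note that the polling period for new events is short (one second); you may change it: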

    def _schedule_in_scheduler(self):
        with self.queue_lock:
            for event in self.queue:
                self.scheduler.enter(*event)
                print("Registerd event %s" % (event,))
            self.queue = []
        self.scheduler.enter(1, 1, self._schedule_in_scheduler, ())

The class should also have a method for scheduling user events. Naturally, this method should lock the queue while updating it:

    def schedule(self, function, delay):
        with self.queue_lock:
            self.queue.append((delay, 1, function, ()))

Finally, the class should invoke the scheduler's main method:

    def run(self):
        self.scheduler.run()

Here is an example of using it:

def print_time():
    print("scheduled: %s" % time.time())


if __name__ == "__main__":
    st = SchedulingThread()
    st.start()
    st.schedule(print_time, 10)

    while True:
        print("main thread: %s" % time.time())
        time.sleep(5)

    # never reached: the loop above runs until the program is interrupted
    st.join()

The output on my machine is:

$ python schedthread.py
main thread: 1311089765.77
Registerd event (10, 1, <function print_time at 0x2f4bb0>, ())
main thread: 1311089770.77
main thread: 1311089775.77
scheduled: 1311089776.77
main thread: 1311089780.77
main thread: 1311089785.77

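This code is just a quick-and-dirty example and may need some work. However, I have to confess that I am a bit fascinated by the sched module, which is why I suggested it. You may want to look at the other suggestions as well :)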

APScheduler

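While searching Google for solutions like the one I posted, I found the amazing APScheduler module. It is so practical and useful that I bet it is your solution. My previous example would be much simpler with this module: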

from apscheduler.scheduler import Scheduler
import time

sch = Scheduler()
sch.start()

@sch.interval_schedule(seconds=10)
def print_time():
    print("scheduled: %s" % time.time())
    sch.unschedule_func(print_time)

while True:
    print("main thread: %s" % time.time())
    time.sleep(5)

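(Unfortunately, I did not find out how to schedule an event to execute only once, so the event function has to unschedule itself. I bet this can be solved with some decorator.)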

Answer 3

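If you have a bunch of tasks that need to be performed later, and you want them to persist even if you shut down the calling program or your workers, then you should really look into Celery, which makes it super easy to create new tasks, have them executed on any machine you like, and wait for the results.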

From the Celery page: "This is a simple task adding two numbers:"

from celery.task import task

@task
def add(x, y):
    return x + y

You can execute the task in the background, or wait for it to finish:

>>> result = add.delay(8, 8)
>>> result.wait() # wait for and return the result
16

Answer 4

You wrote:

one of the problems is that the process uses zeromq for communication, so I would need some integration (eventlet already has it)

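It seems your choice will be heavily influenced by these details, which are a bit unclear: how zeromq is being used for communication, how many resources the integration will require, and what your requirements and available resources are.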

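There is a project called django-ztask which uses zeromq and provides a task decorator similar to Celery's. However, it is (obviously) Django-specific and so may not be suitable in your case. I haven't used it; I prefer Celery myself.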

I have been using Celery for a couple of projects (these are hosted on the ep.io PaaS, which provides an easy way to use it).

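Celery looks like quite a flexible solution, allowing delayed tasks, callbacks, task expiration and retrying, limits on the task execution rate, and so on. It can be used with Redis, Beanstalk, CouchDB, MongoDB or an SQL database.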

Example code (definition of a task, and asynchronous execution after a delay):

from celery.decorators import task

@task
def my_task(arg1, arg2):
    pass # Do something

result = my_task.apply_async(
    args=[sth1, sth2], # Arguments that will be passed to `my_task()` function.
    countdown=3, # Time in seconds to wait before queueing the task.
)

See also the relevant section in the Celery docs.

Answer 5

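Have you looked at the multiprocessing module? It comes standard with Python. It is similar to the threading module, but runs each task in a process. You can use a Pool() object to set up a worker pool, then use the .map() method to call a function with the arguments of the various queued tasks.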

Answer 6

Pyzmq has an ioloop implementation with an API similar to that of the Tornado ioloop. It implements a DelayedCallback, which may help you.

Answer 7

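Assuming your process has a run loop that can receive signals, and the duration of each action is within the bounds of sequential operation, use signals and POSIX alarm():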

    signal.alarm(time)
If time is non-zero, this function requests that a 
SIGALRM signal be sent to the process in time seconds.

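This depends on what you mean by "those "later" actions can be a lot" and on whether your process already uses signals. Given the phrasing of the question, it's unclear why an external Python package would be needed.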
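A rough sketch of how that could look, under some assumptions: it is Unix-only, it must be set up from the main thread, and it swaps `signal.alarm` for `signal.setitimer` so that delays need not be whole seconds. The names `call_later`, `_arm_timer` and `_on_alarm` are invented here. Pending actions sit in a heap and a single timer is kept armed for whichever one is due first.

```python
import heapq
import itertools
import signal
import time

_pending = []                  # heap of (due_time, sequence_number, action)
_sequence = itertools.count()  # tie-breaker so two actions are never compared


def _arm_timer():
    """Request a SIGALRM for the moment the earliest pending action is due."""
    if _pending:
        delay = _pending[0][0] - time.time()
        # A delay of 0 would disarm the timer, so keep it strictly positive.
        signal.setitimer(signal.ITIMER_REAL, max(delay, 0.001))


def _on_alarm(signum, frame):
    """Signal handler: run everything that is due, then re-arm for the rest."""
    now = time.time()
    while _pending and _pending[0][0] <= now:
        _due, _seq, action = heapq.heappop(_pending)
        action()
    _arm_timer()


def call_later(delay, action):
    """Schedule action() to run in the main thread after `delay` seconds."""
    heapq.heappush(_pending, (time.time() + delay, next(_sequence), action))
    _arm_timer()


# Handlers can only be installed from the main thread of the main interpreter.
signal.signal(signal.SIGALRM, _on_alarm)
```

Usage would be something like `call_later(30, some_function)` from the run loop. The actions execute inside the signal handler, interrupting whatever the main thread was doing, so they should be short.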

Answer 8

Another option is to use the Python GLib bindings, in particular its timeout functions.

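It's a good choice as long as you don't need to make use of multiple cores and the dependency on GLib is not a problem. It handles all events in the same thread, which prevents synchronization issues. In addition, its event framework can also be used to watch and handle IO-based events (e.g. sockets).

UPDATE:

Here's a live session using GLib: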

>>> import time
>>> import glib
>>> 
>>> def workon(thing):
...     print("%s: working on %s" % (time.time(), thing))
...     return True # use True for repetitive and False for one-time tasks
... 
>>> ml = glib.MainLoop()
>>> 
>>> glib.timeout_add(1000, workon, "this")
2
>>> glib.timeout_add(2000, workon, "that")
3
>>> 
>>> ml.run()
1311343177.61: working on this
1311343178.61: working on that
1311343178.61: working on this
1311343179.61: working on this
1311343180.61: working on this
1311343180.61: working on that
1311343181.61: working on this
1311343182.61: working on this
1311343182.61: working on that
1311343183.61: working on this

Answer 9

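Well, in my opinion you could use something called "cooperative multitasking". It's a Twisted-based thing and it's really cool. Just look at this PyCon presentation from 2010: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-cooperative-multitasking-with-twisted-getting-things-done-concurrently-11-3352182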

Well, you will need a transport queue to do this too...
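Twisted itself isn't shown here. As a stand-in for the same single-threaded, cooperative idea, this sketch uses the standard library's asyncio event loop and its `call_later` method instead, which is a substitution on my part rather than what the talk covers. The helper `run_delayed` and the labels are hypothetical. Thousands of pending callbacks cost one timer entry each, not one thread each.

```python
import asyncio


def run_delayed(jobs):
    """Fire (delay_seconds, label) jobs on one event loop; return labels as fired."""
    fired = []

    async def main():
        loop = asyncio.get_running_loop()
        all_done = loop.create_future()

        def fire(label):
            fired.append(label)  # stand-in for the real delayed action
            if len(fired) == len(jobs):
                all_done.set_result(None)

        for delay, label in jobs:
            loop.call_later(delay, fire, label)
        if jobs:
            await all_done  # keep the loop alive until every job has fired

    asyncio.run(main())
    return fired


if __name__ == "__main__":
    print(run_delayed([(0.3, "c"), (0.1, "a"), (0.2, "b")]))
```

In a real process the loop would keep running and also service the sockets, so the "wait until all done" future is only there to let the example terminate.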

Answer 10

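Simple. You can inherit your class from Thread and create instances of it with a parameter such as a timeout, so that for each instance you specify a timeout that makes the thread wait that long before running.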
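A minimal sketch of what that might look like; the class name `DelayedTask` and its parameters are made up for illustration. The standard library's `threading.Timer` does essentially the same thing. Either way this still costs one thread per pending task, which is the thing the question wanted to avoid for thousands of tasks.

```python
import threading
import time


class DelayedTask(threading.Thread):
    """Thread that waits `timeout` seconds and then calls function(*args)."""

    def __init__(self, timeout, function, *args):
        threading.Thread.__init__(self)
        self.timeout = timeout
        self.function = function
        self.args = args

    def run(self):
        time.sleep(self.timeout)
        self.function(*self.args)


if __name__ == "__main__":
    task = DelayedTask(0.5, print, "half a second later")
    task.start()
    task.join()
```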

How to efficiently do many tasks "a little later" in Python?

10 answers
