Avoiding duplicate tasks in the Celery broker

Date: 2014-11-09 17:37:32

Tags: celery

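I want to use the Celery configuration or API to create the following flow:

  • Send TaskA(argB) only if the Celery queue does not already have a pending TaskA(argB)

Is this possible? If so, how?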

4 answers:

Answer 0 (score: 2)

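I can't think of a way other than to:

  1. Retrieve all executing and scheduled tasks through celery inspect.
  2. Iterate through them to see whether your task is there.

Check this question to see how the first step is done.

Good luck.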
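The two steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names `iter_task_requests` and `is_task_pending` are hypothetical, and `inspector` is assumed to be the object returned by `app.control.inspect()`. It also assumes workers report `args` as a list or tuple; some Celery versions report a repr string instead, which this comparison would not match.

```python
def iter_task_requests(inspector):
    """Yield request dicts for active, reserved and scheduled tasks."""
    for method in ("active", "reserved", "scheduled"):
        # Each call returns {worker_name: [tasks]}, or None if no worker replies.
        replies = getattr(inspector, method)() or {}
        for worker_tasks in replies.values():
            for item in worker_tasks:
                # scheduled() wraps the request: {"eta": ..., "request": {...}}
                yield item.get("request", item)


def is_task_pending(inspector, task_name, args):
    """Return True if a worker already knows a task with this name and args."""
    wanted = list(args)
    for request in iter_task_requests(inspector):
        reported = request.get("args")
        # Assumption: args arrive as a list/tuple (not a repr string).
        if (
            request.get("name") == task_name
            and isinstance(reported, (list, tuple))
            and list(reported) == wanted
        ):
            return True
    return False
```

A caller might then write `if not is_task_pending(app.control.inspect(), "tasks.TaskA", [argB]): TaskA.delay(argB)`, where `"tasks.TaskA"` is an assumed task name. Note that inspect only sees tasks a worker has already received, not messages still waiting in the broker, and that the check and the send are not atomic.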

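Answer 1 (score: 1)

You can make your task aware of other tasks through some form of memoization. If you use a cache control key (Redis, memcached, /tmp, whatever is handy), you can make execution depend on that key. I use Redis as an example.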

from redis import Redis

# `app` is your Celery application; perform_task() stands for the real work.
SENTINEL_KEY = "run_only_one_instance_sentinel"
redis_client = Redis()

@app.task
def run_only_one_instance(params):
    # INCR is atomic, so only one running task sees the counter reach 1
    sentinel = redis_client.incr(SENTINEL_KEY)
    try:
        if sentinel == 1:
            # I am the legitimate running task
            perform_task()
        else:
            # Do you want to do something else on a duplicate task?
            pass
    finally:
        # Always decrement the counter to ensure later tasks can run.
        # Errors from perform_task() propagate; log them (e.g. with Sentry)
        # in an except block here if you prefer to swallow them.
        redis_client.decr(SENTINEL_KEY)
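The counter above is shared by every call of the task and is checked when the task runs, not when it is sent. To guard a specific TaskA(argB) at send time, one variation is a per-argument key claimed with Redis `SET ... NX EX`. The sketch below is an assumption-laden illustration, not the answer's method: `dedup_key`, `try_acquire`, `release`, the `task-dedup:` prefix and the 600-second expiry are all hypothetical, and `client` is expected to be a `redis.Redis` instance.

```python
import hashlib
import json


def dedup_key(task_name, args):
    """Build a stable Redis key for one (task name, arguments) combination."""
    payload = json.dumps([task_name, list(args)], sort_keys=True, default=str)
    return "task-dedup:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def try_acquire(client, key, ttl_seconds=600):
    """Atomically claim the key; True only for the first caller."""
    # SET key value NX EX ttl: set only if absent, with an expiry as a
    # safety net in case the task crashes before releasing the key.
    return bool(client.set(key, "1", nx=True, ex=ttl_seconds))


def release(client, key):
    """Free the key, typically in a finally block at the end of the task."""
    client.delete(key)
```

The sender would call `try_acquire(client, dedup_key("tasks.TaskA", [argB]))` and only send the task when it returns True; the task itself would call `release` with the same key when it finishes. The expiry should be longer than the longest expected queue wait plus run time, otherwise a duplicate can slip through.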

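Answer 2 (score: 1)

I don't know whether this will help you more than the other answers, but my approach follows the same idea that srj gave. I needed a way to stop my server from launching queued tasks with the same ID, so I wrote a generic function to help me.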

def is_task_active_or_registered(app, task_id):
    """Return True if a task with this ID is active or scheduled on any worker."""
    i = app.control.inspect()

    # inspect calls return None when no worker replies, so fall back to {}
    active_dict = i.active() or {}
    scheduled_dict = i.scheduled() or {}
    tasks_ids_set = set()

    for _dict in [active_dict, scheduled_dict]:
        for worker_tasks in _dict.values():
            for task in worker_tasks:
                # scheduled() nests the task under 'request'; active() does not
                tasks_ids_set.add(task.get('request', task)['id'])

    return task_id in tasks_ids_set

I use it like this:

In a context where my celery-app object is available, I define:

def check_task_can_not_run(task_id):
    return is_task_active_or_registered(app=celery, task_id=task_id)

Then, on each client request, I call check_task_can_not_run(...) and do not launch the task if it returns True.
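A check by ID only helps if the same logical task always gets the same ID. One way to arrange that, sketched here under assumptions, is to derive the ID from the task name and arguments and pass it to `apply_async` through its `task_id` argument. The helpers `deterministic_task_id` and `send_unless_running` are hypothetical names, and `is_running` stands for a check such as check_task_can_not_run.

```python
import hashlib
import json


def deterministic_task_id(task_name, args):
    """Derive a repeatable task ID from the task name and positional args."""
    payload = json.dumps([task_name, list(args)], default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def send_unless_running(task, args, is_running):
    """Send `task` with a derived ID unless `is_running(task_id)` is true."""
    task_id = deterministic_task_id(task.name, args)
    if is_running(task_id):
        return None  # a task with the same name and args is already known
    # An explicit task_id makes the same call always reuse the same ID.
    return task.apply_async(args=list(args), task_id=task_id)
```

For example, `send_unless_running(TaskA, [argB], check_task_can_not_run)` would skip the send while a matching task is active or scheduled. The check and the send are still separate steps, so two simultaneous requests can both pass the check.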

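Answer 3 (score: 0)

I am facing a similar problem: beat is duplicating tasks in my queue. I wanted to use expires, but that feature does not work properly, see https://github.com/celery/celery/issues/4300

So here is a scheduler that checks whether a task is already queued (based on the task name).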

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

import json
from heapq import heappop, heappush

from celery.beat import event_t
from celery.schedules import schedstate
from django_celery_beat.schedulers import DatabaseScheduler
from typing import List, Optional
from typing import TYPE_CHECKING

from your_project import celery_app

if TYPE_CHECKING:
    from celery.beat import ScheduleEntry


def is_task_in_queue(task, queue_name=None):
    # type: (str, Optional[str]) -> bool
    queues = [queue_name] if queue_name else celery_app.amqp.queues.keys()

    for queue in queues:
        if task in get_celery_queue_tasks(queue):
            return True
    return False


def get_celery_queue_tasks(queue_name):
    # type: (str) -> List[str]
    # Reads the queue straight from the broker, so this only works with the
    # Redis transport, where each queue is a Redis list of JSON messages.
    with celery_app.pool.acquire(block=True) as conn:
        tasks = conn.default_channel.client.lrange(queue_name, 0, -1)

    decoded_tasks = []
    for task in tasks:
        j = json.loads(task)
        task = j['headers']['task']  # the task name
        if task not in decoded_tasks:
            decoded_tasks.append(task)

    return decoded_tasks


class SmartScheduler(DatabaseScheduler):
    """
    "Smart" means that it prevents tasks from being duplicated in queues.
    """
    def is_due(self, entry):
        # type: (ScheduleEntry) -> schedstate
        is_due, next_time_to_run = entry.is_due()

        if (
            not is_due or  # duplicate wouldn't be created
            not is_task_in_queue(entry.task)  # not in queue so let it run
        ):
            return schedstate(is_due, next_time_to_run)

        # Task should be run (is_due) and it is present in queue (is_task_in_queue)

        H = self._heap
        if not H:
            return schedstate(False, self.max_interval)

        event = H[0]
        verify = heappop(H)
        if verify is event:
            next_entry = self.reserve(entry)
            heappush(H, event_t(self._when(next_entry, next_time_to_run), event[1], next_entry))
        else:
            heappush(H, verify)
            next_time_to_run = min(verify[0], next_time_to_run)

        return schedstate(False, min(next_time_to_run, self.max_interval))
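For the scheduler to take effect, beat has to be told to use it, either with the `beat_scheduler` setting or with the `--scheduler` option of `celery beat`. The fragment below is a sketch of the setting only; the module path `your_project.schedulers` is an assumption about where the class above is saved, and in a Django project that loads settings with the `CELERY` namespace the name would be `CELERY_BEAT_SCHEDULER` instead.

```python
# Celery configuration module (hypothetical location).
# Assumption: SmartScheduler is defined in your_project/schedulers.py.
beat_scheduler = "your_project.schedulers:SmartScheduler"
```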