Celery: Rate limit on tasks with the same parameters

Date: 2015-04-24 17:42:22

Tags: python celery rate-limiting ratelimit

I am looking for a way to limit how often a function is called, but only when the input parameters are different, i.e.:

api_call("antoine")
api_call("oscar")

So I want api_call("antoine") to be called at most 60 times per second and api_call("oscar") to be called 60 times per second as well.

Any help on how I can do that?

-- EDIT 27/04/2015: I have tried calling a subtask with a rate_limit inside a task, but it does not work either: the rate_limit is always applied to all instantiated subtasks or tasks (which is logical).

@app.task(rate_limit="60/s")
def sub_api_call(user):
    do_the_api_call()

@app.task
def api_call(user):
    sub_api_call(user)

for i in range(0, 100):
    api_call("antoine")
    api_call("oscar")

Best!

2 answers:

Answer 0 (score: 1)

I spent some time on this today and came up with a nice solution. All the other solutions have one of the following problems:

  • They require tasks to retry forever, which renders Celery's retry mechanism useless.
  • They don't throttle based on the task's parameters.
  • They fail when there are multiple workers or queues.
  • They're clunky, etc.

Basically, you wrap your task like this:

@app.task(bind=True, max_retries=10)
@throttle_task("2/s", key="domain", jitter=(2, 15))
def scrape_domain(self, domain):
    do_stuff()

The result is that the task is throttled to two runs per second per domain argument, with a random retry jitter of 2-15 seconds. The key argument is optional, but corresponds to a parameter of the task. If no key argument is given, the task is simply throttled to the given rate. If one is provided, the throttle is applied to the (task, key) dyad.
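
For example (a sketch, assuming scrape_domain is registered with a configured Celery app), dispatching a mix of domains throttles each one independently:

# Each distinct `domain` value gets its own 2/s budget, so the
# example.com calls never delay the example.org calls.
for domain in ["example.com", "example.org", "example.com"]:
    scrape_domain.delay(domain)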

Another way to look at this is without the decorator. That gives a bit more flexibility, but leaves the retrying up to you. Instead of the above, you could do:

@app.task(bind=True, max_retries=10)
def scrape_domain(self, domain):
    proceed = is_rate_okay(self, "2/s", key=domain)
    if proceed:
        do_stuff()
    else:
        # Don't count this attempt against max_retries.
        self.request.retries = self.request.retries - 1
        return self.retry(countdown=random.uniform(2, 15))

I think that is the same as the first example, a bit longer and with more branching, but it shows more clearly how it works. I expect I will always use the decorator myself.

This all works by keeping a count in Redis. The implementation is very simple: you create a key in Redis for the task (plus the key argument, if given), and you expire that key according to the schedule provided. If the user sets a rate of 10/m, you create a Redis key that lasts 60 seconds and increment it every time a task with the right name is attempted. If the counter gets too high, retry the task; otherwise, run it.

import functools
import inspect
import logging
import random
from typing import Any, Callable, Tuple

from celery import Task

logger = logging.getLogger(__name__)


def parse_rate(rate: str) -> Tuple[int, int]:
    """

    Given the request rate string, return a two tuple of:
    <allowed number of requests>, <period of time in seconds>

    (Stolen from Django Rest Framework.)
    """
    num, period = rate.split("/")
    num_requests = int(num)
    if len(period) > 1:
        # It takes the form of a 5d, or 10s, or whatever
        duration_multiplier = int(period[0:-1])
        duration_unit = period[-1]
    else:
        duration_multiplier = 1
        duration_unit = period[-1]
    duration_base = {"s": 1, "m": 60, "h": 3600, "d": 86400}[duration_unit]
    duration = duration_base * duration_multiplier
    return num_requests, duration
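
# For illustration (not part of the original answer), a few rates parsed
# with the grammar above:
#   parse_rate("2/s")   -> (2, 1)        two requests per second
#   parse_rate("10/m")  -> (10, 60)      ten requests per minute
#   parse_rate("10/2h") -> (10, 7200)    ten requests every two hours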


def throttle_task(
    rate: str,
    jitter: Tuple[float, float] = (1, 10),
    key: Any = None,
) -> Callable:
    """A decorator for throttling tasks to a given rate.

    :param rate: The maximum rate that you want your task to run. Takes the
    form of '1/m', or '10/2h' or similar.
    :param jitter: A tuple of the range of backoff times you want for throttled
    tasks. If the task is throttled, it will wait a random amount of time
    between these values before being tried again.
    :param key: An argument name whose value should be used as part of the
    throttle key in redis. This allows you to create per-argument throttles by
    simply passing the name of the argument you wish to key on.
    :return: The decorated function
    """

    def decorator_func(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            # Inspect the decorated function's parameters to get the task
            # itself and the value of the parameter referenced by key.
            sig = inspect.signature(func)
            bound_args = sig.bind(*args, **kwargs)
            task = bound_args.arguments["self"]
            key_value = None
            if key:
                try:
                    key_value = bound_args.arguments[key]
                except KeyError:
                    raise KeyError(
                        f"Unknown parameter '{key}' in throttle_task "
                        f"decorator of function {task.name}. "
                        f"`key` parameter must match a parameter "
                        f"name from function signature: '{sig}'"
                    )
            proceed = is_rate_okay(task, rate, key=key_value)
            if not proceed:
                logger.info(
                    "Throttling task %s (%s) via decorator.",
                    task.name,
                    task.request.id,
                )
                # Decrement the number of times the task has retried. If you
                # fail to do this, it gets auto-incremented, and you'll expend
                # retries during the backoff.
                task.request.retries = task.request.retries - 1
                return task.retry(countdown=random.uniform(*jitter))
            else:
                # All set. Run the task.
                return func(*args, **kwargs)

        return wrapper

    return decorator_func


def is_rate_okay(task: Task, rate: str = "1/s", key=None) -> bool:
    """Keep a global throttle for tasks

    Can be used via the `throttle_task` decorator above.

    This implements the timestamp-based algorithm detailed here:

        https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/

    Basically, you keep track of the number of requests and use the key
    expiration as a reset of the counter.

    So you have a rate of 5/m, and your first task comes in. You create a key:

        celery_throttle:task_name = 1
        celery_throttle:task_name.expires = 60

    Another task comes in a few seconds later:

        celery_throttle:task_name = 2
        Do not update the ttl, it now has 58s remaining

    And so forth, until:

        celery_throttle:task_name = 6
        (10s remaining)

    We're over the threshold. Re-queue the task for later. 10s later:

        Key expires b/c no more ttl.

    Another task comes in:

        celery_throttle:task_name = 1
        celery_throttle:task_name.expires = 60

    And so forth.

    :param task: The task that is being checked
    :param rate: How many times the task can be run during the time period.
    Something like, 1/s, 2/h or similar.
    :param key: If given, add this to the key placed in Redis for the item.
    Typically, this will correspond to the value of an argument passed to the
    throttled task.
    :return: True if the task is under the rate limit and should run; False
    if it should be throttled and retried later.
    """
    key = f"celery_throttle:{task.name}{':' + str(key) if key else ''}"

    # make_redis_interface is a helper (not shown in this answer) that
    # returns a Redis connection for the named cache.
    r = make_redis_interface("CACHE")

    num_tasks, duration = parse_rate(rate)

    # Check the count in redis
    count = r.get(key)
    if count is None:
        # No key. Set the value to 1 and set the ttl of the key.
        r.set(key, 1)
        r.expire(key, duration)
        return True
    else:
        # Key found. Check it.
        if int(count) < num_tasks:
            # Still under the limit (strictly less than, so a rate of 5/m
            # allows exactly five runs per window). We're OK, run it.
            r.incr(key, 1)
            return True
        else:
            return False
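
If you want to watch the throttle in action, a quick way is to peek at the counters it creates. This is just an inspection sketch, assuming redis-py and that the "CACHE" interface above points at a local Redis instance:

import redis

r = redis.Redis()  # assumed to be the same Redis that make_redis_interface("CACHE") returns
for key in r.scan_iter("celery_throttle:*"):
    print(key, r.get(key), r.ttl(key))  # counter value and seconds left in the window

Each (task, key) pair gets its own counter (the key format is the f-string at the top of is_rate_okay), so throttles for different key values never interfere with each other.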

Answer 1 (score: 0)

I don't think this is possible with Celery's built-in task limiter.

Assuming you're using some kind of cache for your API, the best solution is probably to create a hash of the task name and args, and use that key for a cache-based throttler.

If you're using Redis, you could either set a lock with a 60-second timeout, or use an incremental counter to count the calls per minute.
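
A minimal sketch of that counter idea with redis-py (the helper name, the 60-call budget, and the key prefix are illustrative assumptions, not an established API):

import hashlib
import json

import redis

r = redis.Redis()


def allow_call(task_name, *args):
    """Return True if this (task name, args) combination has been seen
    fewer than 60 times in the current 60-second window."""
    digest = hashlib.sha1(json.dumps([task_name, list(args)]).encode()).hexdigest()
    key = f"throttle:{digest}"
    count = r.incr(key)    # atomically create-or-increment the counter
    if count == 1:
        r.expire(key, 60)  # the first call in a window starts the 60s clock
    return count <= 60

# e.g. guard the API call in the task body with:
#   if allow_call("api_call", user):
#       do_the_api_call()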

This post might give you some pointers on distributed rate limiting of Celery tasks with Redis:

https://callhub.io/blog/2014/02/03/distributed-rate-limiting-with-redis-and-celery/