Rate limiting external API calls with RabbitMQ and Celery

Date: 2016-09-04 01:11:07

Tags: python api rabbitmq celery

I am using an external REST API that limits my requests to 1 call per second (CPS).

This is the architecture:

[architecture diagram]

Versions:

  • Flask
  • RabbitMQ 3.6.4
  • amqp 1.4.9
  • kombu 3.0.35
  • Celery 3.1.23
  • Python 2.7

An API client sends a web request to the internal API, which processes the request and controls the rate at which tasks are sent to RabbitMQ. The tasks can take anywhere from 5 to 120 seconds, and in some situations they queue up and get sent to the external API at a higher rate than the defined limit, causing many failed requests (roughly 5% of requests fail).

Possible solutions:

  • Increase the external API rate limit
  • Add more workers
  • Track failed tasks and retry them later

While these solutions might work, they do not address the implementation of my rate limiter or give me control over the actual rate at which my workers hit the API, and I will eventually need to control that external rate.

I believe a better option might be to control the rate at which RabbitMQ delivers messages to the workers. I found the RabbitMQ prefetch option, but I am not sure about it; can anyone recommend other options for controlling the rate at which messages are delivered to consumers?
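For reference, the Celery settings that map onto RabbitMQ's prefetch behaviour are sketched below. This is only a minimal sketch using Celery 3.1 setting names (they were renamed in Celery 4), and the task name tasks.call_external_api is a placeholder, not something from the question.

# celeryconfig.py -- Celery 3.1 style setting names

# Each worker process reserves only one message at a time instead of
# prefetching a batch, so long-running tasks do not pile up on one worker.
CELERYD_PREFETCH_MULTIPLIER = 1

# Acknowledge the message only after the task finishes; combined with a
# prefetch multiplier of 1 this means "one unacked task per worker process".
CELERY_ACKS_LATE = True

# Celery's built-in rate limit can also be attached per task, but note the
# answer below: it is enforced per worker, not across the whole cluster.
CELERY_ANNOTATIONS = {'tasks.call_external_api': {'rate_limit': '1/s'}}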


1 Answer:

Answer 0 (score: 1):

You will need to create your own rate limiter, because Celery's rate limiting only applies per worker and "does not work as you would expect".

I personally found that it breaks down completely when you try to add new tasks from within another task.

I think the requirements around rate limiting are too broad and depend on the application itself, so Celery's implementation is intentionally kept very simple.
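To make the per-worker limitation concrete, here is a minimal, hypothetical illustration of Celery's built-in limiter (the task name is made up):

from celery import Celery

app = Celery('your_app')

# rate_limit='1/s' throttles this task to one execution per second
# *per worker*, not globally -- two workers running this task can still
# hit the external API at roughly 2 requests per second combined.
@app.task(rate_limit='1/s')
def call_external_api(payload):
    ...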

Here is an example I created using Celery + Django + Redis. Basically, it adds a convenience method to your App.Task class that tracks the task execution rate in Redis. If the rate is too high, the task will Retry at a later time.

This example sends an SMTP message, but it can easily be replaced with an API call.

The algorithm is inspired by Figma's post: https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/

https://gist.github.com/Vigrond/2bbea9be6413415e5479998e79a1b11a

# Rate limiting with Celery + Django + Redis
# Multiple Fixed Windows Algorithm inspired by Figma https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
#   and Celery's sometimes ambiguous, vague, and one-paragraph documentation
#
# Celery's Task is subclassed and the is_rate_okay function is added


# celery.py or however your App is implemented in Django
import os
import math
import time

from celery import Celery, Task
from django_redis import get_redis_connection
from django.conf import settings
from django.utils import timezone


app = Celery('your_app')

# Get Redis connection from our Django 'default' cache setting
redis_conn = get_redis_connection("default")

# We subclass the Celery Task
class YourAppTask(Task):
  def is_rate_okay(self, times=30, per=60):
    """
      Checks to see if this task is hitting our defined rate limit too much.
      This example sets a rate limit of 30/minute.

      times (int): The "30" in "30 times per 60 seconds".
      per (int):  The "60" in "30 times per 60 seconds".

      The Redis structure we create is a Hash of timestamp keys with counter values
      {
        '1560649027.515933': '2',  // unlikely to have more than 1
        '1560649352.462433': '1',
      }

      The Redis key is set to expire once 'per' seconds have elapsed.
      The algorithm totals the counters and checks the sum against 'times'.

      This algorithm currently does not implement the "leniency" described
      at the bottom of the Figma article referenced at the top of this code.
      That is left up to you and depends on the application.

      Returns True if under the limit, otherwise False.
    """

    # Get a timestamp accurate to the microsecond
    timestamp = timezone.now().timestamp()

    # Set our Redis key to our task name
    key = f"rate:{self.name}"

    # Create a pipeline to execute redis code atomically
    pipe = redis_conn.pipeline()

    # Increment our current task hit in the Redis hash
    pipe.hincrby(key, timestamp)

    # Grab the current expiration of our task key
    pipe.ttl(key)

    # Grab all of our task hits in our current frame (of 60 seconds)
    pipe.hvals(key)

    # This returns a list of our command results.  [current task hits, expiration, list of all task hits,]
    result = pipe.execute()

    # If our expiration is not set, set it.  This is not part of the atomicity of the pipeline above.
    if result[1] < 0:
        redis_conn.expire(key, per)

    # We must convert byte to int before adding up the counters and comparing to our limit
    if sum([int(count) for count in result[2]]) <= times:
        return True
    else:
        return False


app.Task = YourAppTask
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

...

# SMTP Example
import random
from YourApp.celery import app
from django.core.mail import EmailMessage

# We set infinite max_retries so backlogged email tasks do not disappear
@app.task(name='smtp.send-email', max_retries=None, bind=True)
def send_email(self, to_address):

    if not self.is_rate_okay():
        # We implement a random countdown between 30 and 60 seconds 
        #   so tasks don't come flooding back at the same time
        raise self.retry(countdown=random.randint(30, 60))

    message = EmailMessage(
        'Hello',
        'Body goes here',
        'from@yourdomain.com',
        [to_address],
    )
    message.send()
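
For completeness, a hedged usage sketch, assuming the task above lives in YourApp/tasks.py. Any calls beyond the 30/minute window re-enqueue themselves with the random 30-60 second countdown; to match the question's external API you would swap message.send() for the HTTP call and tighten the window (for example is_rate_okay(times=1, per=1) for 1 call per second).

# Hypothetical caller, e.g. a Django view or a management command.
from YourApp.tasks import send_email

recipients = ['a@example.com', 'b@example.com', 'c@example.com']

for address in recipients:
    # .delay() only publishes the task to RabbitMQ; is_rate_okay() inside
    # the task decides at execution time whether to run now or retry later.
    send_email.delay(address)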