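Celery and RabbitMQ eventually stop due to memory exhaustion

Time: 2016-03-05 00:16:04

Tags: rabbitmq celery

I have a Celery-based task queue with RabbitMQ as the broker. I process about 100 messages per day. I have no result backend set up.

I start the task master like this: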

import os
from celery import Celery

broker = os.environ.get('AMQP_HOST', None)
app = Celery(broker=broker)
server = QueueServer((default_http_host, default_http_port), app)

...and I start the worker like this:

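import os
from celery import Celery

broker = os.environ.get('AMQP_HOST', None)
app = Celery('worker', broker=broker)
app.conf.update(
    CELERYD_CONCURRENCY=1,          # one worker process
    CELERYD_PREFETCH_MULTIPLIER=1,  # reserve one message per process at a time
    CELERY_ACKS_LATE=True,          # acknowledge only after the task has run
)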

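The server runs fine for a while, but after about two weeks it suddenly stops. I have traced the stoppage to RabbitMQ no longer accepting messages because it has run out of memory: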

Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: vm_memory_high_watermark set. Memory used:252239992 allowed:249239961
Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: =WARNING REPORT==== 25-Feb-2016::02:01:39 ===
Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: memory resource limit alarm set on node rabbit@e654ac167b10.
Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: **********************************************************
Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: *** Publishers will be blocked until this alarm clears ***
Feb 25 02:01:39 render-mq-1 docker/e654ac167b10[2189]: **********************************************************

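The problem is that I can't figure out what needs to be configured differently to prevent this exhaustion. Evidently something is not being cleaned up somewhere, but I don't understand what.

For example, after about 8 days, rabbitmqctl status shows me: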

{memory,[{total,138588744},
      {connection_readers,1081984},
      {connection_writers,353792},
      {connection_channels,1103992},
      {connection_other,2249320},
      {queue_procs,428528},
      {queue_slave_procs,0},
      {plugins,0},
      {other_proc,13555000},
      {mnesia,74832},
      {mgmt_db,0},
      {msg_index,43243768},
      {other_ets,7874864},
      {binary,42401472},
      {code,16699615},
      {atom,654217},
      {other_system,8867360}]},

...whereas right after startup it was much lower:

{memory,[{total,51076896},
      {connection_readers,205816},
      {connection_writers,86624},
      {connection_channels,314512},
      {connection_other,371808},
      {queue_procs,318032},
      {queue_slave_procs,0},
      {plugins,0},
      {other_proc,14315600},
      {mnesia,74832},
      {mgmt_db,0},
      {msg_index,2115976},
      {other_ets,1057008},
      {binary,6284328},
      {code,16699615},
      {atom,654217},
      {other_system,8578528}]},

...even though all queues are empty (apart from the one job currently being processed):

root@dba9f095a160:/# rabbitmqctl list_queues -q name memory messages messages_ready messages_unacknowledged
celery  61152   1   0   1
celery@render-worker-lg3pi.celery.pidbox    117632  0   0   0
celery@render-worker-lkec7.celery.pidbox    70448   0   0   0
celeryev.17c02213-ecb2-4419-8e5a-f5ff682ea4b4   76240   0   0   0
celeryev.5f59e936-44d7-4098-aa72-45555f846f83   27088   0   0   0
celeryev.d63dbc9e-c769-4a75-a533-a06bc4fe08d7   50184   0   0   0
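
To put the queue listing in perspective, the per-queue memory column can be summed and compared with the broker total. This is a minimal sketch, assuming the listing was taken at about the same time as the 8-day rabbitmqctl status snapshot (the post does not say so). The helper name parse_queue_memory is illustrative, not part of any library.

```python
# Rows copied from the `rabbitmqctl list_queues` output in the question.
LIST_QUEUES_OUTPUT = """\
celery  61152   1   0   1
celery@render-worker-lg3pi.celery.pidbox    117632  0   0   0
celery@render-worker-lkec7.celery.pidbox    70448   0   0   0
celeryev.17c02213-ecb2-4419-8e5a-f5ff682ea4b4   76240   0   0   0
celeryev.5f59e936-44d7-4098-aa72-45555f846f83   27088   0   0   0
celeryev.d63dbc9e-c769-4a75-a533-a06bc4fe08d7   50184   0   0   0
"""


def parse_queue_memory(output):
    """Return {queue_name: memory_bytes} from whitespace-separated rows."""
    memory = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            memory[fields[0]] = int(fields[1])
    return memory


queue_memory = parse_queue_memory(LIST_QUEUES_OUTPUT)
total_queue_bytes = sum(queue_memory.values())

# ASSUMPTION: the 8-day snapshot total is the relevant denominator.
broker_total_bytes = 138588744

print(f"queues: {total_queue_bytes} bytes")
print(f"share of broker total: {total_queue_bytes / broker_total_bytes:.2%}")
```

The six queues add up to 402744 bytes, roughly 0.3% of the 138588744-byte total, so under that assumption the queue processes themselves do not explain the growth.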

I can't figure out how to find the cause of the memory consumption. Any help would be greatly appreciated.
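
One way to narrow this down is to subtract the startup snapshot from the 8-day snapshot, category by category. The sketch below uses only the numbers posted above. It ranks where the growth happened but does not by itself identify the cause.

```python
# Memory breakdowns copied from the two `rabbitmqctl status` outputs above.
after_8_days = {
    "connection_readers": 1081984, "connection_writers": 353792,
    "connection_channels": 1103992, "connection_other": 2249320,
    "queue_procs": 428528, "queue_slave_procs": 0, "plugins": 0,
    "other_proc": 13555000, "mnesia": 74832, "mgmt_db": 0,
    "msg_index": 43243768, "other_ets": 7874864, "binary": 42401472,
    "code": 16699615, "atom": 654217, "other_system": 8867360,
}
at_startup = {
    "connection_readers": 205816, "connection_writers": 86624,
    "connection_channels": 314512, "connection_other": 371808,
    "queue_procs": 318032, "queue_slave_procs": 0, "plugins": 0,
    "other_proc": 14315600, "mnesia": 74832, "mgmt_db": 0,
    "msg_index": 2115976, "other_ets": 1057008, "binary": 6284328,
    "code": 16699615, "atom": 654217, "other_system": 8578528,
}

# Per-category growth in bytes (negative means the category shrank).
growth = {name: after_8_days[name] - at_startup[name] for name in after_8_days}

# Largest increases first.
ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)

for name, delta in ranked[:5]:
    print(f"{name:<20} {delta / 1e6:+.1f} MB")
print(f"{'total':<20} {sum(growth.values()) / 1e6:+.1f} MB")
```

With these figures, msg_index (about +41.1 MB) and binary (about +36.1 MB) account for most of the roughly 87.5 MB increase, which points at message storage rather than connections or queue processes.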

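2 answers:

Answer 0: (score: 1)

The log says you are using 252239992 bytes, roughly 250 MB, which is not very high. How much memory does this machine have, and what is RabbitMQ's vm_memory_high_watermark set to? (You can check it by running rabbitmqctl eval "vm_memory_monitor:get_vm_memory_high_watermark().") Maybe you should raise the watermark.

Another option is to make all queues lazy: https://www.rabbitmq.com/lazy-queues.html

Answer 1: (score: 0)

You don't seem to be producing a large number of messages, so 2GB of memory consumption seems unusually high. Even so, you can try having RabbitMQ drop old messages; in your Celery configuration, set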
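
The numbers in the alarm line can be worked through as follows. This is a rough sketch that assumes RabbitMQ's default relative watermark of 0.4 is in effect. The question does not show the configured value, so the implied RAM figure is only an estimate.

```python
# Values from the "vm_memory_high_watermark set" log line in the question.
used_bytes = 252239992
allowed_bytes = 249239961

# ASSUMPTION: the default relative watermark (0.4 of detected RAM) is in use.
DEFAULT_WATERMARK = 0.4

MIB = 1024 * 1024


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


overshoot_bytes = used_bytes - allowed_bytes
implied_ram_bytes = allowed_bytes / DEFAULT_WATERMARK

print(f"used:        {to_mib(used_bytes):.1f} MiB")
print(f"allowed:     {to_mib(allowed_bytes):.1f} MiB")
print(f"overshoot:   {overshoot_bytes} bytes")
print(f"implied RAM: {to_mib(implied_ram_bytes):.0f} MiB (if watermark is 0.4)")
```

If that assumption holds, the broker sees only about 600 MiB of RAM, so raising the watermark buys limited headroom compared with giving the container more memory.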
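
Lazy mode is usually applied with a policy rather than per queue. The sketch below only builds the rabbitmqctl set_policy command described on the linked page. The policy name lazy-celery and the pattern ^celery$ are illustrative choices, and the command would need to run where rabbitmqctl can reach the broker (inside the Docker container, in the asker's setup). Lazy queues require RabbitMQ 3.6.0 or later.

```python
import json
import shlex


def lazy_policy_command(pattern="^celery$", policy_name="lazy-celery"):
    """Build the argv for a policy that puts matching queues in lazy mode.

    Both default arguments are hypothetical examples, not values from the post.
    """
    definition = json.dumps({"queue-mode": "lazy"})
    return [
        "rabbitmqctl", "set_policy", policy_name, pattern, definition,
        "--apply-to", "queues",
    ]


command = lazy_policy_command()

# Print a shell-safe version for copy and paste.
print(" ".join(shlex.quote(part) for part in command))

# To apply it directly instead of printing:
#   import subprocess
#   subprocess.run(command, check=True)
```

Whether this helps here is uncertain, since the queues in the question are nearly empty. It mainly limits memory held by queued message bodies.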

CELERY_DEFAULT_DELIVERY_MODE = 'transient'