Celery upgrade (3.1 -> 4.1) - Connection reset by peer

Date: 2017-08-21 18:42:11

Tags: python rabbitmq celery amqp kombu

We have been working with Celery for the past year, with about 15 workers, each configured with a concurrency between 1 and 4.

Recently we upgraded Celery from v3.1 to v4.1.

Now we see the following error in every worker's log. Any idea what could cause it?

2017-08-21 18:33:19,780 94794  ERROR   Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
  File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
    self.node.handle_message(body, message)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
    return self.dispatch(**body)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
    ticket=ticket)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
    serializer=self.mailbox.serializer)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
    **opts
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
    exchange_name, declare,
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
    mandatory=mandatory, immediate=immediate,
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
    (0, exchange, routing_key, mandatory, immediate), msg
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
    write(view[:offset])
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
    self._write(s)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer

BTW: our tasks have the form:

@app.task(name='EXAMPLE_TASK',
          bind=True,
          base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
    # task code

1 answer:

Answer 0 (score: 7)

We have big problems with Celery too... I spend about 20% of my time just dancing around weird idle/crash problems with the workers, *sigh*.

We ran into a similar situation, caused by high concurrency combined with a high worker_prefetch_multiplier; as it turns out, prefetching thousands of tasks is a good way to break the connection.
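For context, each worker process prefetches roughly concurrency × worker_prefetch_multiplier messages from the broker. A quick back-of-the-envelope check, using illustrative numbers based on the setup described in the question (the multiplier value here is Celery's documented default, not the asker's actual setting):

```python
# Rough prefetch arithmetic -- all numbers are illustrative assumptions.
workers = 15                     # worker instances mentioned in the question
concurrency = 4                  # upper bound of the 1-4 range above
worker_prefetch_multiplier = 4   # Celery's default value

per_worker = concurrency * worker_prefetch_multiplier
fleet_total = workers * per_worker
print(per_worker, fleet_total)   # 16 messages per worker, 240 fleet-wide
```

With a much larger multiplier (or higher concurrency), the per-worker prefetch count grows multiplicatively, which is how a fleet can end up pulling thousands of messages at once.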

If that's not it: try disabling the broker pool by setting broker_pool_limit to None.
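Both settings discussed above can go in a standard Celery configuration module; a minimal sketch (the specific values are assumptions to illustrate the two knobs, not recommendations for every workload):

```python
# celeryconfig.py -- minimal sketch of the two settings discussed above.
# Celery 4.x uses lowercase setting names; values here are illustrative.

# Limit how many messages each worker process prefetches at once
# (the default is 4; 1 means "fetch one task at a time").
worker_prefetch_multiplier = 1

# Disable the broker connection pool entirely, so each use opens
# and closes its own connection instead of sharing pooled ones.
broker_pool_limit = None
```

Load it with `app.config_from_object('celeryconfig')`, or set the same keys directly on `app.conf`.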

Just a few quick ideas that may (hopefully) be useful :-)