Celery .delay() hangs (recent, not an auth problem)

Time: 2011-06-13 02:46:44

Tags: python django rabbitmq celery

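I am running Celery 2.2.4 / djCelery 2.2.4, using RabbitMQ 2.1.1 as the backend. I recently brought two new Celery servers online. Previously I had been running 2 workers across two machines, with 18 threads in total. On my new, more powerful boxes (36g RAM + dual hyper-threaded quad-core), I am running 10 workers with 8 threads each, for a total of 180 threads. My tasks are all quite small, so this should be fine.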

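The nodes have been running fine for the last few days, but today I noticed that .delay() is hanging. When I interrupt it, I see a traceback that points here: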

File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 324, in delay
    return self.apply_async(args, kwargs)
File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 449, in apply_async
    publish.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/kombu/compat.py", line 108, in close
    self.backend.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/channel.py", line 194, in close
    (20, 41),    # Channel.close_ok
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
    self.channel_id, allowed_methods)
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_method
    self.method_reader.read_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 212, in read_method
    self._next_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 127, in _next_method
    frame_type, channel, payload = self.source.read_frame()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 109, in read_frame
    frame_type, channel, size = unpack('>BHI', self._read(7))
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 200, in _read
    s = self.sock.recv(65536)
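
The last frame of the traceback above is a blocking sock.recv(): the client has sent Channel.close and is waiting for a reply from the broker that never arrives. The following is a minimal sketch, not amqplib code, of how a plain socket read with no timeout waits indefinitely, while the same read with a timeout returns control to the caller. The helper name recv_with_timeout is hypothetical, and the sketch assumes Python 3 rather than the Python 2.5 shown in the traceback.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout):
    """Read up to nbytes from sock, or return None if nothing arrives in time.

    Hypothetical helper for illustration only; it is not part of amqplib.
    """
    sock.settimeout(timeout)  # without this, recv() blocks until data arrives
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # A connected socket pair stands in for the client/broker connection.
    client, broker = socket.socketpair()
    try:
        # The "broker" stays silent, so the read gives up after 0.2 seconds.
        print("silent peer:", recv_with_timeout(client, 7, 0.2))
        # Once the peer sends a 7-byte header, the read returns it.
        broker.sendall(b"\x01\x00\x01\x00\x00\x00\x00")
        print("after send:", recv_with_timeout(client, 7, 0.2))
    finally:
        client.close()
        broker.close()
```

A timeout like this only turns the hang into a visible error; it does not explain why the broker stopped replying.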

I checked the Rabbit logs, and I can see the process trying to connect:

=INFO REPORT==== 12-Jun-2011::22:58:12 ===
accepted TCP connection on 0.0.0.0:5672 from x.x.x.x:48569

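My Celery log level is set to INFO, but I don't see anything particularly interesting in the Celery logs, except that 2 of the workers can't connect to the broker: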

[2011-06-12 22:41:08,033: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...

All the other nodes are able to connect without any problem.

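I know there was a post of a similar nature last year (RabbitMQ / Celery with Django hangs on delay/ready/etc - No useful log info), but I'm fairly sure this is different. Could the sheer number of workers be creating some sort of race condition in amqplib? I found a thread that seems to indicate amqplib is not thread-safe, but I'm not sure whether that matters for Celery.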

EDIT: I tried celeryctl purge on both nodes. On one it succeeded, but on the other it failed with the following AMQP error:

AMQPConnectionException(reply_code, reply_text, (class_id, method_id))
    amqplib.client_0_8.exceptions.AMQPConnectionException: 
    (530, u"NOT_ALLOWED - cannot redeclare exchange 'XXXXX' in vhost 'XXXXX' 
     with different type, durable or autodelete   value", (40, 10), 'Channel.exchange_declare')

On both nodes, inspect stats hangs with the same "can't close connection" traceback shown above. I'm at a loss here.

EDIT2: I was able to delete the offending exchange using exchange.delete from camqadm, and now the second node hangs as well :(.

EDIT3: One other thing that changed recently is that I added an additional vhost to rabbitmq, which my staging node connects to.

1 answer:

Answer 0 (score: 5)

Hopefully this will save somebody a lot of time... though it certainly doesn't save me any embarrassment:

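/var was full on the server running rabbit. With all the nodes I had added, Rabbit was doing a lot more logging and it filled up /var. Nothing could be written to /var/lib/rabbitmq, and so no messages were getting through.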
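
Since the root cause was a full /var, a quick free-space check on the broker host may be worth running before digging into Celery or amqplib. Below is a minimal sketch, assuming Python 3 (for shutil.disk_usage) on the RabbitMQ server. The function names and the 5% threshold are illustrative assumptions, not anything RabbitMQ or Celery provides.

```python
import shutil


def is_nearly_full(total_bytes, free_bytes, min_free_fraction=0.05):
    """Return True when less than min_free_fraction of the space is free."""
    if total_bytes <= 0:
        return True  # treat an unreadable or empty filesystem as full
    return free_bytes / total_bytes < min_free_fraction


def path_is_nearly_full(path, min_free_fraction=0.05):
    """Check the filesystem that holds `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    return is_nearly_full(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    # /var and /var/lib/rabbitmq are the paths named in the answer above.
    for path in ("/var", "/var/lib/rabbitmq"):
        try:
            print(path, "nearly full:", path_is_nearly_full(path))
        except OSError as exc:
            print(path, "could not be checked:", exc)
```

Running something like this from cron or a monitoring agent on the broker host would flag the condition before publishers start hanging.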