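I'm running Celery 2.2.4 / djCelery 2.2.4, using RabbitMQ 2.1.1 as the backend. I recently brought two new Celery servers online. I had been running 2 workers across two machines with a total of 18 threads; on my new souped-up boxes (36 GB RAM + dual hyper-threaded quad-core) I am running 10 workers with 8 threads each, for a total of 180 threads. My tasks are all pretty small, so this should be fine.
The nodes have been running fine for the last few days, but today I noticed that .delay() is hanging. When I interrupt it, I see a traceback that points here: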
  File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 324, in delay
    return self.apply_async(args, kwargs)
  File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 449, in apply_async
    publish.close()
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/kombu/compat.py", line 108, in close
    self.backend.close()
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/channel.py", line 194, in close
    (20, 41), # Channel.close_ok
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
    self.channel_id, allowed_methods)
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_method
    self.method_reader.read_method()
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 212, in read_method
    self._next_method()
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 127, in _next_method
    frame_type, channel, payload = self.source.read_frame()
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 109, in read_frame
    frame_type, channel, size = unpack('>BHI', self._read(7))
  File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 200, in _read
    s = self.sock.recv(65536)
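The last frame shows the client blocked in sock.recv, waiting for a Channel.close_ok reply that the broker never sends. As a general illustration of that behaviour (this is not amqplib's code and not a fix for the underlying problem), here is a minimal sketch of how a socket timeout turns an indefinite wait into something the caller can detect. The helper name recv_with_timeout and the timeout values are assumptions made for the example.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_seconds):
    """Read up to nbytes; return None if nothing arrives within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # Without a timeout, recv() would block here for as long as the peer stays silent.
        return None


if __name__ == "__main__":
    # A connected socket pair stands in for the client/broker connection.
    client, broker = socket.socketpair()
    try:
        print(recv_with_timeout(client, 7, 0.2))  # the "broker" has sent nothing yet
        broker.sendall(b"reply")
        print(recv_with_timeout(client, 7, 0.2))  # data is now available
    finally:
        client.close()
        broker.close()
```

A timeout like this only makes the stall visible; it says nothing about why the broker stopped answering.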
I checked the Rabbit logs, and I see this for a process that tries to connect:
=INFO REPORT==== 12-Jun-2011::22:58:12 ===
accepted TCP connection on 0.0.0.0:5672 from x.x.x.x:48569
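My Celery log level is set to INFO, but I don't see anything particularly interesting in the Celery logs, except that 2 of the workers can't connect to the broker: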
[2011-06-12 22:41:08,033: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...
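All of the other nodes are able to connect without issue.
I know there was a similar post last year (RabbitMQ / Celery with Django hangs on delay/ready/etc - No useful log info), but I'm pretty sure this is different. Could the sheer number of workers be creating some sort of race condition in amqplib? I found this thread, which seems to indicate that amqplib is not thread-safe, but I'm not sure whether that matters for Celery.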
EDIT: I tried celeryctl purge on both nodes. On one it succeeds, but on the other it fails with the following AMQP error:
AMQPConnectionException(reply_code, reply_text, (class_id, method_id))
amqplib.client_0_8.exceptions.AMQPConnectionException:
(530, u"NOT_ALLOWED - cannot redeclare exchange 'XXXXX' in vhost 'XXXXX'
with different type, durable or autodelete value", (40, 10), 'Channel.exchange_declare')
On both nodes, inspect stats hangs with the "can't close connection" traceback above. I'm at a loss here.
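EDIT2: I was able to delete the offending exchange using exchange.delete from camqadm, and now the second node hangs too :(.
EDIT3: One thing that also changed recently is that I added an additional vhost to RabbitMQ, which my staging node connects to.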
Answer 0 (score: 5)
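Hopefully this will save somebody a lot of time... though it certainly doesn't save me any embarrassment:
/var was full on the server that was running Rabbit. With all of the nodes I had added, Rabbit was doing a lot more logging and it filled up /var. Rabbit couldn't write to /var/lib/rabbitmq, and so no messages were going through.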
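For anyone who wants to rule this out quickly, a minimal sketch of a free-space check on the broker host follows. It uses shutil.disk_usage from the standard library, which needs Python 3.3 or later (newer than the Python 2.5 in the traceback). The function names and the 5% threshold are assumptions made for the example; only the /var/lib/rabbitmq path comes from the description above.

```python
import os
import shutil


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_nearly_full(path, min_free=0.05):
    """True when less than `min_free` of the filesystem is free.

    The 5% default is an arbitrary assumption, not a RabbitMQ setting.
    """
    return free_fraction(path) < min_free


if __name__ == "__main__":
    # /var/lib/rabbitmq is the directory named in the answer; fall back to "/"
    # so the sketch still runs on a machine without RabbitMQ installed.
    path = "/var/lib/rabbitmq" if os.path.exists("/var/lib/rabbitmq") else "/"
    state = "nearly full" if is_nearly_full(path) else "has free space"
    print("%s: %s (%.1f%% free)" % (path, state, 100 * free_fraction(path)))
```

Running something like this from cron or a monitoring check on the broker host would flag the condition before publishers start hanging.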