版本:redis-py = 3.1.0和redis = 3.2.10
平台:Python 2.7.5 / CentOS Linux版本7.4.1708(核心)
基础结构:
问题:
工作人员运行一段时间后,突然machine1
上工作的一名工人死于RuntimeError
上的举报,失去了与pubsub的连接。
machine1.worker.log
[2019-02-01 13:43:39,477: CRITICAL/MainProcess] Unrecoverable error: RuntimeError(u'pubsub connection not set: did you forget to call subscribe() or psubscribe()?',)
Traceback (most recent call last):
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/worker.py", line 205, in start
self.blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 322, in start
blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 598, in start
c.loop(*c.loop_args())
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
next(loop)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
cb(*cbargs)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
self.cycle.on_readable(fileno)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
chan.handlers[type]()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 674, in _receive
ret.append(self._receive_one(c))
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 685, in _receive_one
response = c.parse_response()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 3032, in parse_response
'pubsub connection not set: '
RuntimeError: pubsub connection not set: did you forget to call subscribe() or psubscribe()?
与此同时,我发现在machine2
上运行的工作人员由于无法连接到redis
而遭受痛苦。最终,它设法恢复并重新连接到redis
,并接收排队的任务。
machine2.worker.log
[2019-02-01 14:43:41,722: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 322, in start
blueprint.start(self)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 40, in start
self.sync(c)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 44, in sync
replies = self.send_hello(c)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/worker/consumer/mingle.py", line 57, in send_hello
replies = inspect.hello(c.hostname, our_revoked._data) or {}
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 143, in hello
return self._request('hello', from_node=from_node, revoked=revoked)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 95, in _request
timeout=self.timeout, reply=True,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/celery/app/control.py", line 454, in broadcast
limit, callback, channel=channel,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/pidbox.py", line 315, in _broadcast
serializer=serializer)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/pidbox.py", line 290, in _publish
serializer=serializer,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/virtual/base.py", line 605, in basic_publish
message, exchange, routing_key, **kwargs
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/virtual/exchange.py", line 151, in deliver
exchange, message, routing_key, **kwargs)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/kombu/transport/redis.py", line 781, in _put_fanout
dumps(message),
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 2716, in publish
return self.execute_command('PUBLISH', channel, message)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 775, in execute_command
return self.parse_response(connection, command_name, **options)
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/client.py", line 789, in parse_response
response = connection.read_response()
File "/opt/c1/cip-middleware/webapp/virtualenv/lib/python2.7/site-packages/redis/connection.py", line 636, in read_response
raise e
ConnectionError: Error while reading from socket: (u'Connection closed by server.',)
[2019-02-01 14:44:50,237: INFO/MainProcess] Received task: c1_cip_middleware.tasks.validate_purchases.run_purchases_validation[a411dd90-ab50-4101-becb-90adda3663a2]
*问题*
我想知道RuntimeError
上升时的情况/场景是什么,因此,工人进入“不可恢复”阶段并且必须被停止吗?
我怀疑出现此问题的根本原因是什么,特别是一个工人设法康复但另一个工人刚刚死亡?