我的所有芹菜工人都会定期被困在某物上。我无法弄清楚导致这种情况的原因,因为Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for roles starting with 'pg_' ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Copying user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Upgrade Complete
不起作用,因为所有工人都很忙。
inspect
是否有可能像活动任务一样获得Celery状态,即使节点正在做某事(这似乎导致了问题)?我可以以某种方式启动一个临时工来获得 celery inspect active
Error: No nodes replied within time constraint
输出吗?
诊断此问题会采取哪种其他策略?
芹菜4.x. Redis后端。
答案 0 :(得分:2)
这被证明是Celery + gevent(邪恶的猴子补丁)+ Sentry的Raven记录器的死锁问题。
https://github.com/getsentry/raven-python/issues/305
诊断问题
您可以使用不同的队列(-q,-n)参数启动Celery worker,并查看worker挂起的时间。即使某些工作组被挂起,其他人仍可能会回复inspect
个查询。
Celery文件日志可能会显示错误
2017-02-27 08:36:34,371 CRITI [celery.worker][DummyThread-6] Unrecoverable error: AttributeError("'NoneType' object has no attribute 'readline'",)
Traceback (most recent call last):
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 594, in start
c.loop(*c.loop_args())
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/loops.py", line 118, in synloop
connection.drain_events(timeout=2.0)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/connection.py", line 301, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/virtual/base.py", line 961, in drain_events
get(self._deliver, timeout=timeout)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 359, in get
ret = self.handle_event(fileno, event)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 341, in handle_event
return self.on_readable(fileno), self
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
**options)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/client.py", line 585, in parse_response
response = connection.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
AttributeError: 'NoneType' object has no attribute 'readline'