芹菜:所有工人都陷入困境,如何诊断

时间:2017-02-13 19:48:39

标签: celery

我的所有芹菜工人都会定期被困在某物上。我无法弄清楚导致这种情况的原因,因为Performing Consistency Checks ----------------------------- Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for reg* system OID user data types ok Checking for contrib/isn with bigint-passing mismatch ok Checking for roles starting with 'pg_' ok Creating dump of global objects ok Creating dump of database schemas ok Checking for presence of required libraries ok Checking database user is the install user ok Checking for prepared transactions ok If pg_upgrade fails after this point, you must re-initdb the new cluster before continuing. Performing Upgrade ------------------ Analyzing all rows in the new cluster ok Freezing all rows on the new cluster ok Deleting files from new pg_clog ok Copying old pg_clog to new server ok Setting next transaction ID and epoch for new cluster ok Deleting files from new pg_multixact/offsets ok Copying old pg_multixact/offsets to new server ok Deleting files from new pg_multixact/members ok Copying old pg_multixact/members to new server ok Setting next multixact ID and offset for new cluster ok Resetting WAL archives ok Setting frozenxid and minmxid counters in new cluster ok Restoring global objects in the new cluster ok Restoring database schemas in the new cluster ok Copying user relation files ok Setting next OID for new cluster ok Sync data directory to disk ok Creating script to analyze new cluster ok Creating script to delete old cluster ok Upgrade Complete 不起作用,因为所有工人都很忙。

inspect

是否有可能像活动任务一样获得Celery状态,即使节点正在做某事(这似乎导致了问题)?我可以以某种方式启动一个临时工来获得 celery inspect active Error: No nodes replied within time constraint 输出吗?

诊断此问题会采取哪种其他策略?

芹菜4.x. Redis后端。

1 个答案:

答案 0 :(得分:2)

这被证明是Celery + gevent(邪恶的猴子补丁)+ Sentry的Raven记录器的死锁问题。

https://github.com/getsentry/raven-python/issues/305

诊断问题

您可以使用不同的队列(-q,-n)参数启动Celery worker,并查看worker挂起的时间。即使某些工作组被挂起,其他人仍可能会回复inspect个查询。

Celery文件日志可能会显示错误

2017-02-27 08:36:34,371 CRITI [celery.worker][DummyThread-6] Unrecoverable error: AttributeError("'NoneType' object has no attribute 'readline'",)
Traceback (most recent call last):
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 594, in start
    c.loop(*c.loop_args())
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/loops.py", line 118, in synloop
    connection.drain_events(timeout=2.0)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/connection.py", line 301, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/virtual/base.py", line 961, in drain_events
    get(self._deliver, timeout=timeout)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 359, in get
    ret = self.handle_event(fileno, event)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 341, in handle_event
    return self.on_readable(fileno), self
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
    chan.handlers[type]()
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
    **options)
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
    response = self._buffer.readline()
AttributeError: 'NoneType' object has no attribute 'readline'