Celery worker is running, but the node suddenly stops responding

Date: 2019-08-23 11:52:17

Tags: redis celery

I have had a Celery worker running against a Redis backend for more than half a year, and until now I have not run into any problems.

Suddenly, I no longer get any reply from the node.

I can start Celery successfully; the command executes without any error message:

celery multi start myqueue -A myapp.celery -Ofair
celery multi v4.3.0 (rhubarb)
> Starting nodes...
> myqueue@myhost: OK

However, when I check the status of the Celery workers with

celery -A myapp.celery status

I get the message:

Error: No nodes replied within time constraint.

If I look up the processes, they show that the Celery worker is running:

/usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4
\_ /usr/bin/python3 -m celery worker -Ofair -A myapp.celery --concurrency=4

When I run

celery -A myapp.celery control shutdown

the processes above are removed as expected.

Starting the worker in the foreground does not give any hints either:

$ celery -A myapp.celery worker -l debug
Please specify a different user using the --uid option.

User information: uid=1000120000 euid=1000120000 gid=0 egid=0


uid=uid, euid=euid, gid=gid, egid=egid,
[2019-08-23 11:36:36,790: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2019-08-23 11:36:36,792: DEBUG/MainProcess] | Worker: Building graph...
[2019-08-23 11:36:36,793: DEBUG/MainProcess] | Worker: New boot order: {StateDB, Beat, Timer, Hub, Pool, Autoscaler, Consumer}
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2019-08-23 11:36:36,808: DEBUG/MainProcess] | Consumer: Building graph...
[2019-08-23 11:36:36,862: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Mingle, Tasks, Control, Heart, Gossip, Agent, event loop}

 -------------- celery@myapp-163-m4hs9 v4.3.0 (rhubarb)
---- **** ----- 
--- * ***  * -- Linux-3.10.0-862.3.2.el7.x86_64-x86_64-with-Ubuntu-16.04-xenial 2019-08-23 11:36:36
-- * - **** --- 
- ** ---------- [config]
- ** ---------- .> app:         myapp:0x7f2094fcd978
- ** ---------- .> transport:   redis://:**@${redis-host}:6379/0
- ** ---------- .> results:     redis://:**@${redis-host}:6379/0
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> myqueue      exchange=myqueue(direct) key=myqueue


[tasks]
  . sometask1
  . sometask2
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Hub
[2019-08-23 11:36:36,874: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:36,874: DEBUG/MainProcess] | Worker: Starting Pool
[2019-08-23 11:36:37,278: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,279: DEBUG/MainProcess] | Worker: Starting Consumer
[2019-08-23 11:36:37,280: DEBUG/MainProcess] | Consumer: Starting Connection
[2019-08-23 11:36:37,299: INFO/MainProcess] Connected to redis://:**@${redis-host}:6379/0
[2019-08-23 11:36:37,299: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,299: DEBUG/MainProcess] | Consumer: Starting Events
[2019-08-23 11:36:37,311: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:37,312: DEBUG/MainProcess] | Consumer: Starting Mingle
[2019-08-23 11:36:37,312: INFO/MainProcess] mingle: searching for neighbors
[2019-08-23 11:36:38,343: INFO/MainProcess] mingle: all alone
[2019-08-23 11:36:38,343: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,343: DEBUG/MainProcess] | Consumer: Starting Tasks
[2019-08-23 11:36:38,350: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,350: DEBUG/MainProcess] | Consumer: Starting Control
[2019-08-23 11:36:38,359: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,359: DEBUG/MainProcess] | Consumer: Starting Heart
[2019-08-23 11:36:38,363: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,363: DEBUG/MainProcess] | Consumer: Starting Gossip
[2019-08-23 11:36:38,371: DEBUG/MainProcess] ^-- substep ok
[2019-08-23 11:36:38,371: DEBUG/MainProcess] | Consumer: Starting event loop
[2019-08-23 11:36:38,372: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2019-08-23 11:36:38,373: INFO/MainProcess] celery@myapp-163-m4hs9 ready.
[2019-08-23 11:36:38,373: DEBUG/MainProcess] basic.qos: prefetch_count->16
[2019-08-23 11:36:38,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2019-08-23 11:36:38,839: INFO/MainProcess] Events of group {task} enabled by remote.
[2019-08-23 11:36:43,838: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]

Redis is running:

redis-cli -h ${redis-host}
redis:6379> ping
PONG
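Beyond `redis-cli`, a quick sanity check from the worker host itself is a plain TCP probe of the broker port. This is a minimal sketch using only the Python standard library; the host and port are placeholders for the real broker address:

```python
# Minimal sketch: verify the Redis broker port is reachable from the
# worker host (host/port below are placeholders, not the real broker).
import socket

def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. broker_reachable("redis-host", 6379)
```

Note that this only proves network reachability, not that the Redis protocol handshake or authentication succeeds.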

The log file does not contain any hints.

As mentioned before, when I check the status of the Celery workers with

celery -A myapp.celery status

I get the message:

Error: No nodes replied within time constraint.

Instead, Celery should respond with
> myqueue@myhost: OK

or at least give some error message.

Temporary solution and further investigation:

As an immediate measure, I switched the message queue to RabbitMQ, and the workers are online and responding again. The problem therefore appears to be specific to using Redis as the message queue. Updating the Celery and Redis clients to the latest versions (Celery 4.3.0, redis 3.3.8) did not help. The Python version is 3.5 (on OpenShift).
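Since the behavior seems version-specific, it helps to record exactly which releases are installed. A sketch that queries the package metadata directly, assuming Python 3.8+ for `importlib.metadata` (on older interpreters such as the 3.5 used here, `pip freeze` gives the same information):

```python
# Sketch: report the installed versions of Celery and its transport
# dependencies; maps a name to None if the package is not installed.
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages):
    """Map each distribution name to its installed version, or None."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

print(report_versions(["celery", "kombu", "redis"]))
```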

1 Answer:

Answer 0 (score: 0)

There is a bug in the latest release (4.6.4) of the Kombu library (a Celery dependency) that causes the Redis problem documented in this Github issue.

The bug was recently fixed in a pull request in the Kombu repository, but the fix has not been released yet.

Downgrading Kombu to version 4.6.3 will solve the problem.
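When downgrading, it may help to pin the working combination explicitly so a later upgrade does not pull the broken release back in. A sketch of a requirements pin, using only the version numbers mentioned in this thread:

```
celery==4.3.0
redis==3.3.8
kombu==4.6.3   # 4.6.4 has the Redis bug; drop this pin once a fixed release ships
```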