Why can't I catch Celery's WorkerLostError?

Time: 2017-07-16 20:06:34

Tags: python celery

I am using Celery together with Fabric to set up remote servers.

When something goes wrong, I want to change Server.status from LAUNCHING (which I use to prevent a double launch) to ERROR. Here is my code:

class ChangeBackStatusOnErrorTask(celery.Task):
  abstract = True

  def on_failure(self, exc, task_id, args, kwargs, einfo):
    print 'from on_failure', self, exc, task_id, args, kwargs, einfo
    return
    #server = Server.query.get(server_id)
    #server.status = RemoteStatus.ERROR
    #db.session.commit()


@celery.task(bind=True, base=ChangeBackStatusOnErrorTask)
def deploy_server(self, server_id):
  """To prevent launching while we are launching, we will
  disable launching until the server's status is LAUNCHED
  """
  server = Server.query.get(server_id)
  if not server.can_launch():
    return

  try:
    server.status = RemoteStatus.LAUNCHING
    db.session.commit()

    host = server.ssh_user + '@' + server.ip
    execute(fabric_deploy_server, self, server, hosts=host)

    server.status = RemoteStatus.LAUNCHED
    db.session.commit()
  except Exception as e:
    server.status = RemoteStatus.ERROR
    db.session.commit()
    traceback.print_exc()
    raise e

However, when I give the Celery task a wrong IP address, I get an exception that bypasses all of my failure handling:

[2017-07-17 03:58:07,077: WARNING/PoolWorker-7] [root@1.2.3.45] Executing task 'fabric_deploy_server'
[2017-07-17 03:58:07,078: WARNING/PoolWorker-7] [root@1.2.3.45] sudo: apt-get update
[2017-07-17 03:58:17,173: WARNING/PoolWorker-7] Fatal error: Timed out trying to connect to 1.2.3.45 (tried 1 time)

Underlying exception:
    timed out
[2017-07-17 03:58:17,173: WARNING/PoolWorker-7] Aborting.
[2017-07-17 03:58:22,172: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0.',)
Traceback (most recent call last):
  File "/Users/vng/.virtualenvs/AutomataHeroku/lib/python2.7/site-packages/billiard/pool.py", line 1224, in mark_as_worker_lost
    human_status(exitcode)),
WorkerLostError: Worker exited prematurely: exitcode 0.

As you can see:

  1. ChangeBackStatusOnErrorTask.on_failure is never called.
  2. The exception escapes my try/except block.
  3. How can I catch this error? I need to set Server.status to ERROR so that I can relaunch my task.

1 answer:

Answer 0 (score: 0)

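According to the docs, celery.execute should give you back a celery.result.EagerResult object, so even if the task fails I don't think it will raise an error unless you tell it to. I also don't think you need to pass self to execute.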

Also, execute seems to come from something like Celery 2, so you should consider upgrading.

Give this a try:

  try:
    server.status = RemoteStatus.LAUNCHING
    db.session.commit()

    host = server.ssh_user + '@' + server.ip
    # Apply will run locally.
    # http://docs.celeryproject.org/en/2.1-archived/reference/celery.execute.html#executing-tasks-celery-execute
    # Throw=True will re-raise the error if you get one.
    result = execute.apply(fabric_deploy_server, server, hosts=host, throw=True)

    server.status = RemoteStatus.LAUNCHED
    db.session.commit()
  except Exception as e:
    server.status = RemoteStatus.ERROR
    db.session.commit()
    traceback.print_exc()
    raise e
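
A possible explanation for why the except block in the question never runs, offered as an assumption rather than something confirmed in this thread: the "Fatal error: ... Aborting." lines in the log look like Fabric 1.x's abort path, which by default ends in a SystemExit. SystemExit derives from BaseException, not Exception, so `except Exception` does not see it, and the pool worker process simply exits, which would match "Worker exited prematurely". The sketch below uses only the standard library. `fake_fabric_task`, `DeployAborted`, and the status strings are hypothetical stand-ins for the Fabric call, a custom exception, and RemoteStatus; they are not part of Celery or Fabric.

```python
class DeployAborted(Exception):
    """Hypothetical ordinary exception raised in place of SystemExit."""


def fake_fabric_task():
    # Stand-in for a Fabric task that aborts; assumed to raise SystemExit.
    raise SystemExit(1)


def run_without_guard(statuses):
    # Mirrors the try/except in deploy_server. SystemExit is not a subclass
    # of Exception, so the except branch below is skipped entirely.
    try:
        statuses.append("LAUNCHING")
        fake_fabric_task()
        statuses.append("LAUNCHED")
    except Exception:
        statuses.append("ERROR")
        raise


def run_with_guard(statuses):
    # Same flow, but SystemExit is converted into an ordinary exception
    # so that the existing error handling gets a chance to run.
    try:
        statuses.append("LAUNCHING")
        try:
            fake_fabric_task()
        except SystemExit as exc:
            raise DeployAborted("task aborted with exit code %r" % (exc.code,))
        statuses.append("LAUNCHED")
    except Exception:
        statuses.append("ERROR")
        raise
```

With the guard in place, the failure surfaces as a regular exception, which is the kind of error a task's on_failure handler is meant to receive. Fabric 1.x also documents an `env.abort_exception` setting intended to make aborts raise a chosen exception instead of SystemExit; whether it is available depends on the installed Fabric version, so treat that as something to verify.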

Celery execute: http://docs.celeryproject.org/en/2.1-archived/reference/celery.execute.html#executing-tasks-celery-execute

Celery EagerResult: http://docs.celeryproject.org/en/2.1-archived/reference/celery.result.html#celery.result.EagerResult